Data Analysis. Predictive Analysis. Predicting the activity a subject performs based on measurements obtained from a smartphone's accelerometer and gyroscope
This document summarizes an analysis that used data from smartphone sensors to predict human activities. Several classification techniques were tested on a dataset containing sensor measurements for activities such as walking, sitting, and climbing stairs. The support vector machine (SVM) algorithm achieved the highest accuracy, 91.52%, at predicting activities. Some activities, such as sitting and standing, were more difficult to distinguish than lying down. Increasing the amount of training data could potentially improve prediction accuracy further.
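As an illustration of the kind of SVM classifier the summary refers to, here is a minimal sketch using scikit-learn. The feature values, cluster centres, and activity labels are invented stand-ins for the real accelerometer/gyroscope features, not the actual dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in for smartphone sensor features: each row holds summary
# statistics (e.g. mean/std of accelerometer axes) for one time window.
rng = np.random.default_rng(0)
X_walk = rng.normal(loc=1.0, scale=0.3, size=(50, 6))    # "walking" windows
X_sit = rng.normal(loc=-1.0, scale=0.3, size=(50, 6))    # "sitting" windows
X = np.vstack([X_walk, X_sit])
y = np.array(["walking"] * 50 + ["sitting"] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = SVC(kernel="rbf", C=1.0)    # RBF-kernel SVM, a common default
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

In the real analysis, each row would hold features derived from windows of accelerometer and gyroscope readings, and held-out accuracy would be estimated the same way via `clf.score`.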
A multilabel classification approach for complex human activities using a com... (IJECEIAES)
In our daily lives, humans perform different Activities of Daily Living (ADL), such as cooking and studying. By nature, humans perform these activities in either a sequential/simple or an overlapping/complex scenario. Many research attempts have addressed simple activity recognition, but complex activity recognition is still a challenging issue. Recognition of complex activities is a multilabel classification problem, in which a test instance may be assigned multiple overlapping activities. Existing data-driven techniques for complex activity recognition can recognize at most two overlapping activities and require a training dataset of complex (i.e. multilabel) activities. In this paper, we propose a multilabel classification approach for complex activity recognition using a combination of Emerging Patterns and Fuzzy Sets. Our approach requires a training dataset of only simple (i.e. single-label) activities. First, we use a pattern mining technique to extract discriminative features called Strong Jumping Emerging Patterns (SJEPs) that exclusively represent each activity. Then, our scoring function takes SJEPs and fuzzy membership values of incoming sensor data and outputs the activity label(s). We validate our approach using two different datasets. Experimental results demonstrate the efficiency and superiority of our approach against other approaches.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA... (IJDKP)
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone,
shopping, and individual records are regularly generated. Sharing these data has proved beneficial
for data mining applications. On one hand, such data is an important asset for business decision making
through analysis. On the other hand, data privacy concerns may prevent data owners from sharing information
for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution
that achieves the dual goals of privacy preservation and accuracy of the data mining tasks,
clustering and classification. An efficient and effective approach has been proposed that aims to protect
the privacy of sensitive information while obtaining data clustering with minimum information loss.
The Use of K-NN and Bees Algorithm for Big Data Intrusion Detection System (IOSRjournaljce)
The big data problem in intrusion detection systems is mainly due to the large volume of data. The original data has 41 dimensions, and some of its features are unnecessary. In this process, the volume of data has expanded into hundreds and thousands of gigabytes (GB) of information. The dimensionality and volume of the data can be reduced, and the system enhanced, by using K-NN and BA. Because processing the full KDD dataset is very slow, the data has been reduced by extracting features with the Bees Algorithm (BA) and using k-nearest neighbors (KNN) for classification. The KDD99 datasets were then applied in the experiments with only the significant features. The results give higher detection and accuracy rates as well as a reduced false positive rate. Keywords: Big Data; Intru
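The classification half of the pipeline above can be sketched without the feature-reduction machinery; the records, retained features, and labels below are synthetic stand-ins for reduced KDD99 connections, and the Bees Algorithm step is omitted:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Classify x by majority vote among its k nearest training records."""
    dists = sorted(
        (math.dist(x, row), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Synthetic stand-in for KDD99 records after feature reduction:
# 4 retained features instead of the original 41.
train_X = [[0.1, 0.2, 0.0, 0.3], [0.0, 0.1, 0.2, 0.1],   # normal traffic
           [0.2, 0.0, 0.1, 0.2], [0.1, 0.3, 0.1, 0.0],
           [4.0, 3.9, 4.1, 4.2], [4.2, 4.0, 3.8, 4.1],   # intrusions
           [3.9, 4.1, 4.0, 3.9], [4.1, 3.8, 4.2, 4.0]]
train_y = ["normal"] * 4 + ["intrusion"] * 4

# Classify a new connection record.
pred = knn_predict(train_X, train_y, [4.0, 4.0, 4.0, 4.0], k=3)
```

In a full system, the retained feature indices would come from the Bees Algorithm search rather than being fixed by hand.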
Towards Automatic Composition of Multicomponent Predictive Systems (Manuel Martín)
Automatic composition and parametrisation of multicomponent predictive systems (MCPSs) consisting of chains of data transformation steps is a challenging task. In this paper we propose and describe an extension to the Auto-WEKA software which now allows composing and optimising such flexible MCPSs as a sequence of WEKA methods. In the experimental analysis we focus on examining the impact on solution quality of significantly extending the search space by incorporating additional hyperparameters of the models. In a range of extensive experiments, three different optimisation strategies are used to automatically compose MCPSs on 21 publicly available datasets. A comparison with previous work indicates that extending the search space improves classification accuracy in the majority of cases. The diversity of the found MCPSs is also an indication that fully and automatically exploiting different combinations of data cleaning and preprocessing techniques is possible and highly beneficial for different predictive models. This can have a big impact on the development, maintenance, and scalability of high-quality predictive models in modern application and deployment scenarios.
Initial Optimal Parameters of Artificial Neural Network and Support Vector Re... (IJECEIAES)
This paper presents the architecture of backpropagation Artificial Neural Network (ANN) and Support Vector Regression (SVR) models in a supervised learning process for a cement demand dataset. The study aims to identify the effectiveness of each parameter using mean square error (MSE) indicators for the time series dataset. The study varies random samples in each demand parameter in the ANN network and the support vector function as well. For ANN, variations of the activation function, learning rates of sigmoid and purelin, hidden layers, neurons, and training function are applied. Furthermore, SVR is varied in kernel function, loss function, and insensitivity to obtain the best result from its simulation. The best activation function for ANN in this study is sigmoid, with 100% of the data (96 records) as input, a learning rate of 150, one hidden layer, the trainlm training function, 15 neurons, and 3 total layers. The best results for SVR run six variables in optimal condition: the kernel function is linear, the loss function is ε-insensitive, and the insensitivity was 1. The better results for both methods use six variables. The contribution of this study is to obtain the optimal parameters for specific variables of ANN and SVR.
Column store decision tree classification of unseen attribute set (ijma)
A decision tree can be used for clustering of frequently used attributes to improve tuple reconstruction time
in column-store databases. Due to the ad-hoc nature of queries, strongly correlated attributes are grouped
together using a decision tree to share a common minimum-support probability distribution. At the same
time, in order to predict the cluster for an unseen attribute set, the decision tree can work as a classifier. In
this paper we propose classification and clustering of unseen attribute sets using a decision tree to improve
tuple reconstruction time.
Histogram-based multilayer reversible data hiding method for securing secret ... (journalBEEI)
In this modern age, data can be easily transferred within networks. This makes the data vulnerable, so it needs protection at all times. To minimize this threat, data hiding appears as one of the potential methods to secure data. This protection is done by embedding the secret into various types of data, such as an image. For this purpose, histogram shifting has been proposed; however, the capacity for secret data and the quality of the resulting stego image remain challenging. In this research, we offer a method to improve its performance through several steps, such as removing the shifting process and employing multilayer embedding. Here, the embedding is done directly at the peak of the histogram generated from the cover. The experimental results show that the proposed method produces better stego image quality than existing ones, so it can be one of the possible solutions to protect sensitive data.
Improved target recognition response using collaborative brain-computer inter... (Kyongsik Yun)
One can achieve higher levels of perceptual and cognitive performance by leveraging the power of multiple brains through collaborative brain-computer interfaces.
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ... (idescitation)
Most energy management strategies in WSNs assume that data acquisition takes
more time than data transmission. But testing of several applications has shown
that the sensing activity of some sensors consumes significantly more energy than the radio.
Therefore it is very important for a sensor to decide whether an object is its desired target
before it actually starts sensing data. The idea here is to avoid altogether sensing
data from an object that is not the desired target. For this it is necessary to
isolate the parameters that uniquely identify a particular class of objects. At
present there are no lightweight object classification algorithms suitable for sensor networks
that perform such filtering at a preliminary stage. Furthermore, the main existing approaches for
efficient energy management in power-hungry sensors fall under two major
categories: duty cycling and adaptive sensing. In this paper, an energy-saving mobile target
tracking scheme is presented, based on the ID3 algorithm [6][12], which performs such
classification and elimination of undesired objects to reduce the volume of data acquisitions
and transmissions by a sensor.
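The core of ID3, choosing the split attribute with the highest information gain, can be sketched directly; the object signatures and target labels here are hypothetical, not taken from the paper's WSN setup:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list (the impurity measure ID3 uses)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting the rows on attribute index attr."""
    total = entropy(labels)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(label)
    remainder = sum(
        len(subset) / len(labels) * entropy(subset)
        for subset in by_value.values())
    return total - remainder

# Hypothetical object signatures [size_class, speed_class] with labels
# marking whether the object is a desired target.
rows = [["big", "fast"], ["big", "slow"], ["small", "fast"], ["small", "slow"]]
labels = ["target", "target", "other", "other"]

gain_size = information_gain(rows, labels, 0)    # size separates perfectly
gain_speed = information_gain(rows, labels, 1)   # speed carries no signal
```

ID3 would split on the attribute with the larger gain (here, size), recursing on each subset until the labels are pure.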
Survey on evolutionary computation techniques and its application in dif... (ijitjournal)
In computer science, 'evolutionary computation' is an algorithmic tool based on evolution. It implements
random variation, reproduction, and selection by altering and moving data within a computer. It helps in
building, applying, and studying algorithms based on the Darwinian principles of natural selection. This
paper briefly studies different evolutionary computation techniques used in several applications, specifically
image processing, cloud computing, and grid computing. This work is an effort to help
researchers from different fields gain knowledge of the evolutionary computation techniques
applicable in the above-mentioned areas.
From sensor readings to prediction: on the process of developing practical so... (Manuel Martín)
Automatic data acquisition systems provide large amounts of streaming data generated by physical sensors. This data forms an input to computational models (soft sensors) routinely used for monitoring and control of industrial processes, traffic patterns, environment and natural hazards, and many more. The majority of these models assume that the data comes in a cleaned and pre-processed form, ready to be fed directly into a predictive model. In practice, to ensure appropriate data quality, most of the modelling efforts concentrate on preparing data from raw sensor readings to be used as model inputs. This study analyzes the process of data preparation for predictive models with streaming sensor data. We present the challenges of data preparation as a four-step process, identify the key challenges in each step, and provide recommendations for handling these issues. The discussion is focused on the approaches that are less commonly used, while, based on our experience, may contribute particularly well to solving practical soft sensor tasks. Our arguments are illustrated with a case study in the chemical production industry.
On comprehensive analysis of learning algorithms on pedestrian detection usin... (UniversitasGadjahMada)
Despite the surge of deep learning, deploying deep learning-based pedestrian detection in real systems faces hurdles, mainly due to huge resource usage. The classical feature-based detection system remains a feasible option. There have been many efforts to improve the performance of pedestrian detection systems. Among many feature sets, the Histogram of Oriented Gradients seems very effective for person detection. In this research, various machine learning algorithms are investigated for person detection and evaluated to obtain the optimal accuracy and speed of the system.
At present, a huge amount of data is generated every minute and transferred frequently. Although
the data is sometimes static, most commonly it is dynamic and transactional, with new data
constantly being added to the existing data. To discover knowledge from this
incremental data, one approach is to run the algorithm repeatedly on the modified data sets, which is time-consuming.
Moreover, to analyze the datasets properly, construction of an efficient classifier model is necessary.
The objective of developing such a classifier is to classify unlabeled data into appropriate classes. The
paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to
generate a reduced attribute set as a dynamic reduct, and an optimization algorithm that uses the
reduct to build the corresponding classification system. The method analyzes new data as it
becomes available and modifies the reduct accordingly to fit the entire dataset, from which
interesting optimal classification rule sets are generated. The concepts of discernibility relation,
attribute dependency, and attribute significance from Rough Set Theory are integrated for the generation of
the dynamic reduct set, and optimal classification rules are selected using the PSO method, which not only
reduces complexity but also helps achieve higher accuracy of the decision system. The proposed
method has been applied to benchmark datasets collected from the UCI repository; the dynamic
reduct is computed, and optimal classification rules are generated from it. Experimental
results show the efficiency of the proposed method.
HII: Histogram Inverted Index for Fast Images Retrieval (IJECEIAES)
This work aims to improve search speed by creating an indexing structure in a Content Based Image Retrieval (CBIR) system. We utilised an inverted index structure of the kind usually used in text retrieval, with a modification. The modified inverted index is built from histogram data generated using Multi Texton Histogram (MTH) and Multi Texton Co-Occurrence Descriptor (MTCD) on 10,000 images of the Corel dataset. When building the inverted index, we normalised the value of each feature into a real number and considered pairs of feature and value owned by a particular number of images. Based on our investigation of the MTCD histogram on 5,000 test data, we found that by considering histogram variable values owned by at most 12% of the images, the number of comparisons for each query can be reduced by 67.47%, the precision is 82.2%, and the rate of access to disk is 32.83%. We name our approach the Histogram Inverted Index (HII).
Fuzzy Type Image Fusion Using SPIHT Image Compression Technique (IJERA Editor)
This paper presents a fuzzy type image fusion technique using Set Partitioning in Hierarchical Trees (SPIHT).
It is concluded that fusion at higher single levels provides better fusion quality. The technique can be used
for fusion of fuzzy images as well as multi-modal image fusion. The proposed algorithm is very simple, easy to
implement, and could be used for real-time applications. The paper also provides a comparative study
between the proposed and previously existing techniques and validates the proposed algorithm using Peak Signal to
Noise Ratio (PSNR) and Root Mean Square Error (RMSE).
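The two validation metrics named above have direct definitions; here is a minimal NumPy sketch for 8-bit images, with random arrays standing in for the reference and fused images:

```python
import numpy as np

def rmse(a, b):
    """Root mean square error between two equal-shaped images."""
    return float(np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2)))

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB, relative to the 8-bit peak value."""
    e = rmse(a, b)
    return float("inf") if e == 0 else 20.0 * np.log10(max_val / e)

rng = np.random.default_rng(2)
reference = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
noise = rng.integers(-5, 6, size=(64, 64))            # small perturbation
noisy = np.clip(reference.astype(int) + noise, 0, 255).astype(np.uint8)

error = rmse(reference, noisy)
quality = psnr(reference, noisy)
```

Higher PSNR and lower RMSE indicate that the fused (or stego) image is closer to the reference.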
Human activity recognition with self-attention (IJECEIAES)
In this paper, a self-attention-based neural network architecture for human activity recognition is proposed. The dataset used was collected using smartphones. The contribution of this paper is a multi-layer multi-head self-attention neural network architecture for human activity recognition, compared against two strong baseline architectures: a convolutional neural network (CNN) and a long short-term memory network (LSTM). The dropout rate, positional encoding, and scaling factor are also investigated to find the best model. The results show that the proposed model achieves a test accuracy of 91.75%, which is comparable to both baseline models.
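For readers unfamiliar with the mechanism, single-head scaled dot-product self-attention (the building block of the multi-head architecture above) can be sketched in NumPy; the sequence length, feature width, and weights below are illustrative, not the paper's configuration:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(3)
seq_len, d_model = 5, 8        # e.g. 5 sensor time steps, 8 features each
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(X, W_q, W_k, W_v)
```

Each output row is a weighted mix of all time steps, which is what lets the model relate sensor readings across the whole window; a multi-head layer runs several such heads in parallel and concatenates their outputs.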
Influence of time and length size feature selections for human activity seque...ISA Interchange
In this paper, Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensors events. Alternative features selections of time feature values of sensors events and activity length size feature values are tested, respectively, and then the results of activity sequences recognition performances of Viterbi algorithm are evaluated. The results show that the selection of larger time feature values of sensor events and/or smaller activity length size feature values will generate relatively better results on the activity sequences recognition performances.
For the agriculture sector, detecting and identifying plant diseases at an early stage is extremely important and
still very challenging. Machine learning is an application of AI that helps us achieve this purpose effectively. It
uses a group of algorithms to analyze and interpret data, learn from it, and using it, smart decisions can be
made. For accomplishing this project, a dataset that contains a set of healthy & diseased plant leaf images are
used then using image processing we extract the features of the image. Then we model this dataset with
different machine learning algorithms like Random Forest, Support Vector Machine, Naïve Bayes etc. The aim is
to hold out a comparative study to spot which of those algorithm can predict diseases with the at most
accuracy. We compare factors like precision, accuracy, error rates as well as prediction time of different
machine learning algorithms. After all these comparison, valuable conclusions can be made for this project.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
Data mining environment produces a large amount of data that need to be analyzed.
Using traditional databases and architectures, it has become difficult to process, manage and analyze
patterns. To gain knowledge about the Big Data a proper architecture should be understood.
Classification is an important data mining technique with broad applications to classify the various
kinds of data used in nearly every field of our life. Classification is used to classify the item
according to the features of the item with respect to the predefined set of classes. This paper put a
light on various classification algorithms including j48, C4.5, Naive Bayes using large dataset.
Improving the accuracy of fingerprinting system using multibiometric approachIJERA Editor
Biometric technology is a science that used to verify or identify the individual based on physical and/or
behavioral traits. Although biometric systems are considered more secure than other traditional methods such as
password, or key, they also have many limitations such as noisy image, or spoof attack. One of the solutions to
overcome these limitations, is by applying a multibiometric system. Multibiometric system has a significant
effect in improving the performance of both security and accuracy of the system. It also can alleviate the spoof
attacks and reduce the fail to enroll error. A multi-sample is one implementations of the multibiometric systems.
In this study, a new algorithm is suggested to provide a second chance for the genuine user who is rejected, to
compare his/her provided finger with the other samples of the same finger. Multisampling fingerprint is used to
implement this new algorithm. The algorithm is activated when the match score of the user is not equal to a
threshold but close to it, then the system provides another chance to compare the finger with another sample of
the same trait. Using multi-sample biometric system improved the performance of the system by reducing the
False Reject Rate (FRR). Applying the original matching algorithm on the presented database produced 3
genuine users, and 5 imposters for the same fingerprint. While after implementing the suggested condition, the
system performance is enhanced by producing 6 genuine users, and 2 imposters for the same fingerprint. This
work was built and executed depending on a previous Matlab code presented by Zhi Li Wu. Thresholds and
Receiver Operating Characteristic (ROC) curves computed before and after implementing the suggested
multibiometric algorithm. Both ROC curves compared. A final decision and recommendations are provided
depending on the results obtained from this project
A large number of techniques has been developed so far to tell the diversity of machine learning. Machine learning is categorized into supervised, unsupervised and reinforcement learning .Every instance in given data-set used by Machine learning algorithms is represented same set of features .On basis of label of instances it is divided into category. In this review paper our main focus is on Supervised, unsupervised learning techniques and its performance parameters.
GI-ANFIS APPROACH FOR ENVISAGE HEART ATTACK DISEASE USING DATA MINING TECHNIQUESAM Publications
The process of selecting a subset of relevant features from the feature space for use in model construction and used to carry out the feature selection process is called as pre processing step. The filter approach computationally fast and given accuracy results. The Professional Medical Conduct Board Actions data consist of all public actions taken against physicians, physician assistants, specialist assistants, and medical professional. The Classification and Regression Trees (CART), which described the generation of binary decision trees CART were invented independently of one another at around the same time, yet follow a similar approach for learning decision trees from training tuples. The research used GI-ANFIS is used to data mining technique on heart data sets to provide the diagnosis results.
Similar to Data Analysis. Predictive Analysis. Activity Prediction that a subject performs based in measurements obtained from the accelerometer and gyroscope of the Smartphones (20)
Conocer las diferencias entre los distintos algoritmos de aprendizaje automático.Utilizar una herramienta para minería de datos y comparar varios algoritmos de aprendizaje automático. Para ello vamos a trabajar con la herramienta RapidMiner.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
The Art of the Pitch: WordPress Relationships and Sales
Data Analysis. Predictive Analysis. Activity Prediction that a subject performs based in measurements obtained from the accelerometer and gyroscope of the Smartphones
1. IT
[1] @gsantosgo
Information Technology
Data Analysis
Title: Predicting the activity that a subject performs based on measurements
obtained from the accelerometer and gyroscope of smartphones
Introduction:
Recently, our lives have been invaded by small mobile devices known as smartphones. These devices are mobile
mini-computers: they have an operating system that allows them to run applications, and they include a set of
applications to manage contacts and an address book, to create, edit or view different types of documents, to
access or browse the Web, and to provide telephony or messaging services. Beyond these features, most current
smartphones also incorporate cameras, GPS and various types of sensors.
In this analysis, we used data obtained from the accelerometer [1] and gyroscope [2] sensor signals of
smartphones. The accelerometer and gyroscope measure 3-axial linear acceleration and 3-axial angular
velocity; with these two sensors we can monitor device acceleration, position, orientation, rotation and angular
motion. All these data can be stored and used to recognize a user's activity. Here we refer to physical activities
that a person can perform daily, such as walking, walking upstairs, jogging, sitting, laying, etc.
The aim of this analysis was to perform a classification task. We took a dataset with its attributes
(acceleration, orientation, ...) and its labeled variable (in this case, activity), and then we created various
classification models, also known as classifiers. To create these classification models we can use various
classification algorithms, which use all the available information in a dataset to help us classify or
predict the activity performed by a person.
To create the classification models, we first chose different classification algorithms or techniques. For each
algorithm we applied what is called cross-validation [3]: we trained the algorithm on a training set consisting of
a subset of the observations in our dataset. The following task was to test each classification algorithm and
measure its accuracy, that is, whether the predictive model can correctly classify a human activity according to
the knowledge acquired during the training stage. This whole process is known as supervised learning [4].
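As a minimal sketch of this train-then-test process (toy data and a simple threshold rule as the "classifier", not the sensor dataset or the algorithms used later in this report):

```r
# Toy supervised-learning sketch in base R: split observations into a
# training set and a held-out test set, "learn" a rule on the training
# data, then measure accuracy on the test data.
set.seed(42)
n <- 150
toy <- data.frame(
  x1 = c(rnorm(75, mean = 0), rnorm(75, mean = 3)),   # one separating feature
  activity = factor(rep(c("sitting", "walk"), each = 75))
)
idx   <- sample(n, size = 100)      # 100 observations for training
train <- toy[idx, ]
test  <- toy[-idx, ]

# A trivial threshold rule learned from the training data only
thr  <- mean(tapply(train$x1, train$activity, mean))
pred <- factor(ifelse(test$x1 > thr, "walk", "sitting"),
               levels = levels(test$activity))
accuracy <- mean(pred == test$activity)   # fraction correctly classified
```

In the real analysis the threshold rule is replaced by the tree, bagging, random forest and SVM classifiers, but the split/train/evaluate structure is the same.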
Methods:
Data Collection
For this analysis we used a dataset on Human Activity Recognition. This dataset was downloaded from
coursera.org [5] for the Data Analysis course on March 03, 2013 using the R programming language. The data
were pre-processed to make them easier to load into R; they were derived from raw data in the UC Irvine
Machine Learning Repository [6], which hosts a Human Activity Recognition dataset [7] built from recordings of
30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with
embedded inertial sensors.
The dataset for this analysis contains 7352 observations and 563 variables. For each observation there is a
categorical (factor) variable called "activity" (our labeled variable or class) that indicates the activity carried
out by the subject; it has only six possible values: laying, sitting, standing, walk, walkdown and walkup.
There is also an integer variable called "subject" that identifies the person who performed the activity. Finally,
the remaining 561 variables are numeric (quantitative) variables containing time- and frequency-domain
features of the triaxial acceleration from the accelerometer (mean, standard deviation, energy, correlation,
etc.) and the triaxial angular velocity from the gyroscope.
For more information about these variables, see the compressed file [8], which contains descriptive files
documenting all the features and the labeled variable (class).
Exploratory Analysis
Exploratory analysis was performed by examining the data and plots of the observed data. It was used to
(1) identify missing values, (2) verify the quality of the data, (3) check that variable names are syntactically
correct, and (4) identify patterns that differ between activities, so as to be able to distinguish when a user
performs one activity or another.
Our predictive model [9] should be able to recognize the pattern corresponding to each activity. Figure 1 shows
the different patterns for the different activities according to the X-axis acceleration. We can observe that the
patterns differ according to the activity carried out by the user.
It is important to keep in mind that if some activities share common patterns, our predictive model will have
more difficulty classifying those activities correctly, and will therefore have lower accuracy, that is, more
difficulty distinguishing among activities.
Statistical Modeling
To classify the activity performed by a subject, we used various classification techniques or algorithms to
recognize and predict our labeled variable (activity). The techniques (classifiers) employed for this data
analysis are the following:
Decision Trees [10]
CART [11]
Bagging [12]
Random Forest [13]
SVM [14]
We performed cross-validation for each of these techniques (classifiers), and we evaluated the performance,
the accuracy and the error rate of each classifier.
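As a hedged sketch of how one of these classifiers is fitted and evaluated in R: CART via library(rpart), which ships with R, is shown below on the built-in iris data as a stand-in for the activity dataset (the activity data frame would be used the same way, with activity as the response).

```r
# Fit a CART classifier (rpart) on a training split and estimate its
# accuracy on a held-out test split. iris is only a stand-in dataset.
library(rpart)
set.seed(1)
idx   <- sample(nrow(iris), 100)   # 100 rows for training
train <- iris[idx, ]
test  <- iris[-idx, ]

fit  <- rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")
acc  <- mean(pred == test$Species)   # overall accuracy on the test set
```

The other classifiers in the list follow the same fit/predict pattern with their respective packages (tree, ipred, randomForest, e1071).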
Reproducibility
All analyses performed in this manuscript are reproduced in the R markdown file samsungPredictive.Rmd [15].
Note: due to security concerns with the exchange of R code, we do not submit code to reproduce the analysis
in this document.
Results:
As noted above, the dataset for this analysis contains 7352 observations of 563 variables; these observations
correspond to a total of 21 subjects. Table 1 shows the number of samples per subject and type of activity, as
well as the percentage of the total per activity in our dataset.
We found variables with syntactically incorrect names, that is, names containing characters such as commas
(",") or brackets ("("); it was therefore necessary to produce valid, non-duplicated variable names in our
dataset (data frame). We checked for missing values in the dataset and found none.
Our class (labeled variable) was transformed from a character variable to a factor variable with 6 levels:
"laying", "sitting", "standing", "walk", "walkdown" and "walkup".
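The cleaning steps just described can be sketched in base R. The data frame below is a tiny hypothetical stand-in for the loaded dataset; make.names and make.unique are the standard tools for repairing invalid or duplicated column names.

```r
# Repair invalid column names, check for missing values, and convert the
# class label to a factor, on a small stand-in data frame.
df <- data.frame(check.names = FALSE,
                 "tBodyAcc-mean()-X" = rnorm(3),   # name with "-" and "()"
                 subject  = c(1, 1, 3),
                 activity = c("laying", "walk", "sitting"))

names(df) <- make.unique(make.names(names(df)))  # valid, non-duplicated names
n_missing <- sum(is.na(df))                      # count of missing values
df$activity <- factor(df$activity)               # character -> factor label
```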
According to the assignment, for this data analysis we used a training set that includes the data from subjects
1, 3, 5 and 6, and a test set that includes the data from subjects 27, 28, 29 and 30. Table 2 shows the number
of samples per activity used in the training stage, and Table 3 the number of samples per activity used in the
testing stage.
id laying sitting standing walk walkdown walkup Total
1 50 47 53 95 49 53 347
3 62 52 61 58 49 59 341
5 52 44 56 56 47 47 302
6 57 55 57 57 48 51 325
7 52 48 53 57 47 51 308
8 54 46 54 48 38 41 281
11 57 53 47 59 46 54 316
14 51 54 60 59 45 54 323
15 72 59 53 54 42 48 328
16 70 69 78 51 47 51 366
17 71 64 78 61 46 48 368
19 83 73 73 52 39 40 360
21 90 85 89 52 45 47 408
22 72 62 63 46 36 42 321
23 72 68 68 59 54 51 372
25 73 65 74 74 58 65 409
26 76 78 74 59 50 55 392
27 74 70 80 57 44 51 376
28 80 72 79 54 46 51 382
29 69 60 65 53 48 49 344
30 70 62 59 65 62 65 383
Sum 1407 1286 1374 1226 986 1073 7352
% 19.14 17.49 18.69 16.68 13.41 14.59 100
Table 1. Number of samples per subject and type of activity
laying sitting standing walk walkdown walkup
55 50 57 64 49 53
Table 2. Number of samples per activity for Training
laying sitting standing walk walkdown walkup
74 64 71 56 52 54
Table 3. Number of samples per activity for Testing
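The subject-based split described above can be sketched in base R; df is a small hypothetical stand-in for the full data frame. Keeping the training and test subjects disjoint ensures the model is evaluated on people it has never seen.

```r
# Split by subject id: subjects 1, 3, 5, 6 for training and
# 27, 28, 29, 30 for testing, with no subject in both sets.
df <- data.frame(subject  = c(1, 3, 5, 6, 27, 28, 29, 30),
                 activity = rep(c("walk", "sitting"), 4))

train <- df[df$subject %in% c(1, 3, 5, 6), ]
test  <- df[df$subject %in% c(27, 28, 29, 30), ]
overlap <- intersect(train$subject, test$subject)   # should be empty
```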
We performed the process of cross-validation for each of the previous classifiers using the training set and
test set specified earlier.
The results obtained for the different classification techniques (predictive models) using the R programming
language are presented in Table 4. This table shows the accuracy of each classification technique per activity;
the best accuracy for each activity is the highest value in its row.
It is important to take into account that we used all 561 quantitative variables to predict the activity carried
out by a subject in these 5 classification techniques. Recall that with this many variables, the performance of
a classification algorithm may be severely affected: many of these quantitative variables could add noise to
the classification, and others may not provide useful information to distinguish among activities. On the other
hand, it would be very interesting to measure how much the classifiers are overfitting [16].
In general, most of the classification techniques used in this analysis achieve high levels of accuracy, but we
can observe lower accuracy for some activities and some classification techniques.
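The overfitting check suggested above can be sketched by comparing training accuracy against held-out accuracy: a large positive gap signals that the model memorizes the training data. Here rpart on the built-in iris data stands in for the activity dataset.

```r
# Gauge overfitting as the gap between training and test accuracy.
library(rpart)
set.seed(7)
idx <- sample(nrow(iris), 100)
fit <- rpart(Species ~ ., data = iris[idx, ], method = "class")

acc_train <- mean(predict(fit, iris[idx, ],  type = "class") == iris$Species[idx])
acc_test  <- mean(predict(fit, iris[-idx, ], type = "class") == iris$Species[-idx])
gap <- acc_train - acc_test   # the optimism of the training accuracy
```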
% Correctly Predicted
Activity    Tree            CART            Bagging         Random Forest           SVM
            library(tree)   library(rpart)  library(ipred)  library(randomForest)   library(e1071)
laying      100.00          100.00          100.00          100.00                  100.00
sitting      70.31           67.19           67.19           82.81                   82.81
standing     85.92           88.73           88.73           88.73                   88.73
walk         50.00           57.14           80.30           92.86                   92.86
walkdown     84.61           86.54           94.23           86.54                   86.54
walkup       85.19           85.19           87.03           96.30                   98.15
All          79.34           80.80           86.25           91.21                   91.52
Table 4. Accuracies of the classification techniques
Tables 5-9 show the confusion matrices for each of the classification techniques.
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 45 19 0 0 0
standing 0 10 61 0 0 0
walk 0 0 0 28 6 22
walkdown 0 0 0 0 44 8
walkup 0 0 0 1 7 46
Table 5. Confusion matrix for the Decision Tree
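Per-class accuracies in Table 4 can be recovered from these confusion matrices: each diagonal entry divided by its row sum. Reading Table 5 back into R reproduces the Tree column up to rounding (e.g. sitting: 45/64 ≈ 70.31%), and also suggests that the "All" row is the mean of the per-class accuracies (a macro average) rather than the overall fraction correct, which would be 80.32% here; this reading is an inference from the numbers, not stated in the text.

```r
# Table 5 (Decision Tree) as a matrix: rows = actual, columns = predicted.
acts <- c("laying", "sitting", "standing", "walk", "walkdown", "walkup")
cm <- matrix(c(74,  0,  0,  0,  0,  0,
                0, 45, 19,  0,  0,  0,
                0, 10, 61,  0,  0,  0,
                0,  0,  0, 28,  6, 22,
                0,  0,  0,  0, 44,  8,
                0,  0,  0,  1,  7, 46),
             nrow = 6, byrow = TRUE,
             dimnames = list(actual = acts, predicted = acts))

per_class <- 100 * diag(cm) / rowSums(cm)   # diagonal over row totals
macro     <- mean(per_class)                # matches the "All" row (79.34)
micro     <- 100 * sum(diag(cm)) / sum(cm)  # overall fraction correct (80.32)
```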
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 43 21 0 0 0
standing 0 8 63 0 0 0
walk 0 0 0 32 4 20
walkdown 0 0 0 0 45 7
walkup 0 0 0 1 7 46
Table 6. Confusion matrix for CART
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 43 21 0 0 0
standing 0 8 63 0 0 0
walk 0 0 0 53 0 3
walkdown 0 0 0 0 49 3
walkup 0 0 0 1 6 47
Table 7. Confusion matrix for Bagging
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 53 11 0 0 0
standing 0 8 63 0 0 0
walk 0 0 0 52 0 4
walkdown 0 0 0 0 47 5
walkup 0 0 0 0 2 52
Table 8. Confusion matrix for Random Forest
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 53 11 0 0 0
standing 0 8 63 0 0 0
walk 0 0 0 52 0 4
walkdown 0 0 0 0 47 5
walkup 0 0 0 0 1 53
Table 9. Confusion matrix for SVM
In general, we observed that the classification techniques correctly identify laying (100%). It appears much
more difficult to distinguish between sitting and standing, and also to distinguish among walk, walkdown and
walkup.
Bagging, Random Forest and SVM are classifiers that require more computing and memory resources, and
therefore more classification time, than Tree and CART.
Conclusions:
In this analysis, we employed various classification techniques to obtain different predictive models. The SVM
classifier achieved the highest accuracy in this analysis (91.52%). It would be advisable to increase the number
of observations, as well as the number of samples in both the training and test sets, and observe whether the
accuracy increases or decreases. On the other hand, there are problems detecting the patterns of some
activities, because different activities share many similar patterns and the classifier then fails to classify them
correctly.
References
[1] Accelerometer. http://en.wikipedia.org/wiki/Accelerometer. Accessed 03/04/2013
[2] Gyroscope. http://en.wikipedia.org/wiki/Gyroscope. Accessed 03/04/2013
[3] Cross-Validation. http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29. Accessed 03/10/2013
[4] Supervised Learning. http://en.wikipedia.org/wiki/Supervised_learning. Accessed 03/05/2013
[5] Dataset of Human Activity Recognition, Coursera. https://spark-public.s3.amazonaws.com/dataanalysis/samsungData.rda. Accessed 03/03/2013
[6] UC Irvine Machine Learning Repository. http://archive.ics.uci.edu/ml/. Accessed 03/06/2013
[7] Human Activity Recognition Using Smartphones Data Set. http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones. Accessed 03/06/2013
[8] File of Human Activity Recognition, UCI. http://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip. Accessed 03/06/2013
[9] Predictive Modelling. http://en.wikipedia.org/wiki/Predictive_modelling. Accessed 03/10/2013
[10] Decision Tree Learning. http://en.wikipedia.org/wiki/Decision_tree_learning. Accessed 03/10/2013
[11] CART. http://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees. Accessed 03/10/2013
[12] Bagging. http://en.wikipedia.org/wiki/Bootstrap_aggregating. Accessed 03/10/2013
[13] Random Forest (RF). http://en.wikipedia.org/wiki/Random_forest. Accessed 03/10/2013
[14] Support Vector Machine (SVM). http://en.wikipedia.org/wiki/Support_vector_machine. Accessed 03/10/2013
[15] R Markdown Page. http://www.rstudio.com/ide/docs/authoring/using_markdown. Accessed 03/06/2013
[16] Overfitting. http://en.wikipedia.org/wiki/Overfitting. Accessed 03/10/2013