Flight delay has been the fiendish problem to the world's aviation industry, so there is very important
significance to research for computer system predicting flight delay propagation. Extraction of hidden
information from large datasets of raw data could be one of the ways for building predictive model. This
paper describes the application of classification techniques for analysing the Flight delay pattern in Egypt
Airline’s Flight dataset. In this work, four decision tree classifiers were evaluated and results show that the
REPTree have the best accuracy 80.3% with respect to Forest, Stump and J48. However, four rules based
classifiers were compared and results show that PART provides best accuracy among studied rule-based
classifiers with accuracy of 83.1%. By analysing running time for all classifiers, the current work
concluded that REPtree is the most efficient classifier with respect to accuracy and running time. Also, the
current work is extended to apply of Apriori association technique to extract some important information
about flight delay. Association rules are presented and association technique is evaluated.
Rule-based Information Extraction for Airplane Crashes ReportsCSCJournals
Over the last two decades, the internet has gained a widespread use in various aspects of everyday living. The amount of generated data in both structured and unstructured forms has increased rapidly, posing a number of challenges. Unstructured data are hard to manage, assess, and analyse in view of decision making. Extracting information from these large volumes of data is time-consuming and requires complex analysis. Information extraction (IE) technology is part of a text-mining framework for extracting useful knowledge for further analysis.
Various competitions, conferences and research projects have accelerated the development phases of IE. This project presents in detail the main aspects of the information extraction field. It focused on specific domain: airplane crash reports. Set of reports were used from 1001 Crash website to perform the extraction tasks such as: crash site, crash date and time, departure, destination, etc. As such, the common structures and textual expressions are considered in designing the extraction rules.
The evaluation framework used to examine the system's performance is executed for both working and test texts. It shows that the system's performance in extracting entities and relations is more accurate than for events. Generally, the good results reflect the high quality and good design of the extraction rules. It can be concluded that the rule-based approach has proved its efficiency of delivering reliable results. However, this approach does require an intensive work and a cycle process of rules testing and modification.
SCCAI- A Student Career Counselling Artificial Intelligencevivatechijri
As education is growing day by day, the competition has prompted a need for the student to
understand more about the educational field. Many times the counselor isn’t available all the time and
sometimes due to the lack of proper knowledge about some educational field. Due to this, it creates an issue of
misconception of that field. This creates a problem for the student to decide a proper educational trajectory and
guidance is not always useful. The proposed paper will overcome all these problem using machine learning
algorithm. Various algorithms are being considered and amongst them the best suitable for our project are used
here. There are 3 major problems that come across our path and they are solved using Random forest, Linear
regression and Searching algorithm using Google API. At first Searching algorithm solves the problem of
location by segregating the college’s location vice, then Random Forest provides the list of colleges by using
stream and range of percentage and finally Linear Regression predicts the current cutoff using previous years’
data. Rather than this, the proposed system also provides information regarding all fields of education helping
students to understand and know about their field of interest better. The following idea is a total fresh idea with
no existing projects of similar kind. This project will help students guide them throughout.
RAINFALL PREDICTION USING DATA MINING TECHNIQUES - A SURVEYcsandit
Rainfall is considered as one of the major components of the hydrological process; it takes
significant part in evaluating drought and flooding events. Therefore, it is important to have an
accurate model for rainfall prediction. Recently, several data-driven modeling approaches have
been investigated to perform such forecasting tasks as multilayer perceptron neural networks
(MLP-NN). In fact, the rainfall time series modeling (SARIMA) involvesimportant temporal
dimensions. In order to evaluate the incomes of both models, statistical parameters were used to
make the comparison between the two models. These parameters include the Root Mean Square
Error RMSE, Mean Absolute Error MAE, Coefficient Of Correlation CC and BIAS. Two-Third
of the data was used for training the model and One-third for testing.
A Comprehensive review of Conversational Agent and its prediction algorithmvivatechijri
There is an exponential increase in the use of conversational bots. Conversational bots can be
described as a platform that can chat with people using artificial intelligence. The recent advancement has
made A.I capable of learning from data and produce an output. This learning of data can be performed by using
various machine learning algorithm. Machine learning techniques involves construction of algorithms that can
learn for data and can predict the outcome. This paper reviews the efficiency of different machine learning
algorithm that are used in conversational bot.
Rule-based Information Extraction for Airplane Crashes ReportsCSCJournals
Over the last two decades, the internet has gained a widespread use in various aspects of everyday living. The amount of generated data in both structured and unstructured forms has increased rapidly, posing a number of challenges. Unstructured data are hard to manage, assess, and analyse in view of decision making. Extracting information from these large volumes of data is time-consuming and requires complex analysis. Information extraction (IE) technology is part of a text-mining framework for extracting useful knowledge for further analysis.
Various competitions, conferences and research projects have accelerated the development phases of IE. This project presents in detail the main aspects of the information extraction field. It focused on specific domain: airplane crash reports. Set of reports were used from 1001 Crash website to perform the extraction tasks such as: crash site, crash date and time, departure, destination, etc. As such, the common structures and textual expressions are considered in designing the extraction rules.
The evaluation framework used to examine the system's performance is executed for both working and test texts. It shows that the system's performance in extracting entities and relations is more accurate than for events. Generally, the good results reflect the high quality and good design of the extraction rules. It can be concluded that the rule-based approach has proved its efficiency of delivering reliable results. However, this approach does require an intensive work and a cycle process of rules testing and modification.
SCCAI- A Student Career Counselling Artificial Intelligencevivatechijri
As education is growing day by day, the competition has prompted a need for the student to
understand more about the educational field. Many times the counselor isn’t available all the time and
sometimes due to the lack of proper knowledge about some educational field. Due to this, it creates an issue of
misconception of that field. This creates a problem for the student to decide a proper educational trajectory and
guidance is not always useful. The proposed paper will overcome all these problem using machine learning
algorithm. Various algorithms are being considered and amongst them the best suitable for our project are used
here. There are 3 major problems that come across our path and they are solved using Random forest, Linear
regression and Searching algorithm using Google API. At first Searching algorithm solves the problem of
location by segregating the college’s location vice, then Random Forest provides the list of colleges by using
stream and range of percentage and finally Linear Regression predicts the current cutoff using previous years’
data. Rather than this, the proposed system also provides information regarding all fields of education helping
students to understand and know about their field of interest better. The following idea is a total fresh idea with
no existing projects of similar kind. This project will help students guide them throughout.
RAINFALL PREDICTION USING DATA MINING TECHNIQUES - A SURVEYcsandit
Rainfall is considered as one of the major components of the hydrological process; it takes
significant part in evaluating drought and flooding events. Therefore, it is important to have an
accurate model for rainfall prediction. Recently, several data-driven modeling approaches have
been investigated to perform such forecasting tasks as multilayer perceptron neural networks
(MLP-NN). In fact, the rainfall time series modeling (SARIMA) involvesimportant temporal
dimensions. In order to evaluate the incomes of both models, statistical parameters were used to
make the comparison between the two models. These parameters include the Root Mean Square
Error RMSE, Mean Absolute Error MAE, Coefficient Of Correlation CC and BIAS. Two-Third
of the data was used for training the model and One-third for testing.
A Comprehensive review of Conversational Agent and its prediction algorithmvivatechijri
There is an exponential increase in the use of conversational bots. Conversational bots can be
described as a platform that can chat with people using artificial intelligence. The recent advancement has
made A.I capable of learning from data and produce an output. This learning of data can be performed by using
various machine learning algorithm. Machine learning techniques involves construction of algorithms that can
learn for data and can predict the outcome. This paper reviews the efficiency of different machine learning
algorithm that are used in conversational bot.
Big Data to avoid weather related flight delaysAkshatGiri3
This topic generally belongs to weather forecasting, how we will implement Big Data computing for future weather prediction so that weather Related Flight Delays get minimized.
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...IJDKP
ABSTRACT
The objectives of this paper were to 1) develop an empirical method for selecting relevant attributes for modelling drought and 2) select the most relevant attribute for drought modelling and predictions in the Greater Horn of Africa (GHA). Twenty four attributes from different domain areas were used for this experimental analysis. Two attribute selection algorithms were used for the current study: Principal Component Analysis (PCA) and correlation-based attribute selection (CAS). Using the PCA and CAS algorithms, the 24 attributes were ranked by their merit value. Accordingly, 15 attributes were selected for modelling drought in GHA. The average merit values for the selected attributes ranged from 0.5 to 0.9. The methodology developed here helps to avoid the uncertainty of domain experts’ attribute selection
challenges, which are unsystematic and dominated by somewhat arbitrary trial. Future research may evaluate the developed methodology using relevant classification techniques and quantify the actual information gain from the developed approach.
SELECTION OF MATERIAL HANDLING EQUIPMENT FOR FLEXIBLE MANUFACTURING SYSTEM US...ijmech
Material handling equipment plays an important role in the design and development of advance
manufacturing systems (AMS) like flexible manufacturing systems. Material handling equipments affects
the performance and productivity of these advance manufacturing systems. So it becomes very important to
select a right kind of equipment while designing the high end manufacturing systems like FMS. In this
paper an attempt has been made to select the most appropriate material handling equipment for the design
and development of FMS. The proposed model has been build on the basis of material handling attributes
and sub attributes which are critical for material handling equipment selection.
APPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUESJournal For Research
Databases are systems, which are used for storing data. With the increase in information at rapid pace, extraction of relevant information becomes time-consuming using traditional databases. Thus, the need of intelligent or smart databases arises. Intelligent database is a system in which automation is applied on conventional database in order to enhance its functionality and efficiency. Intelligent databases help us in making decisions and can respond or act to situations by learning. It has applications in wide areas such as healthcare, automobile, education, information security, business etc. This paper represents various intelligent database techniques, which are further used for implementation of applications of this domain.
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATIONijaia
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
Identification of important features and data mining classification technique...IJECEIAES
Employees absenteeism at the work costs organizations billions a year. Prediction of employees’ absenteeism and the reasons behind their absence help organizations in reducing expenses and increasing productivity. Data mining turns the vast volume of human resources data into information that can help in decision-making and prediction. Although the selection of features is a critical step in data mining to enhance the efficiency of the final prediction, it is not yet known which method of feature selection is better. Therefore, this paper aims to compare the performance of three well-known feature selection methods in absenteeism prediction, which are relief-based feature selection, correlation-based feature selection and information-gain feature selection. In addition, this paper aims to find the best combination of feature selection method and data mining technique in enhancing the absenteeism prediction accuracy. Seven classification techniques were used as the prediction model. Additionally, cross-validation approach was utilized to assess the applied prediction models to have more realistic and reliable results. The used dataset was built at a courier company in Brazil with records of absenteeism at work. Regarding experimental results, correlationbased feature selection surpasses the other methods through the performance measurements. Furthermore, bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection with an accuracy rate of (92%).
Biometric Identification and Authentication Providence using Fingerprint for ...IJECEIAES
The raise in the recent security incidents of cloud computing and its challenges is to secure the data. To solve this problem, the integration of mobile with cloud computing, Mobile biometric authentication in cloud computing is presented in this paper. To enhance the security, the biometric authentication is being used, since the Mobile cloud computing is popular among the mobile user. This paper examines how the mobile cloud computing (MCC) is used in security issue with finger biometric authentication model. Through this fingerprint biometric, the secret code is generated by entropy value. This enables the person to request for accessing the data in the desk computer. When the person requests the access to the authorized user through Bluetooth in mobile, the Authorized user sends the permit access through fingerprint secret code. Finally this fingerprint is verified with the database in the Desk computer. If it is matched, then the computer can be accessed by the requested person.
Performance analysis of binary and multiclass models using azure machine lear...IJECEIAES
Network data is expanding and that too at an alarming rate. Besides, the sophisticated attack tools used by hackers lead to capricious cyber threat landscape. Traditional models proposed in the field of network intrusion detection using machine learning algorithms emphasize more on improving attack detection rate and reducing false alarms but time efficiency is often overlooked. Therefore, in order to address this limitation, a modern solution has been presented using Machine Learning-as-a-Service platform. The proposed work analyses the performance of eight two-class and three multiclass algorithms using UNSW NB-15, a modern intrusion detection dataset. 82,332 testing samples were considered to evaluate the performance of algorithms. The proposed two class decision forest model exhibited 99.2% accuracy and took 6 seconds to learn 1,75,341 network instances. Multiclass classification task was also undertaken wherein attack types like generic, exploits, shellcode and worms were classified with a recall percentage of 99%, 94.49%, 91.79% and 90.9% respectively by the multiclass decision forest model that also leapfrogged others in terms of training and execution time.
MITIGATION TECHNIQUES TO OVERCOME DATA HARM IN MODEL BUILDING FOR MLijaia
Given the impact of Machine Learning (ML) on individuals and the society, understanding how harm might
be occur throughout the ML life cycle becomes critical more than ever. By offering a framework to
determine distinct potential sources of downstream harm in ML pipeline, the paper demonstrates the
importance of choices throughout distinct phases of data collection, development, and deployment that
extend far beyond just model training. Relevant mitigation techniques are also suggested for being used
instead of merely relying on generic notions of what counts as fairness.
In current decades, Land Use (LU) and Land Cover (LC) classification is the most
challenging research area in the field of remote sensing. This research helps in
understanding the environmental changes for ensuring the sustainable development.
In this research, LU and LC classification assessed for Visakhapatnam city. After
collecting the satellite images, Hybrid Directional Lifting (HDL) technique was used
to remove the saturation and blooming effects in the input images. The pre-processed
satellite images were used for segmentation by applying Fuzzy C means (FCM)
clustering. Then, Local Binary Pattern (LBP) and Gray-level co-occurrence matrix
(GLCM) features were utilized to extract the features from the segmented satellite
images. After obtaining the feature information, a multi-class classifier: Adaptive
Neuro-Fuzzy Inference System (ANFIS) was used to classify the LU and LC classes;
water-body, vegetation, settlement, and barren land. The experimental outcome
showed that the proposed system effectively distinguishes the LU and LC classes by
means of sensitivity, specificity, and classification accuracy. The proposed system
enhances the classification accuracy up to 7% compared to the existing systems
Big Data to avoid weather related flight delaysAkshatGiri3
This topic generally belongs to weather forecasting, how we will implement Big Data computing for future weather prediction so that weather Related Flight Delays get minimized.
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...IJDKP
ABSTRACT
The objectives of this paper were to 1) develop an empirical method for selecting relevant attributes for modelling drought and 2) select the most relevant attribute for drought modelling and predictions in the Greater Horn of Africa (GHA). Twenty four attributes from different domain areas were used for this experimental analysis. Two attribute selection algorithms were used for the current study: Principal Component Analysis (PCA) and correlation-based attribute selection (CAS). Using the PCA and CAS algorithms, the 24 attributes were ranked by their merit value. Accordingly, 15 attributes were selected for modelling drought in GHA. The average merit values for the selected attributes ranged from 0.5 to 0.9. The methodology developed here helps to avoid the uncertainty of domain experts’ attribute selection
challenges, which are unsystematic and dominated by somewhat arbitrary trial. Future research may evaluate the developed methodology using relevant classification techniques and quantify the actual information gain from the developed approach.
SELECTION OF MATERIAL HANDLING EQUIPMENT FOR FLEXIBLE MANUFACTURING SYSTEM US...ijmech
Material handling equipment plays an important role in the design and development of advance
manufacturing systems (AMS) like flexible manufacturing systems. Material handling equipments affects
the performance and productivity of these advance manufacturing systems. So it becomes very important to
select a right kind of equipment while designing the high end manufacturing systems like FMS. In this
paper an attempt has been made to select the most appropriate material handling equipment for the design
and development of FMS. The proposed model has been build on the basis of material handling attributes
and sub attributes which are critical for material handling equipment selection.
APPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUESJournal For Research
Databases are systems, which are used for storing data. With the increase in information at rapid pace, extraction of relevant information becomes time-consuming using traditional databases. Thus, the need of intelligent or smart databases arises. Intelligent database is a system in which automation is applied on conventional database in order to enhance its functionality and efficiency. Intelligent databases help us in making decisions and can respond or act to situations by learning. It has applications in wide areas such as healthcare, automobile, education, information security, business etc. This paper represents various intelligent database techniques, which are further used for implementation of applications of this domain.
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATIONijaia
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
Identification of important features and data mining classification technique...IJECEIAES
Employees absenteeism at the work costs organizations billions a year. Prediction of employees’ absenteeism and the reasons behind their absence help organizations in reducing expenses and increasing productivity. Data mining turns the vast volume of human resources data into information that can help in decision-making and prediction. Although the selection of features is a critical step in data mining to enhance the efficiency of the final prediction, it is not yet known which method of feature selection is better. Therefore, this paper aims to compare the performance of three well-known feature selection methods in absenteeism prediction, which are relief-based feature selection, correlation-based feature selection and information-gain feature selection. In addition, this paper aims to find the best combination of feature selection method and data mining technique in enhancing the absenteeism prediction accuracy. Seven classification techniques were used as the prediction model. Additionally, cross-validation approach was utilized to assess the applied prediction models to have more realistic and reliable results. The used dataset was built at a courier company in Brazil with records of absenteeism at work. Regarding experimental results, correlationbased feature selection surpasses the other methods through the performance measurements. Furthermore, bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection with an accuracy rate of (92%).
Biometric Identification and Authentication Providence using Fingerprint for ...IJECEIAES
The raise in the recent security incidents of cloud computing and its challenges is to secure the data. To solve this problem, the integration of mobile with cloud computing, Mobile biometric authentication in cloud computing is presented in this paper. To enhance the security, the biometric authentication is being used, since the Mobile cloud computing is popular among the mobile user. This paper examines how the mobile cloud computing (MCC) is used in security issue with finger biometric authentication model. Through this fingerprint biometric, the secret code is generated by entropy value. This enables the person to request for accessing the data in the desk computer. When the person requests the access to the authorized user through Bluetooth in mobile, the Authorized user sends the permit access through fingerprint secret code. Finally this fingerprint is verified with the database in the Desk computer. If it is matched, then the computer can be accessed by the requested person.
Performance analysis of binary and multiclass models using azure machine lear...IJECEIAES
Network data is expanding and that too at an alarming rate. Besides, the sophisticated attack tools used by hackers lead to capricious cyber threat landscape. Traditional models proposed in the field of network intrusion detection using machine learning algorithms emphasize more on improving attack detection rate and reducing false alarms but time efficiency is often overlooked. Therefore, in order to address this limitation, a modern solution has been presented using Machine Learning-as-a-Service platform. The proposed work analyses the performance of eight two-class and three multiclass algorithms using UNSW NB-15, a modern intrusion detection dataset. 82,332 testing samples were considered to evaluate the performance of algorithms. The proposed two class decision forest model exhibited 99.2% accuracy and took 6 seconds to learn 1,75,341 network instances. Multiclass classification task was also undertaken wherein attack types like generic, exploits, shellcode and worms were classified with a recall percentage of 99%, 94.49%, 91.79% and 90.9% respectively by the multiclass decision forest model that also leapfrogged others in terms of training and execution time.
MITIGATION TECHNIQUES TO OVERCOME DATA HARM IN MODEL BUILDING FOR MLijaia
Given the impact of Machine Learning (ML) on individuals and the society, understanding how harm might
be occur throughout the ML life cycle becomes critical more than ever. By offering a framework to
determine distinct potential sources of downstream harm in ML pipeline, the paper demonstrates the
importance of choices throughout distinct phases of data collection, development, and deployment that
extend far beyond just model training. Relevant mitigation techniques are also suggested for being used
instead of merely relying on generic notions of what counts as fairness.
In current decades, Land Use (LU) and Land Cover (LC) classification is the most
challenging research area in the field of remote sensing. This research helps in
understanding the environmental changes for ensuring the sustainable development.
In this research, LU and LC classification assessed for Visakhapatnam city. After
collecting the satellite images, Hybrid Directional Lifting (HDL) technique was used
to remove the saturation and blooming effects in the input images. The pre-processed
satellite images were used for segmentation by applying Fuzzy C means (FCM)
clustering. Then, Local Binary Pattern (LBP) and Gray-level co-occurrence matrix
(GLCM) features were utilized to extract the features from the segmented satellite
images. After obtaining the feature information, a multi-class classifier: Adaptive
Neuro-Fuzzy Inference System (ANFIS) was used to classify the LU and LC classes;
water-body, vegetation, settlement, and barren land. The experimental outcome
showed that the proposed system effectively distinguishes the LU and LC classes by
means of sensitivity, specificity, and classification accuracy. The proposed system
enhances the classification accuracy up to 7% compared to the existing systems
A Survey of Agent Based Pre-Processing and Knowledge RetrievalIOSR Journals
Abstract: Information retrieval is the major task in present scenario as quantum of data is increasing with a
tremendous speed. So, to manage & mine knowledge for different users as per their interest, is the goal of every
organization whether it is related to grid computing, business intelligence, distributed databases or any other.
To achieve this goal of extracting quality information from large databases, software agents have proved to be
a strong pillar. Over the decades, researchers have implemented the concept of multi agents to get the process
of data mining done by focusing on its various steps. Among which data pre-processing is found to be the most
sensitive and crucial step as the quality of knowledge to be retrieved is totally dependent on the quality of raw
data. Many methods or tools are available to pre-process the data in an automated fashion using intelligent
(self learning) mobile agents effectively in distributed as well as centralized databases but various quality
factors are still to get attention to improve the retrieved knowledge quality. This article will provide a review of
the integration of these two emerging fields of software agents and knowledge retrieval process with the focus
on data pre-processing step.
Keywords: Data Mining, Multi Agents, Mobile Agents, Preprocessing, Software Agents
Different Classification Technique for Data mining in Insurance Industry usin...IOSRjournaljce
this paper addresses the issues and techniques for Property/Casualty actuaries applying data mining methods. Data mining means the effective unknown pattern discovery from a large amount database. It is an interactive knowledge discovery procedure which is includes data acquisition, data integration, data exploration, model building, and model validation. The paper provides an overview of the data discovery method and introduces some important data mining method for application to insurance concluding cluster discovery approaches.
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...ijaia
In the era of the fourth industrial revolution, measuring and ensuring the reliability, efficiency and safety of the industrial systems and components are one of the uppermost key concern. In addition, predicting performance degradation or remaining useful life (RUL) of an equipment over time based on its historical sensor data enables companies to greatly reduce their maintenance cost. In this way, companies can prevent costly unexpected breakdown and become more profitable and competitive in the marketplace. This paper introduces a deep learning-based method by combining CNN(Convolutional Neural Networks) and LSTM (Long Short-Term Memory)neural networks to predict RUL for industrial equipment. The proposed method does not depend upon any degradation trend assumptions and it can learn complex temporal representative and distinguishing patterns in the sensor data. In order to evaluate the efficiency and effectiveness of the proposed method, we evaluated it on two different experiment: RUL estimation and predicting the status of the IoT devices in 2-week period. Experiments are conducted on a publicly available NASA’s turbo fan-engine dataset. Based on the experiment results, the deep learning-based approach achieved high prediction accuracy. Moreover, the results show that the method outperforms standard well-accepted machine learning algorithms and accomplishes competitive performance when compared to the state-of-the art methods
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...gerogepatton
In the era of the fourth industrial revolution, measuring and ensuring the reliability, efficiency and safety of the industrial systems and components are one of the uppermost key concern. In addition, predicting performance degradation or remaining useful life (RUL) of an equipment over time based on its historical sensor data enables companies to greatly reduce their maintenance cost. In this way, companies can prevent costly unexpected breakdown and become more profitable and competitive in the marketplace. This paper introduces a deep learning-based method by combining CNN(Convolutional Neural Networks) and LSTM (Long Short-Term Memory)neural networks to predict RUL for industrial equipment. The proposed method does not depend upon any degradation trend assumptions and it can learn complex temporal representative and distinguishing patterns in the sensor data. In order to evaluate the efficiency and effectiveness of the proposed method, we evaluated it on two different experiment: RUL estimation and predicting the status of the IoT devices in 2-week period. Experiments are conducted on a publicly available NASA’s turbo fan-engine dataset. Based on the experiment results, the deep learning-based approach achieved high prediction accuracy. Moreover, the results show that the method outperforms standard well-accepted machine learning algorithms and accomplishes competitive performance when compared to the state-of-the art methods
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...IJNSA Journal
Building practical and efficient intrusion detection systems in computer network is important in industrial areas today and machine learning technique provides a set of effective algorithms to detect network
intrusion. To find out appropriate algorithms for building such kinds of systems, it is necessary to evaluate various types of machine learning algorithms based on specific criteria. In this paper, we propose a novel evaluation formula which incorporates 6 indexes into our comprehensive measurement, including precision, recall, root mean square error, training time, sample complexity and practicability, in order to
find algorithms which have high detection rate, low training time, need less training samples and are easy
to use like constructing, understanding and analyzing models. Detailed evaluation process is designed to
get all necessary assessment indicators and 6 kinds of machine learning algorithms are evaluated.
Experimental results illustrate that Logistic Regression shows the best overall performance.
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...IJNSA Journal
Building practical and efficient intrusion detection systems in computer network is important in industrial areas today and machine learning technique provides a set of effective algorithms to detect network intrusion. To find out appropriate algorithms for building such kinds of systems, it is necessary to evaluate various types of machine learning algorithms based on specific criteria. In this paper, we propose a novel evaluation formula which incorporates 6 indexes into our comprehensive measurement, including precision, recall, root mean square error, training time, sample complexity and practicability, in order to find algorithms which have high detection rate, low training time, need less training samples and are easy to use like constructing, understanding and analyzing models. Detailed evaluation process is designed to get all necessary assessment indicators and 6 kinds of machine learning algorithms are evaluated. Experimental results illustrate that Logistic Regression shows the best overall performance.
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts(both negative and positive) creating all-consuming feel , making us difficult to manage with associated suffering. Good thoughts are like our Mob Signal (Positive thought) amidst noise(negative thought) in the atmosphere. Negative thoughts like noise outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our mind as well as chaos in our physical world. Negative thoughts are also known as “distorted thinking”.
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
Palestine last event orientationfvgnh .pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
What is the purpose of the Sabbath Law in the Torah. It is interesting to compare how the context of the law shifts from Exodus to Deuteronomy. Who gets to rest, and why?
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
How to Create Map Views in the Odoo 17 ERPCeline George
The map views are useful for providing a geographical representation of data. They allow users to visualize and analyze the data in a more intuitive manner.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
MACHINE LEARNING TECHNIQUES FOR ANALYSIS OF EGYPTIAN FLIGHT DELAY
1. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
DOI: 10.5121/ijdkp.2018.8301 1
MACHINE LEARNING TECHNIQUES FOR ANALYSIS
OF EGYPTIAN FLIGHT DELAY
Shahinaz M. Al-Tabbakh 1, Hanaa M. Mohamed2 and H. El-Zahed3
1
Computer Science Group, Faculty of Women for Sciences, A. and Education, Ain Shames
University, Cairo-Egypt.
2
Internet Dev.Dept. Manager of IT Sector-EGYPTAIR Holding Cooperation, Cairo,
Egypt
3
Faculty of Women for Sciences, A. and Education, Ain Shames University, Cairo-Egypt.
ABSTRACT
Flight delay has been the fiendish problem to the world's aviation industry, so there is very important
significance to research for computer system predicting flight delay propagation. Extraction of hidden
information from large datasets of raw data could be one of the ways for building predictive model. This
paper describes the application of classification techniques for analysing the Flight delay pattern in Egypt
Airline’s Flight dataset. In this work, four decision tree classifiers were evaluated and results show that the
REPTree have the best accuracy 80.3% with respect to Forest, Stump and J48. However, four rules based
classifiers were compared and results show that PART provides best accuracy among studied rule-based
classifiers with accuracy of 83.1%. By analysing running time for all classifiers, the current work
concluded that REPtree is the most efficient classifier with respect to accuracy and running time. Also, the
current work is extended to apply of Apriori association technique to extract some important information
about flight delay. Association rules are presented and association technique is evaluated.
KEYWORDS
Airlines, Flight delay, WEKA, Bigdata, Data mining, classification Algorithms , J48,Random Forest,
Decision Stump, Ripper rule, Association rules, priori, Confusion matrix.
1. INTRODUCTION
Every day, there are thousands of people who travel by airplane to get to their destinations.
Unfortunately for airline travellers, however, many of these flights do not leave on-time. Flight
delay are a major problem within the airline industry. Flight delay is a challenging problem for all
airline companies [1, 2], which will lead to Financial losses, Fuel losses and negative impact on
their business reputation. This work explores what factors influence the occurrence of flight
delays. Predicting whether or not a Flight delay will be initiated at an airport, based on
meteorological forecast and traffic demand, can alert traffic managers and airlines about potential
congestion and necessitate strategies for mitigating these disruptions to air traffic.
Data mining can be defined as the extraction of useful knowledge from large data repositories.
Data mining techniques are the result of a long process of research and product development. This
evolution began when business data was first stored on computers, continued with improvements
in data access, and more recently, generated technologies that allow users to navigate through
their data in real time. Data mining is ready for application in the business community because it
is supported by three technologies including machine learning, statistics, and database systems for
the analysis of large volumes of data. Because of Machine Learning techniques, Machine can be
trained using the training samples from existing data sets and then a training model can be
2. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
2
devised which can be applied to the future data to categorize it correctly. Such learning models are
usually classifiers. Decision Tree, Artificial Neural Networks, Bayesian networks, and Support
Vector Machines are examples of such classifiers. But if data is irrelevant, redundant, noisy and
unreliable data, then there is no doubt that training phase is a very challenging task. Number of
data can be misclassified if the training phase is not proper. Classification is one of the most
popular data mining tasks. Hence in this paper the authors have tried to make a comparison of
various Decision Trees approaches and ensemble Learning rules. For data-mining purposes, an
open-source software package called WEKA developed using JAVA was used[3, 4] to help us in
postulating a predictive models for flight delays based on various attributes of a particular
flights. These models will make us able to predict when delays would be encountered and
increase the knowledge for passengers advising them on the most efficient ways to travel.
In this paper, eight classification algorithms are investigated for their performance, as explained
in section 3 with some theoretical aspects. In sections 2, related works are discussed. In sections
4, the types of classification algorithms including their performance measures will be
investigated. Results and comparative analysis, and conclusions will be explained in sections 6
and 7 respectively.
2. RELATED WORKS
There are many data mining techniques proposed in the literature based on machine learning for
building and extraction of predictive models using historical data. Namely, clustering,
classification rules, association and regression. The structure of the trained predictive models can
provide insight into factors that influence the initiation of a ground delay program at a given
airport. Unsupervised data modeling techniques such as Principal Component Analysis, and
clustering have been applied to classify days based on weather impact [5] and performance
metrics [6] our contribution is discovery of the classification rules by more than one method and
comparison of these methods for traffic flight data set of Egypt Airlines. It is Unique up to now,
However there are some work done by:
Akpinar and Karabacak [7] provided a reviewed to Data Mining in Civil Aviation, so they
implemented some data mining classification rules based on certain critical factors including
airlines, airports, cargo, passenger, efficiency and safety. Airline companies may use data mining
in order to fuel cost optimization, planning take into consideration weather conditions, passenger
analysis, cargo optimization, airport situation revenue per flight, profit per flight, cost per seat or
more detailed one catering and handling expenses per seat.
Nazeri and Zhang [8] applied a data mining approach for analysis of whether impact on NAS
performance. They applied decision tree learning algorithms C5.0 and K mean clustering
algorithms. They discovered that weather pattern /rules had significant impact on NAS
performance. Discovered rules may be used to predict if a day is good or bad performance based
on its weather. They conclude rule relate between blocked flights and bypass distance.
Ha and it al [9] developed and implemented an experimental model based on the CRISP-DM
(Cross Industry Standard Process for Data Mining) methodology applied to Big Data Mining.
They concluded that the arrival delay can be proposed for optimal airports. They classify the
airports according to arrival delay.
Mukherjee and Sridhar[1] presented two supervised-learning models, logistic regression and
decision tree to predict occurrence of a Ground Delay Program (GDP), at an airport based on
traffic demand and meteorological conditions. The models are applied to predict GDP occurrence
at two major U.S. airports: San Francisco International airport (SFO) and Newark Liberty
3. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
3
International (EWR) airports. Historical hourly observations of meteorological conditions such as
visibility, cloud height, wind, convection, precipitation, etc., and arrival traffic demand based on
flight schedules are used to calibrate the models. The logistic regression model estimates the
probability of a GDP occurrence during the hour. The decision tree model, on the other hand,
classifies the hour as a GDP or non- GDP.
3. THEORETICAL OVERVIEW:
3.1 MACHINE LEARNING APPROACHES
Machine learning is a set of algorithms which are able to find potential patterns in data and use
these patterns to predict unlabelled data. In 1959, Arthur Samuel [10] informally defined machine
learning as a subject which tries to figure out how to make computer solve real problems
automatically without programming detailed code. For example, he developed the Samuel
Checkers-playing Program which had the ability to learn a reasonable way to play and managed
to defeat many human players
3.2 CLASSIFICATION MODELS
Classification is one of the Data Mining techniques that is mainly used to analyse a given dataset
and takes each instance of it and assigns this instance to a particular class with the aim of
achieving least classification error. It is used to extract models that correctly define important
data classes within the given dataset. It is a two-step process. In first step the model is created by
applying classification algorithm on training data set. Then in second step, the extracted model is
tested against a predefined test dataset to measure the model trained performance and accuracy.
So, classification is the process to assign class label for this dataset whose class label is unknown.
Various lists of techniques are available for classification like decision tree induction, Bayesian
classification, and Bayesian network. Here under is the description of classifications algorithms to
be investigated in this paper.
Decision Tree is a model which is used on most classifier in our work. It has a tree like structure
in which all the internal nodes (including the root node) represent an attribute and the leaf nodes
represent the classification categories as per the goal class [11]. The basic concept by which
classification is done in Decision tree is that it takes multiple linear decisions to perform a
nonlinear classification. A decision is taken at every level of the tree and finally we arrive at leaf
node which leads the instance to belong to a particular category. According to the type of data
being used, decision trees are categorized into two types:
Classification Trees (For categorical data)
Regression Trees (For Numerical data)
Entropy and information gain are the two parameters whose value determines the purity of a node
and according to that only decision tree is constructed. Most pure node lies at the bottom (as leaf
nodes). DT use as a descriptive means for calculating conditional probabilities. A Decision Tree
uses a top-down, divide and conquer as classification strategy and it partitions a set of given
entities into further smaller classes on the basis of automatically selected rules .In addition , Weka
implementation classes of classification trees are indicated by the following types . Our work is
the first work to compare various tree classifiers in WEKA. The classifiers description are as
following [3]:
3.2.1 Classifiers.Trees.Decisionstump
Class for building and using a decision stump. Usually used in conjunction with a boosting
4. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
4
algorithm. Does regression (based on mean-squared error) or classification (based on entropy).
Missing is treated as a separate value.
3.2.2 classifiers.trees.J48
decision tree is a predictive machine-learning model that decides the target value (dependent
variable) of a new sample based on various attribute values of the available data.this classifier is
an open source java implementation of the C4.5 algorithm in the WEKA data mining tool.J48
also accepts both continuous and categorical attributes in building the decision tree. It use tree
pruning that reduces misclassification errors and reducing the size of the decision tree. It is used
to mitigate over fitting, where perfect accuracy on training data are also achieved [11, 12].
3.2.3 classifiers.trees.RandomForest
Class for constructing a forest of random trees. Random forests are collections of trees, all
slightly different. Random Forest algorithm can be used to handle missing values. Classify new
data points by taking a (weighted) vote of their predictions that consists of many decision trees
and outputs the class that is the mode of the classes output by individual trees, overfitting will be
avoided by having many trees and so therefore more accurate also It run efficiently on large
databases.
3.2.4 classifiers.trees.REPTree
Class for fast decision tree learner. Builds a decision/regression tree using information
gain/variance and prunes it using reduced-error pruning (with back fitting). Only sorts values for
numeric attributes once. Missing values are dealt with by splitting the corresponding instances
into pieces (i.e. as in C4.5) [8].
Another Rules Classifiers in WEKA are as following:
3.2.5 classifiers.rules.PART
Class for generating a PART decision list. Uses separate-and-conquer. Builds a partial C4.5
decision tree in each iteration and makes the "best" leaf into a rule [13].
3.2.6. classifiers.rules.DecisionTable
Class for building and using a simple decision table majority classifier. A decision table consists
of a hierarchical table in which each entry in a higher level table gets broken down by the values
of a pair of additional attributes to form another table. The structure is similar to dimensional
stacking [14].
3.2.7 classifiers.rules.OneR
Class for building and using a OneR classifier; classification algorithm that generates one rule for
each predictor in the data, then selects the rule with the smallest total error as its "one rule". To
create a rule for a predictor, a frequency table is constructed for each predictor against the target.
It uses the minimum- error attribute for prediction, discrediting numeric attributes also it is simple
for humans to interpret [15].
3.2.8 classifiers.rules.JRip
It is the Weka implementation class of the algorithm Ripperk [8] that apply Ripper Rule (JRIP).
Ripper Rule (JRIP) is used to generate various rules by adding repetitive datasets until the rules
cover all data pattern according to the training dataset. In addition, once all rules are generated,
some of them will be merged in order to reduce size. In addition, this algorithm uses incremental
5. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
5
reduced-error pruning in order to obtain a set of classification rules; k is the number of
optimization cycles of rules sets.
3.3 ASSOCIATION RULES
Association rules are if/then statements that help uncover relationships between prominently
unrelated data in a relational database [16]. Apriori algorithm is used to perform association rule
mining. The Apriori algorithm was introduced in [AS94] as a way to generate association rules
from market basket data. The Apriori algorithm is a two stage process: A frequent itemset
(itemsets that satisfy minimum support threshold) mining stage and a rule generation stage (rules
that satisfy minimum confidence threshold). WEKA allows the resulting rules to be sorted
according to different metrics such as confidence Eq(1) and lift.Eq(2).
If we have a given rule with 2 sides L => R
Confidence is the probability that L and R occur together, if the Confidence value is less than 1; L
and R are negatively correlated, otherwise L and R positively correlated
Confidence = Pr(L,R) (1)
Lift (or improvement) is the ratio of the probability that L and R occur together to the multiple of
the two individual probabilities for L and R or it is computed as the confidence of the rule divided
by the support of the right-hand-side (RHS).
lift = Pr(L,R) / Pr(L).Pr(R) (2)
If this value is 1, then L and R are independent. The higher this value, the more likely that the
existence of L and R together in a transaction is not just a random occurrence, but because of
some relationship between them.
4. METHODOLOGY:
For achieving the goal of this research, we have put our methodology in phases as shown in
figure.1. We used machine-learning algorithms embedded in WEKA on the data obtained from
EGYPTAIR fleet-watch management
Figure. 1. Phases of extraction of flight delay model
6. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
6
4.1 WEKA ENVIRONMENT
WEKA (Waikato Environment for Knowledge Analysis) environment is a very popular tool
programmed in Java that deals with machine learning and data mining. It contains a collection of
visualization tools and algorithms available for data mining preprocess such as classification,
association, clustering, modeling and evaluation.
4.2 DATA PREPARATION
The dataset for this research was obtained from the EGYPTAIR fleet-watch management system
[18]. The description of data set is shown in table 1, table 2. The dataset went through a number
of transformations to prepare it for data- mining. First, we remove all flights that were missing
data. A flight may be missing data because it was diverted, cancelled, or for some other related
reason. Without complete values, these entries are not suitable to data-mining, so they were
removed. The dataset itself contains data on both departure and arrival times, but for simplicity
we decided to examine only the departures.
Table1. Data set Description
Table 2. Description of attributes for the data set
4.3 DATA CLEANING AND TRANSFORMING
Data cleaning is a very important step in data mining; here is where each dataset is cleaned. This
means the model is prepared to fit the needed format, by eliminating unwanted values such as
7. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
7
outliers, extreme values or values, which aren’t wanted to be processed, transformed or adapted to
the desired format. If this step isn’t done correctly, the final results may turn out to be incorrect or
may vary a lot from the correct ones. That’s why as mentioned in the first line of this paragraph, it
is a very important step, and it doesn’t take the same amount of time and effort to do it well than
to do it wrongly. Figure 2. The data is configured to be in Arff format (WEKA) file format, with
attribute values.
Figure 2. The data is configured to be in Arff format (WEKA) file format, with attribute values
The instance values of city_To and city_from is three-letter codes which are the standard codes
used for airports. Their meaning can be found on any travel-reservation website. The instance
values of Aircraft unique Registration letters is {A, D, R, S}, R: means returned, S: mean
scheduled, D: means departed, A: Arrived . Some important notices during building model in
WEKA:
• When the dataset was imported into WEKA, a series of problems appear and to solve these
problems some transformations had to be done to the data before being able to do anything
else. As fixing model and clean data.
• Every data mining software doesn’t work the same way and doesn’t have the same data
requirements. So before doing anything, some settings had to be adjusted before being be
able to deal with the model without any more problems. Some variable types had to be
changed, because WEKA automatically decides the variable type when you import a
dataset.
• Useless variables were removed using a WEKA filter. This removed the Origin and Origin
city, and that’s because it never changes, so the origin city is Cairo and the origin is CAI
and both of these can be assumed. In addition to those variables, some variables that
couldn’t be knownbeforehand
8. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
8
4.4 METRICS OF COMPARISONS
Classifiers in the open-source program WEKA were used to determine whether any algorithm
stood out as being particularly well suited for these data. The eight classifiers, which were tested,
are readily available and represent some of the most widely used classification methods in
aviation. However, they also represent classifiers that are diverse at the most fundamental levels.
The eight classifier is explained previously in the theoretical overview part are decision stump,
trees_J48, Random Forest, REPTree, PART, Decision Table , OneR and Ripperk. A distinguished
confusion matrix was obtained to calculate sensitivity, specificity, precision, recall, F- measure,
Roc and accuracy which are the metrics of evaluation of these classifiers. Confusion matrix is 2X2
matrix Representation of the classification results. The number of correctly classified instances is
the sum of diagonals on the matrix; all others are incorrectly classified. Table 3 shows a
representation of confusion matrix, where: TP = true positive, TN = true negative, FP = false
positive, FN = false negative.
Table 3 Confusion matrix
The below formulas were used to calculate metrics of classifications namely, true positive rate,
false positive precision, F score, Roc area and accuracy. We use the confusion matrix to see the
performance of each classifier, the classifier used in the work is more efficient when more than
70% result have.
Number of correctly good classifier should have a high True Positive Rate (TPR) .TPR is the
ratio of correctly predicted delays to the total number of actual delays; a TPR of 1 meaning that
all delays were well predicted. TPR is equivalent to Recall.
False Positive Rate (FPR) which is the ratio of on-time samples predicted as delayed, to the total
number of on-time samples; a FPR of 0 meaning that no on-time samples were predicted as
delayed.
Accuracy compares how close a new test value is to a value predicted by if ... then rules, gives an
overall evaluation. It is defined as the percentage proportion of correctly classified cases to all
cases in the set. The larger the predictive accuracy the better the situation. This is written
mathematically:
9. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
9
Precision is the ratio of correctly predicted positive observations to the total predicted positive
observations. The question that this metric answer is of all passengers that labelled as survived,
how many actually survived? High precision relates to the low false positive rate. We have got
0.788 precision which is pretty good.
F1Score is the weighted average of Precision and Recall. Therefore, this score takes both false
positives and false negatives into account. F1 is usually more useful than accuracy, especially if
you have an uneven class distribution. Accuracy works best if false positives and false negatives
have similar cost. If the cost of false positives and false negatives are very different, it’s better to
look at both Precision and Recall.
ROC provides comparison between predicted and actual target values in a classification. It
describes the performance of a model with a complete range of classification thresholds. ROC area
varies between 0 and 1 intervals. An increasing value indicates better classification, with an area
of one representing perfect classification.
4. RESULTS AND DISCUSSION
4.1 COMPARISON OF CLASSIFICATION TECHNIQUES
It is quite clear that the algorithm is a perfect classifier if at least on the training data, all
instances were classified correctly as well as all errors are zero. However, it is not the case in
reality. Therefore, we can admit that a best classifier is the algorithm with the maximum of
correctly classified Instances or the minimum of incorrectly Classified Instances table 4 and Fig.
3.
Table 4. Correctly classified and incorrectly classified instances for each algorithm
It is worth mentioning that correctly classified instances is TP+TN, FN+FP is incorrectly
classified instances.
10. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
10
Figure 3. Correctly classified and incorrectly classified instances for each algorithm
From the previous analysis , It is found PART classification rules has the most correctly classified
instances in the group of classification Rules techniques while REPTree has the most correctly
classified instances in the group of tree classifiers. Best classifier is the algorithm with the
minimum execution time. Decision Table and Part have the maximum execution time with respect
to the Rules based classifiers. While Random forest classifier is the slowest among tree based
classifiers. As shown in table 5 and Fig. 4
Table 5. The running time comparisons with respect to all studied classification techniques
Figure 4. Running Time (second)against name classifier
Table .6. Shows a more detailed performance description via; precision, recall, true- and false
11. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
11
positive rates, F score and Roc area. All these values are very important for comparing classify.
Table 6 gives the classification criteria values as taken fromWeka.
From table 6 and Figure 5 , Rules.Part is the best in rules classifiers with respect to TPR with
percentage 82.8% and REPtree is the best for all tree classifiers with percentage 79.9%.
Figure 5. Comparison of TPR and FPR for different classification techniques..
Figure. 6. Comparison of Precision, Recall and F- score for the studied classifiers.
F1 score is considered as the most important criteria for efficiency of classification technique. F1
Score is the weighted average of Precision and Recall. Therefore, this score takes both false
positives and false negatives into account. Intuitively it is not as easy to understand as accuracy,
but F1 is usually more useful than accuracy, especially if you have an uneven class distribution.
Accuracy works best if false positives and false negatives have similar cost. If the cost of false
positives and false negatives are very different, it’s better to look at both Precision and Recall. So
12. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
12
F score is the best criteria for accuracy evaluation among all criteria. In our case, F1 score is the
best for rules.part with respect to rules classifiers with percentage 83.1%
Fig. 7. ROC Area comparison for classification techniques
The Roc area is considered as the second important criteria for the efficiency of classification
technique. It is found from figure 7 that Rules. decision has the best value for ROC area among
the four rule classifiers with value.883 and the tree.j48 has the best value for ROC area among the
four tree classifiers .
4.2 APPLYING APRIORI ALGORITHM
Here for predicting the existing data analysis, Apriori algorithm was used. In WEKA, minimum
support is defined to 70% and minimum confident is 90% and the num Rule is define to 20. The
resultant output show that some important information that also helpful for existing situation
analysis as shown below:S
5. CONCLUSIONS
In this paper, the models performance in terms of classification accuracy for 4 rule-based
algorithms and 4 tree-based algorithms using various accuracy measures like TP rate, F1 score and
ROC area . Accuracy has been measured on the dataset by many criteria of evaluation but the
most importance criteria of accuracy is F1 score and ROC area.
Thus It was found from the comparative analysis that classification Method Rules.PART
performed best with classification accuracy of 83.1.%, DestionTable came out with classification
accuracy of 81.4%, But OneR came out with classification accuracy of 78.4% , Jrip with 79.4%
among all rule-based Classifiers. For Tree-based Classifiers , the Methods REPTree had 80.3%
but method DecisionStump and random forest came out with classification accuracy 78.2% and
J48 with 74.7% algorithms on flight delay dataset in WEKA.
In addition the analysis showing that JRip and OneR , j48 , REPTree and decision stump is the
13. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
13
smallest running time on the studied classifiers with value .01 second. So we concluded from the
previous analysis that REPtree is the most efficient classifier through all the studied classifiers
with respect to accuracy and running time.
The accuracy of predictive model is affected by the selection of attribute. With this we can
conclude that the different classification algorithms are designed to perform better for certain
types of dataset . For the future work, we try to use a dataset with a huge number of instances.
Certainly, it can lead us to manage Big data mining technologies with the aim of achieving the
precise and perfect predictions.
REFERENCE
[1] Mukherjee, A., Grabbe, S. R., & Sridhar, B. (2014). Predicting Ground Delay Program at an airport based on
meteorological conditions. In 14th AIAA Aviation Technology, Integration, and Operations Conference (pp
2713- 2718).
[2] Oza1, S. , Sharma, S. , Sangoi, H. , Raut, R., Kotak,V. C.,( April 2015)(FD Prediction System Using Weighted
Multiple Linear Regression. In International Journal Of Engineering and Computer Science ISSN:23197242
Volume 4, Issue 4 on(11668-11677)
[3] Bouckaert, R. R., Frank, E., Hall, M., Kirkby, R., Reutemann, P., Seewald, A., & Scuse, D. (2013). Waikato
Environment for Knowledge Analysis (WEKA) Manual for Version 3-7-8. The University of Waikato, Hamilton,
New Zealand.
[4] Sewaiwar, P., & Verma, K. K. (2015). Comparative study of various decision tree classification algorithm using
WEKA. International Journal of Emerging Research in Management Technology, Volume 4, ISSN:2278-9359.
[5] Mukherjee, A., Grabbe, S., and Sridhar, B.,(2013).Classification of Days Based on Weather-Impacted Traffic in
the National Airspace System. In Aviation Technology, Integration, and Ope rations Conference, Los Angeles.
[6] Asencio, M. A.,(2012).Clustering Approach for Analysis of Convective Weather Impacting the NAS. In 12th
Integrated Communications, Navigation, and Surveillance Conference, Herndon, Virginia.
[7] Akpinar, M. and Karabacak ,M. ,(2017). Data mining applications in civil aviation sector: State-of-art review .
[8] Nazeri, Z., Zhang, J. ,(2017). Mining Aviation Data to Understand Impacts of Severe Weather. In Proceedings
of the International Conference on Information Technology: Coding and Computing (ITCC.02).
[9] Man, H.,S. , Jung, N., and Hyun, P., S.,(2015). Analysis of Air-Moving on Schedule Big Data based on Crisp-
Dm Methodology. In ARPN Journal of Engineering and Applied Sciences on(pp. 2088-2091).
[10] Kurniawan, R., Nazri, M. Z. A., Irsyad, M., Yendra, R., & Aklima, A. (2015, August). On machine learning
technique selection for classification. In Electrical Engineering and Informatics (ICEEI), 2015 International
Conference on (pp. 540-545). IEEE.
[11] Pandey, P., & Prabhakar, R. (2016, August). An analysis of machine learning techniques (J48 & AdaBoost)-for
classification. In Information Processing (IICIP), 2016 1st India International Conference on (pp. 1-6). IEEE.
[12] Rahman, M. S., & Waheed, S. (2017, February). Carbon emission measurement in improved cook stove using
data mining. In Electrical, Computer and Communication Engineering (ECCE), International Conference on (pp.
83-86). IEEE.
[13] http://www.cs.tau.ac.il/~fiat/dmsem03/Fast%20Algorithms%20for%20Mining%20Association%20Rules.ppt
[14] Becker, B. G. (1998, October). Visualizing decision table classifiers. In Information Visualization, 1998.
Proceedings. IEEE Symposium on (pp. 102-105). IEEE.
[15] http://www.saedsayad.com/oner.htm
[16] Palanisamy, S. K. (2006). Association rule based classification (Doctoral dissertation, Worcester Polytechnic
Institute).
[17] Bandyopadhyay, R., J. and Guerrero, R. , “Predicting airline delays,” 2012.
[18] https://www.flightstats.com/
14. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.3, May 2018
14
Appendix
WEKA Results