Average Precision, Recall and Precision are the main metrics of Information Retrieval (IR) system performance. Using mathematical and empirical analysis, this paper examines the properties of these metrics. Mathematically, it is demonstrated that all of these parameters are highly sensitive to relevance judgments, which are usually not very reliable. We show that shifting a relevant document downwards within the ranked list decreases Average Precision. The variation in Average Precision is largest for positions 1 to 10, while from the 10th position on this variation is negligible. In addition, we estimate how regularly the Average Precision value changes when an arbitrary number of relevance judgments within the existing ranked list are switched from non-relevant to relevant. Empirically, it is shown that six relevant documents at the end of a 20-document list have approximately the same Average Precision value as a single relevant document at the beginning of the list, while Recall and Precision values increase linearly regardless of the document position in the list. We also show that in the case of a Serbian-to-English human-translated query followed by English-to-Serbian machine translation, the relevance judgments change significantly, and therefore all the parameters for measuring IR system performance are also subject to change.
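As a rough illustration of the metrics discussed in this abstract, the following minimal sketch computes Precision, Recall and Average Precision for a ranked list with binary relevance judgments. It is not the paper's code, and the exact numbers in the paper depend on its precise definitions and cutoffs; the example list contents here are illustrative only.

```python
# Minimal sketch (not the paper's code): Precision, Recall and Average Precision
# for a ranked list given binary relevance judgments.

def precision_recall_ap(relevance, total_relevant=None):
    """relevance: list of 0/1 judgments in ranked order."""
    total_relevant = total_relevant or sum(relevance)
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)       # precision at each relevant rank
    precision = sum(relevance) / len(relevance)  # precision over the whole list
    recall = hits / total_relevant if total_relevant else 0.0
    ap = sum(precisions) / total_relevant if total_relevant else 0.0
    return precision, recall, ap

# One relevant document at rank 1 of a 20-document list ...
single_first = [1] + [0] * 19
# ... versus six relevant documents at ranks 15-20 of the same-length list.
six_last = [0] * 14 + [1] * 6
print(precision_recall_ap(single_first))  # (0.05, 1.0, 1.0)
print(precision_recall_ap(six_last))      # (0.3, 1.0, ~0.19)
```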
Fuzzy Rule Base System for Software Classification (ijcsit)
This document describes a fuzzy rule-based system for classifying Java applications using object-oriented metrics. Key features of the system include automatically extracting OO metrics from source code, a configurable set of fuzzy rules, and classifying software at both the application and class level. The system is designed to address limitations of existing OO metric tools by providing an automated, unified analysis and classification without requiring complex post-processing methods. The document outlines the system design, including subsystems for the fuzzy rules engine and extracting OO metrics, and defines membership functions and fuzzy rules for classification.
Comparative Performance Analysis of Machine Learning Techniques for Software ... (csandit)
Machine learning techniques can be used to analyse data from different perspectives and enable developers to retrieve useful information. Machine learning techniques have proven useful for software bug prediction. In this paper, a comparative performance analysis of different machine learning techniques is explored for software bug prediction on publicly available data sets. Results showed that most of the machine learning methods performed well on software bug datasets.
Genetic fuzzy process metric measurement system for an operating system (ijcseit)
The operating system (OS) is the most essential software of a computer system; deprived of it, the computer system is totally useless. It is the frontier for assessing relevant computer resources, and its performance greatly affects the user's overall objectives across the system. Related literature has tried different methods and techniques to measure the process metric performance of the operating system, but none has incorporated genetic algorithms and fuzzy logic in its techniques, which is indeed a novel approach. Extending the work of Michalis, this research focuses on measuring the process metric performance of an operating system using a set of operating-system criteria, fusing fuzzy logic to handle imprecision and a genetic algorithm for process optimization.
In the present paper, the applicability and capability of AI techniques for effort estimation prediction have been investigated. It is seen that neuro-fuzzy models are very robust, characterized by fast computation and capable of handling distorted data. Due to the presence of data non-linearity, they are an efficient quantitative tool for predicting effort estimation. A one-hidden-layer network, named OHLANFIS, has been developed using the MATLAB simulation environment. The initial parameters of the OHLANFIS are identified using the subtractive clustering method. Parameters of the Gaussian membership function are optimally determined using the hybrid learning algorithm. The analysis shows that the effort estimation prediction model developed using the OHLANFIS technique performs better than the standard ANFIS model.
IRJET - Neural Network based Leaf Disease Detection and Remedy Recommenda... (IRJET Journal)
This document describes a neural network-based system for detecting leaf diseases and recommending remedies. It uses a convolutional neural network (CNN) and deep learning techniques to classify images of plant leaves with different diseases. The system is trained on a dataset of 5000 leaf images across 4 disease classes. It aims to help farmers more easily identify leaf diseases and receive treatment recommendations without needing to directly contact experts. The document outlines the existing problems, proposed solution, literature review on related techniques like boosting and support vector machines, software and algorithms used including Python, Anaconda and Spyder. It also describes the implementation process involving modules for data loading, preprocessing, feature extraction using CNN, disease prediction, and recommending remedies.
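The summary above does not specify the network architecture, so the following is only a minimal sketch of the kind of CNN classifier such a system might use, assuming 4 disease classes and 128x128 RGB inputs (both assumptions, not details from the document).

```python
# Minimal sketch of a small CNN leaf-disease classifier (assumed: 4 classes,
# 128x128 RGB inputs); the described system's actual architecture is not known.
import torch
import torch.nn as nn

class LeafCNN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 32 * 32, num_classes)

    def forward(self, x):
        x = self.features(x)              # (N, 32, 32, 32) for 128x128 inputs
        return self.classifier(x.flatten(1))

model = LeafCNN()
logits = model(torch.randn(1, 3, 128, 128))
print(logits.shape)  # torch.Size([1, 4])
```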
Study on Relevance Feature Selection Methods (IRJET Journal)
This document summarizes research on feature selection methods. It discusses how feature selection is used to reduce dimensionality when working with large datasets that have thousands of variables. Several feature selection algorithms are examined, including ant colony optimization, quadratic programming, variable ranking using filter, wrapper and embedded methods, and fast correlation-based filtering with sequential forward selection. Feature selection can improve classification efficiency and understanding of data by identifying the most meaningful features.
Software size estimation at early stages of project development holds great significance in meeting the competitive demands of the software industry. Software size represents one of the most interesting internal attributes and has been used in several effort/cost models as a predictor of the effort and cost needed to design and implement the software. With the shift towards the object-oriented paradigm, it is essential to use an accurate methodology for measuring the size of object-oriented projects. The class point approach is used to quantify classes, which are the logical building blocks of the object-oriented paradigm. In this paper, we propose a class-point-based approach for software size estimation of On-Line Analytical Processing (OLAP) systems. OLAP is an approach to swiftly answering decision support queries based on a multidimensional view of data. Materialized views can significantly reduce the execution time for decision support queries. We perform a case study based on the TPC-H benchmark, which is representative of OLAP systems. We use a greedy approach to determine a good set of views to be materialized. After finding the number of views, the class point approach is used to estimate the size of the OLAP system, and the results of our approach are validated.
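The abstract mentions a greedy approach for choosing views to materialize. As a hedged sketch of that general idea (the benefit model, view names and numbers below are invented, not taken from the paper), a greedy loop repeatedly picks the candidate view with the highest remaining benefit until a view budget is reached.

```python
# Sketch of a greedy materialized-view selection loop (benefit per view under a
# view limit). Candidate views, costs and the benefit model are illustrative
# placeholders, not the paper's TPC-H setup.

def greedy_select(candidates, benefit, k):
    """candidates: iterable of view names; benefit(view, selected) -> float."""
    selected = []
    for _ in range(k):
        best = max((v for v in candidates if v not in selected),
                   key=lambda v: benefit(v, selected), default=None)
        if best is None or benefit(best, selected) <= 0:
            break
        selected.append(best)
    return selected

# Toy benefit model: each view saves some query cost unless an already-selected
# view covers the same queries (hypothetical names and numbers).
savings = {"v_lineitem_agg": 50, "v_customer_agg": 30, "v_part_agg": 20}
covers = {"v_part_agg": "v_lineitem_agg"}  # v_part_agg is subsumed by v_lineitem_agg

def benefit(view, selected):
    if covers.get(view) in selected:
        return 0
    return savings[view]

print(greedy_select(savings, benefit, k=2))  # ['v_lineitem_agg', 'v_customer_agg']
```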
Issues in Query Processing and Optimization (Editor IJMTER)
The paper identifies the various issues in query processing and optimization while choosing the best database plan. Unlike preceding query optimization techniques, which use only a single approach to identify the best query plan by extracting data from the database, our approach takes into account the various phases of query processing and optimization, heuristic estimation techniques and a cost function for identifying the best execution plan. A review of the various phases of query processing, the goals of the optimizer, the rules for heuristic optimization and the cost components involved is presented in this paper.
Optimization of network traffic anomaly detection using machine learning (IJECEIAES)
In this paper, to optimize the process of detecting cyber-attacks, we propose two main optimization solutions: optimizing the detection method and optimizing the features. Both solutions aim to increase accuracy and reduce the time needed for analysis and detection. For the detection method, we recommend using the Random Forest supervised classification algorithm. The experimental results in section 4.1 show that our proposal to use the Random Forest algorithm for abnormal behavior detection is well founded, because its results are much better than those of several other detection algorithms on all measures. For the feature optimization solution, we propose using dimensionality reduction techniques such as information gain, principal component analysis, and the correlation coefficient method. The results in our paper demonstrate that optimizing the cyberattack detection process does not necessarily require advanced algorithms with complex and cumbersome computational requirements; rather, it depends on the monitoring data, on selecting reasonable feature extraction and optimization algorithms, and on choosing appropriate attack classification and detection algorithms.
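A hedged sketch of the pipeline this abstract describes follows: reduce the feature space (PCA here; information gain or correlation filtering would slot in the same way) and classify with a Random Forest. The data is synthetic stand-in data, not the traffic dataset used in the paper.

```python
# Sketch: dimensionality reduction + Random Forest for anomaly classification.
# Synthetic data stands in for real network-traffic features.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           random_state=0)   # stand-in for traffic records / attack labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = make_pipeline(PCA(n_components=20),
                    RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```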
an error in that computer program. In order to improve software quality, prediction of faulty modules is necessary. Various metric suites and techniques are available to predict the modules which are critical and likely to be fault prone. The Genetic Algorithm is a problem-solving algorithm that uses genetics as its model of problem solving; it is a search technique for finding approximate solutions to optimization and search problems. A genetic algorithm is applied to the problem of faulty module prediction as well as to finding the most important attribute for fault occurrence. To perform the analysis, performance validation of the Genetic Algorithm is carried out using the open source software jEdit. The results are measured in terms of accuracy and error in prediction by calculating the probability of detection and the probability of false alarms.
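As an illustrative sketch of how a genetic algorithm can search for fault-predictive attributes (the abstract does not give the actual encoding or settings), the following evolves attribute bit-masks and scores them by cross-validated accuracy of a simple classifier on synthetic data; the jEdit metrics themselves are not reproduced.

```python
# Illustrative GA for attribute selection in fault prediction: individuals are
# attribute bit-masks, fitness is cross-validated accuracy. Data is synthetic.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = random.Random(0)
X, y = make_classification(n_samples=400, n_features=12, n_informative=4, random_state=0)

def fitness(mask):
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    return cross_val_score(DecisionTreeClassifier(random_state=0), X[:, idx], y, cv=3).mean()

def crossover(a, b):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.1):
    return [bit ^ (rng.random() < rate) for bit in mask]

pop = [[rng.randint(0, 1) for _ in range(X.shape[1])] for _ in range(20)]
for _ in range(10):                                   # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                # simple truncation selection
    pop = parents + [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                     for _ in range(10)]

best = max(pop, key=fitness)
print("selected attributes:", [i for i, bit in enumerate(best) if bit])
```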
Hybrid Classifier for Sentiment Analysis using Effective Pipelining (IRJET Journal)
The document describes a hybrid approach for sentiment analysis of tweets that uses a pipeline of rules-based classification, lexicon-based classification, and machine learning classification. Tweets are first classified using rules and a lexicon, and only tweets that do not meet a confidence threshold are passed to a machine learning classifier. The hybrid approach aims to optimize performance, speed, accuracy, and processing requirements compared to using individual classification methods alone. The document provides background on sentiment analysis methods and evaluates the performance of the hybrid approach versus individual classifiers.
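The pipelining idea in this summary (cheap classifiers first, machine learning only on low-confidence tweets) can be sketched as below. The rules, lexicon, confidence measure and threshold are illustrative placeholders, not those of the paper.

```python
# Sketch of a pipelined hybrid sentiment classifier: rules, then lexicon, then
# fall back to an ML model when confidence is below a threshold.

LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}

def rule_classify(tweet):
    if ":)" in tweet:
        return "positive", 1.0
    if ":(" in tweet:
        return "negative", 1.0
    return None, 0.0

def lexicon_classify(tweet):
    score = sum(LEXICON.get(w, 0) for w in tweet.lower().split())
    confidence = min(abs(score) / 2, 1.0)      # crude confidence from |score|
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, confidence

def hybrid_classify(tweet, ml_model, threshold=0.8):
    for stage in (rule_classify, lexicon_classify):
        label, conf = stage(tweet)
        if label is not None and conf >= threshold:
            return label, stage.__name__
    return ml_model(tweet), "ml_model"         # low confidence: defer to ML

# ml_model below is a stand-in for a trained classifier's predict function.
print(hybrid_classify("I love this :)", ml_model=lambda t: "positive"))
```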
A NOVEL APPROACH TO ERROR DETECTION AND CORRECTION OF C PROGRAMS USING MACHIN... (IJCI JOURNAL)
There has always been a struggle for programmers to identify errors while executing a program, be they syntactical or logical errors. This struggle has led to research into the identification of syntactical and logical errors. This paper surveys research works that can be used to identify such errors and proposes a new model, based on machine learning and data mining, which can detect logical and syntactical errors by correcting them or providing suggestions. The proposed work is based on the use of hashtags to identify each correct program uniquely, which in turn can be compared with a logically incorrect program in order to identify errors.
Software Effort Estimation using Neuro Fuzzy Inference System: Past and Present (rahulmonikasharma)
The most important reason for project failure is poor effort estimation. Software development effort estimation is needed for assigning appropriate team members to development, allocating resources for software development, bidding, etc. Inaccurate software estimation may lead to project delay, budget overruns or cancellation of the project. However, current effort estimation models are not very efficient. In this paper, we analyze a newer approach to estimation, the Neuro Fuzzy Inference System (NFIS), a hybrid model that combines components of artificial neural networks with fuzzy logic to give better estimates.
IRJET- Attribute Based Adaptive Evaluation System (IRJET Journal)
The document proposes an attribute-based adaptive evaluation system to improve candidate assessment. It analyzes questions using text mining to determine which attributes they test. Candidate responses are evaluated using fuzzy logic to generate outcomes based on attribute levels. This provides a more precise evaluation of a candidate's abilities than traditional tests. The system analyzes questions to assign attributes and levels. It records responses and analyzes them to determine a candidate's attribute levels, providing a comprehensive skills assessment. The proposed system aims to offer more effective candidate evaluation than current methods.
Regression test selection model: a comparison between ReTSE and Pythia (TELKOMNIKA JOURNAL)
As software systems change and evolve over time, regression tests have to be run to validate these changes. Regression testing is an expensive but essential activity in software maintenance. The purpose of this paper is to compare a new regression test selection model called ReTSE with Pythia. The ReTSE model uses decomposition slicing in order to identify the relevant regression tests. Decomposition slicing provides a technique that is capable of identifying the unchanged parts of a system. Pythia is a regression test selection technique based on textual differencing. Both techniques are compared using a Power program taken from Vokolos and Frankl's paper. The analysis of this comparison has shown promising results in reducing the number of tests to be run after changes are introduced.
BIO-INSPIRED MODELLING OF SOFTWARE VERIFICATION BY MODIFIED MORAN PROCESSES (IJCSEA Journal)
A new approach for the control and prediction of verification activities for large safety-relevant software systems is presented in this paper. The model is applied on a macroscopic system level and is based on so-called Moran processes, which originate from mathematical biology and allow for the description of phenomena such as genetic drift. Besides the theoretical foundations of this novel approach, its application to a real-world example from the medical engineering domain is discussed.
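For readers unfamiliar with the underlying stochastic model, the following minimal simulation of a neutral Moran process (fixed-size population, one birth-death event per step) illustrates what the abstract refers to; the paper's mapping of verification activities onto this process is not reproduced here.

```python
# Minimal Moran-process simulation: neutral drift in a fixed-size population.
import random

def moran_step(population, rng):
    """One birth-death step: copy a random individual, replace a random one."""
    born = rng.choice(population)
    population[rng.randrange(len(population))] = born

def run_until_fixation(n=50, initial_mutants=5, seed=0):
    rng = random.Random(seed)
    pop = [1] * initial_mutants + [0] * (n - initial_mutants)
    steps = 0
    while 0 < sum(pop) < n:          # stop when one type has taken over
        moran_step(pop, rng)
        steps += 1
    return sum(pop) == n, steps      # (did the mutant type fix?, number of steps)

print(run_until_fixation())
```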
Machine learning approaches are good at solving problems for which little information is available. In most cases, software domain problems can be characterized as a learning process that depends on various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches to classify modules into defective and non-defective ones. Machine learning techniques help developers retrieve useful information after classification and enable them to analyse data from different perspectives. Machine learning techniques have proven useful for software bug prediction. This study uses publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. Results showed that most of the machine learning methods performed well on software bug datasets.
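A comparative run of the kind this study describes can be sketched as below; the classifier list and synthetic class-imbalanced data are illustrative stand-ins for the actual models and public bug datasets used.

```python
# Sketch: compare several classifiers on a (synthetic) bug-prediction dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)   # stand-in for defective/non-defective modules

models = {
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: F1 = {scores.mean():.3f}")
```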
Feature selection or extraction is the most important task in Opinion Mining and Sentiment Analysis (OSMA) for calculating the polarity score. These scores are used to determine the positive, negative, and neutral polarity of products, user reviews, user comments, etc., in social media for the purposes of decision making and business intelligence for individuals or organizations. In this paper, we have performed an experimental study of the different feature extraction and selection techniques available for the opinion mining task. The experimental study is carried out in four stages. First, data collection is done from readily available sources. Second, pre-processing techniques are applied automatically using tools to extract terms and POS (parts-of-speech) tags. Third, different feature selection and extraction techniques are applied to the content. Finally, an empirical study is carried out analyzing sentiment polarity with the different features.
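One common form of the feature extraction and selection pass described above is n-gram term features with a chi-squared filter, sketched below on toy texts; the corpus, POS tooling and specific techniques of the study are not reproduced.

```python
# Sketch: TF-IDF n-gram features plus chi-squared feature selection on toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

texts = ["great product, works well", "terrible battery, total waste",
         "love the screen", "hate the keyboard"]
labels = [1, 0, 1, 0]                         # 1 = positive, 0 = negative

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)
selector = SelectKBest(chi2, k=5).fit(X, labels)

selected = [vectorizer.get_feature_names_out()[i]
            for i in selector.get_support(indices=True)]
print("top features:", selected)
```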
Implementation of reducing features to improve code change based bug predicti... (eSAT Journals)
Abstract: Today, we are getting plenty of bugs in software because of variations in software and hardware technologies. Bugs are software faults and pose a severe challenge to system reliability and dependability. Software bug prediction is a convenient approach to identifying such bugs. Recently, machine learning classifier approaches have been developed to flag the presence of a bug in a source code file. Because of the huge number of machine-learned features, current classifier-based bug prediction has two major problems: i) inadequate precision for practical usage and ii) the time required for prediction. In this paper we use two techniques: first, the cos-triage algorithm, which attempts to enhance the accuracy and lower the cost of bug prediction, and second, feature selection methods, which eliminate less significant features. Reducing features improves the quality of the knowledge extracted and also boosts the speed of computation. Keywords: Efficiency, Bug Prediction, Classification, Feature Selection, Accuracy
Testing embedded system through optimal mining technique (OMT) based on multi... (IJECEIAES)
Testing embedded systems must be done carefully, particularly in the critical regions of the embedded system. Inputs to an embedded system can arrive in multiple orders, and many relationships can exist among the input sequences. Considering the sequences and the relationships among them is one of the most important aspects that must be tested to establish the expected behavior of embedded systems. On the other hand, combinatorial approaches help determine a smaller number of test cases that are sufficient to test the embedded system exhaustively. In this paper, an Optimal Mining Technique for the multi-input domain, based on built-in combinatorial approaches, is presented. The method exploits multi-input sequences and the relationships that exist among multi-input vectors. The technique has been used for testing an embedded system that monitors and controls the temperature within nuclear reactors.
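To illustrate why combinatorial selection needs fewer test cases than exhaustive enumeration, the sketch below greedily picks complete input vectors until every pair of parameter values is covered. The parameter names and values are hypothetical, not the reactor-monitoring inputs of the paper, and this is generic pairwise selection rather than the paper's OMT.

```python
# Sketch of pairwise (2-way) combinatorial test selection for a multi-input domain.
from itertools import combinations, product

params = {
    "sensor": ["A", "B", "C"],
    "mode":   ["auto", "manual"],
    "alarm":  ["on", "off"],
}
names = list(params)

def pairs(vector):
    """All (parameter, value) pairs covered by one complete input vector."""
    items = list(zip(names, vector))
    return set(combinations(items, 2))

uncovered = set()
for vector in product(*params.values()):
    uncovered |= pairs(vector)

tests = []
while uncovered:
    best = max(product(*params.values()), key=lambda v: len(pairs(v) & uncovered))
    tests.append(best)
    uncovered -= pairs(best)

print(len(list(product(*params.values()))), "exhaustive vs", len(tests), "pairwise tests")
```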
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document provides summaries of the most cited articles from the International Journal of Computer Science & Information Technology (IJCSIT) from 2009 to 2012. It lists the article title, authors, publication details, and a short description or focus of each of the most cited articles from each year. There are over 50 highly cited articles summarized from the journal during this four year period covering topics in various areas of computer science and information technology.
Study On Decision-Making For Café Management Alternatives (ijcsit)
In recent years, many new cafés have emerged onto the market. Other than view cafés, beautiful cafés that
seem as if they came from Paris or New York have gradually appeared in leisure and quiet residential
areas, in alleyways, in peripheral areas, and in local commercial areas. In particular, leisure is trendy at
present, and modern restaurants innovate in terms of their food, leisure, and consumption. Unlike
traditional restaurants, they are able to develop into cafés with unique styles to attract consumers. Even
though not all of these new cafés are successful, as cafés are an industry that is at the forefront of fashion,
many individuals who dream of entrepreneurship would want to open a café. However, as there are many
types of cafés on the market, what type and style of café is the most suitable? An overview of cafés in Taiwan shows that each café offers unique services and functions to attract consumers, which is the key to the sustainable operation of cafés. Therefore, this study explores the decisions of companies when choosing the style for their cafés. This study uses the analytical hierarchy process (AHP) to explore the selection of café styles, in order to provide references for café operators to achieve successful and sustainable operations. Based on a literature review, expert interviews, and AHP, this study intends to provide useful results to the operators of cafés.
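For reference, the core AHP calculation this study relies on can be sketched as below: priority weights are derived from a pairwise comparison matrix via its principal eigenvector. The criteria and judgment values are made-up examples, not the study's survey data.

```python
# Small AHP sketch: priority weights from a pairwise comparison matrix.
import numpy as np

criteria = ["atmosphere", "menu", "location"]
# A[i, j] = how much more important criterion i is than j (Saaty's 1-9 scale).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
principal = eigvecs[:, eigvals.real.argmax()].real   # principal eigenvector
weights = principal / principal.sum()                # normalize to sum to 1

for name, w in zip(criteria, weights):
    print(f"{name}: {w:.2f}")
```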
Performance Measurements of Feature Tracking and Histogram Based T... (ijcsit)
In this paper, feature-tracking-based and histogram-based traffic congestion detection systems are developed. All of the developed systems are designed to run as real-time applications. In this work, the ORB (Oriented FAST and Rotated BRIEF) feature extraction method has been used to develop the feature-tracking-based traffic congestion solution. ORB is a rotation-invariant, fast and noise-resistant method that combines the strengths of the FAST and BRIEF feature extraction methods. Also, two different approaches, standard deviation and weighted average, have been applied to derive congestion information from the image histogram in the histogram-based traffic congestion solution. Both systems have been tested under different weather conditions, such as cloudy, sunny and rainy, to provide varied illumination at both daytime and night. For all developed systems, performance results are examined to show the advantages and drawbacks of these systems.
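A rough sketch of the feature-tracking idea with OpenCV's ORB follows: detect and match keypoints in consecutive frames and use the median keypoint displacement as a crude motion signal. The thresholds and the paper's actual congestion logic are not reproduced; the frame filenames in the usage comment are hypothetical.

```python
# Sketch: ORB keypoint matching between two frames as a crude motion measure.
import cv2
import numpy as np

def median_motion(prev_frame, frame):
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(prev_frame, None)
    kp2, des2 = orb.detectAndCompute(frame, None)
    if des1 is None or des2 is None:
        return 0.0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if not matches:
        return 0.0
    shifts = [np.hypot(*(np.array(kp2[m.trainIdx].pt) - np.array(kp1[m.queryIdx].pt)))
              for m in matches]
    return float(np.median(shifts))

# Usage idea: persistently low median motion on a busy road suggests congestion.
# frame_prev = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
# frame_now = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
# print(median_motion(frame_prev, frame_now))
```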
Performance and power comparisons between Nvidia and ATI GPUs (ijcsit)
In recent years, modern graphics processing units have been widely adopted in high performance computing to solve large-scale computation problems. The leading GPU manufacturers, Nvidia and ATI, have introduced a series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects of their processor cores and memory subsystems. In this paper, we choose two recently released products, one from Nvidia and one from ATI, and investigate the architectural differences between them. Our results indicate that these two products have diverse advantages that are reflected in their performance on different sets of applications. In addition, we also compare the energy efficiencies of the two platforms, since power/energy consumption is a major concern in high performance computing systems.
NEW SYMMETRIC ENCRYPTION SYSTEM BASED ON EVOLUTIONARY ALGORITHM (ijcsit)
In this article, we present a new symmetric encryption system which combines our evolutionary ciphering system SEC [1] with a new ciphering method called "fragmentation". The latter alters the appearance frequencies of characters in a given text. Our system has at its disposal two keys: the first is generated by the evolutionary algorithm, the second is generated after the "fragmentation" step. Both are symmetric session keys and strengthen the security of our system.
AN EXTENSION OF PROTÉGÉ FOR AN AUTOMATIC FUZZY-ONTOLOGY BUILDING U... (ijcsit)
The process of building an ontology is very complex and time-consuming, especially when dealing with huge amounts of data. Unfortunately, currently marketed tools are very limited and do not meet all user needs. Indeed, these tools build the core of the ontology from initial data that generate a large amount of information. In this paper, we aim to resolve these problems by adding an extension to the well-known ontology editor Protégé in order to work towards a complete FCA-based framework which resolves the limitations of other tools in building fuzzy ontologies. We give, in this paper, some details on our semi-automatic collaborative tool called the FOD Tab Plug-in, which takes into consideration another degree of granularity in the generation process. In fact, it follows a bottom-up strategy based on conceptual clustering, fuzzy logic and Formal Concept Analysis (FCA), and it defines the ontology between classes resulting from a preliminary classification of the data rather than from the initial large amount of data.
This document discusses how information and communication technologies (ICT) are enabling a "quantum leap" in Africa's development. It describes ICT's potential to reach communities, provide economic opportunities, and reduce social and geographic barriers. Examples are given from the 2007 World Information Technology Forum for Africa, showing multi-stakeholder projects using ICT in areas like agriculture, education, health, and e-government. The International Federation for Information Processing's role in convening stakeholders to discuss ICT policies and share experiences is also outlined. While challenges remain around follow-through, partnerships, and policy harmonization, increasing cooperation and practical implementations of commitments show promise for ICT to significantly advance Africa's development.
SFAMSS: A SECURE FRAMEWORK FOR ATM MACHINES VIA SECRET SHARING (ijcsit)
As ATM applications are deployed in banking systems, the need to secure communications becomes critical. However, multicast protocols do not fit the point-to-point model of most network security protocols, which were designed with unicast communications in mind. In recent years, we have seen the emergence and growth of ATMs (Automatic Teller Machines) in banking systems. Many banks are extending their activity and increasing transactions by using ATMs. ATMs allow them to reach more customers in a cost-effective way and to make their transactions fast and efficient. However, communication in the network must satisfy integrity, privacy, confidentiality, authentication and non-repudiation. Many frameworks have been implemented to provide security in communication and transactions. In this paper, we analyze the ATM communication protocol and propose a novel framework for ATM systems that allows entities to communicate in a secure way without using a lot of storage. We describe the architecture and operation of SFAMSS in detail. Our framework is implemented in Java, and its software architecture and components are studied in detail.
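The secret-sharing primitive that the framework's name refers to can be illustrated with a minimal Shamir split/reconstruct over a prime field, as below; SFAMSS's actual protocol, key sizes and message formats are not described in this summary and are not reproduced here.

```python
# Minimal Shamir secret-sharing sketch: split a secret into n shares, any k of
# which reconstruct it via Lagrange interpolation over a prime field.
import random

PRIME = 2**127 - 1        # a Mersenne prime large enough for a demo secret

def split(secret, n, k, rng=random):
    """Create n shares, any k of which reconstruct the secret."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split(secret=123456789, n=5, k=3)
print(reconstruct(shares[:3]))   # 123456789, since any 3 of the 5 shares suffice
```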
The document discusses coating active cough syrup ingredients in sugar to mask the bitter taste and prevent accidental overdoses in children. It suggests coating the tongue with sugar will cover most of the bitter taste while portioning the cough syrup into bite-sized gum-like pieces ensures the recommended dosage is not exceeded. The goal is to provide effective relief of cough symptoms while creating a pleasant experience for children and adults through taste and measured dosing.
Systems variability modeling: a textual model mixing class and feature concepts (ijcsit)
A system's reusability and cost are very important in the software product line design area. Developers' goal is to increase system reusability and decrease the cost and effort of building components from scratch for each software configuration. This can be achieved by developing a software product line (SPL). To handle the SPL engineering process, several approaches with several techniques have been developed. One of these approaches is called the separated approach. It requires separating the commonalities and variability of a system's components to allow configuration selection based on user-defined features. Textual notation-based approaches have been used for their formal syntax and semantics to represent system features and implementations. But these approaches are still weak in mixing features (conceptual level) and classes (physical level) in a way that guarantees smooth and automatic configuration generation for software releases. The absence of a methodology supporting the mixing process is a real weakness. In this paper, we enhance SPL reusability by introducing some meta-features, classified according to their functionalities. As a first consequence, mixing class and feature concepts is supported in a simple way, using class interfaces and inherent features for a smooth move from the feature model to the class model. As a second consequence, the mixing process is supported by a textual design and implementation methodology that mixes class and feature models by combining their concepts in a single language. The supported configuration generation process is simple, coherent, and complete.
Emotional and behavioral problems of young people have always been an important issue. Emotions can be changed through learning and maturing; therefore, the emotions of young people can be changed through counselling courses. This research uses experimental research methods, multimedia education and the Beck Youth Inventories, Second Edition as research tools. The sample for this research is 60 eighth-grade students. The teaching experiment investigates the effectiveness of immediate and delayed emotion counselling for the students after the use of multimedia learning. The research showed that there is no significant difference between the students of the experimental group and the control group in immediate counselling results. For delayed emotion counselling, there is a significant effect on depression and anxiety. On emotional stability, the students of the experimental group performed better than the control group.
Assessing Software Reliability Using SPC – An Order Statistics Approach (IJCSEA Journal)
There are many software reliability models that are based on the times of occurrence of errors in the debugging of software. It is shown that it is possible to do asymptotic likelihood inference for software reliability models based on order statistics or Non-Homogeneous Poisson Processes (NHPP), with asymptotic confidence levels for interval estimates of parameters. In particular, interval estimates from these models are obtained for the conditional failure rate of the software, given the data from the debugging process. The data can be grouped or ungrouped. For someone making a decision about when to market software, the conditional failure rate is an important parameter. Order statistics are used in a wide variety of practical situations: their use in characterization problems, detection of outliers, linear estimation, study of system reliability, life testing, survival analysis, data compression and many other fields can be seen in many books. Statistical Process Control (SPC) can monitor the forecasting of software failures and thereby contribute significantly to the improvement of software reliability. Control charts are widely used for software process control in the software industry. In this paper we propose a control mechanism based on order statistics of the cumulative quantity between observations of time-domain failure data, using the mean value function of the Half Logistic Distribution (HLD) based on an NHPP.
Development and Evaluation of an Employee Performance Appraisal Insight Repor...IRJET Journal
This document describes research on developing a software tool called E-Perform that uses fuzzy inference techniques and natural language generation to generate narrative employee performance evaluation reports. The software allows managers to rate employees on various criteria. It then analyzes the data and generates insight reports on each employee's strengths, weaknesses and recommendations. The reports are evaluated for correctness and the software's acceptability is assessed through surveys. Results found the reports were 60.9% correct on average and the software was deemed useful, functional and user-friendly by managers. It is recommended to enhance the software with natural language processing to allow open-ended evaluator comments.
One of the core quality assurance features, which combines fault prevention and fault detection, is often known as the testability approach. There are many assessment techniques and quantification methods evolved for software testability prediction, which identify testability weaknesses or factors to further help reduce test effort. This paper examines the measurement techniques that have been proposed for software testability assessment at various phases of the object-oriented software development life cycle. The aim is to find the metrics suite best suited for software quality improvement through software testability support. The ultimate objective is to establish the groundwork for finding ways to reduce the testing effort by improving software testability and its assessment, using well-planned guidelines for object-oriented software development with the help of suitable metrics.
Artificial intelligence based pattern recognition is one of the most important tools in process control to identify process problems. The objective of this study was to evaluate the relative performance of a feature-based recognizer compared with the raw data-based recognizer. The study focused on recognition of seven commonly researched patterns plotted on the quality chart. The artificial intelligence based pattern recognizer trained using the three selected statistical features resulted in significantly better performance compared with the raw data-based recognizer.
ANALYZABILITY METRIC FOR MAINTAINABILITY OF OBJECT ORIENTED SOFTWARE SYSTEMIAEME Publication
Analyzability is one of the major factors in the prediction of the maintainability aspect that can improve the quality of the intended software solution in an appropriate manner. In fact, the right analyzability measure will go a long way in decreasing inadequacies and deficiencies through identification of the modified parts and failure causes in the software system. Analyzability metric identification is not possible in every software solution because of its non-functional nature in the real world, but in object-oriented software systems there is an opportunity to find out the analyzability metric in the form of class inheritance hierarchies. In this research paper, the researcher measured the analyzability factor for object-oriented software systems and also validated it in accordance with the famed Weyuker's properties. The proposed analyzability metric is intended to lead new developments in the object-oriented software maintainability parameter in future and help new researchers do their research the right way, thus eventually achieving the quality aspect of the object-oriented software system and fulfilling the needs of the users.
A Systematic Mapping Review of Software Quality Measurement: Research Trends,...IJECEIAES
Software quality is a key for success in the business of information and technology. Hence, before being marketed, software needs quality measurement to confirm that it fulfills the user requirements. Some methods of software quality analysis have been tested from different perspectives, and we present the methods from the point of view of users and experts. This study aims to map the methods of software quality measurement to the models of quality. Using the method of Systematic Mapping Study, we searched and filtered papers using inclusion and exclusion criteria, and 42 relevant papers were obtained. The result of the mapping showed that although the ISO SQuaRE model has been widely used over the last five years and has evolved dynamically, researchers in Indonesia still used ISO 9126 until the end of 2016. The most commonly used method of software quality measurement is the empirical method, and some researchers have applied AHP and Fuzzy approaches in measuring software quality.
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniquesijtsrd
Effective software cost estimation is one of the most challenging and important activities in software development. Developers want a simple and accurate method of effort estimation. Estimation of the cost before starting the work is a prediction, and predictions are not always accurate. Software effort estimation is a very critical task in software engineering, and a suitable estimation technique is crucial to control quality and efficiency. This paper gives a review of various available software effort estimation methods, mainly focusing on the algorithmic and non-algorithmic models. These existing methods for software cost estimation are illustrated and their aspects are discussed. No single technique is best for all situations, and thus a careful comparison of the results of several approaches is most likely to produce realistic estimates. This paper provides a detailed overview of existing software cost estimation models and techniques, presents the strengths and weaknesses of various cost estimation methods, and focuses on some of the relevant reasons that cause inaccurate estimation. Pa Pa Win | War War Myint | Hlaing Phyu Phyu Mon | Seint Wint Thu "Review on Algorithmic and Non-Algorithmic Software Cost Estimation Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26511.pdf Paper URL: https://www.ijtsrd.com/engineering/-/26511/review-on-algorithmic-and-non-algorithmic-software-cost-estimation-techniques/pa-pa-win
MULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMSijcsit
Diabetes is among the most common diseases in India. It affects the patient's health and also leads to other chronic diseases. Prediction of diabetes plays a significant role in saving lives and cost. Predicting diabetes in the human body is a challenging task because it depends on several factors. Few studies have reported the performance of classification algorithms in terms of accuracy. Results in these studies are difficult and complex for medical practitioners to understand, and they also lack visual aids as they are presented in pure text format. This survey uses ROC and PRC graphical measures to improve understanding of the results. A detailed parameter-wise discussion of the comparison is also presented, which is lacking in other reported surveys. Execution time, Accuracy, TP Rate, FP Rate, Precision, Recall and F-Measure parameters are used for comparative analysis, and a Confusion Matrix is prepared for a quick review of each algorithm. The ten-fold cross-validation method is used for estimation of the prediction model. Different sets of classification algorithms are analyzed on a diabetes dataset acquired from the UCI repository.
Threshold benchmarking for feature ranking techniquesjournalBEEI
In prediction modeling, the choice of features chosen from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank the features by their importance, but there is no consensus on the number of features to be cut off. Thus, it becomes important to identify a threshold value or range so as to remove the redundant features. In this work, an empirical study is conducted to identify the threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popularly used ranker techniques and six machine learning techniques, to deduce a relationship between the total number of input features (N) and the threshold range. The area under the curve analysis shows that ≃ 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we also find that log2(N) as the ranker threshold value represents the lower limit of the range.
A Software Measurement Using Artificial Neural Network and Support Vector Mac...ijseajournal
Today, software measurement is based on various techniques such as neural networks, genetic algorithms, fuzzy logic, etc. This study evaluates the efficiency of applying a support vector machine with a Gaussian Radial Basis kernel function to the software measurement problem to increase performance and accuracy. Support vector machines (SVM) are an innovative approach to constructing learning machines that minimize generalization error. There is a close relationship between SVMs and Radial Basis Function (RBF) classifiers. Both have found numerous applications such as optical character recognition, object detection, face verification, text categorization, and so on. The results demonstrate that the accuracy and generalization performance of the SVM with a Gaussian Radial Basis kernel function is better than RBFN. We also
Survey on Software Data Reduction Techniques Accomplishing Bug TriageIRJET Journal
This document discusses various techniques for software data reduction to improve the accuracy of bug triage. It first provides background on bug triage and the challenges it aims to address like large volumes of low quality bug data. It then surveys literature on related techniques like automated test generation and text mining approaches. The document describes various text mining methods like term-based, phrase-based, concept-based and pattern taxonomy methods. It also covers data reduction techniques and their benefits for bug triage. Different classification techniques for bug identification are explained, including decision trees, nearest neighbor classifier and artificial neural networks.
Software Cost Estimation Using Clustering and Ranking SchemeEditor IJMTER
Software cost estimation is an important task in the software design and development process.
Planning and budgeting tasks are carried out with reference to the software cost values. A variety of
software properties are used in the cost estimation process. Hardware, products, technology and
methodology factors are used in the cost estimation process. The software cost estimation quality is
measured with reference to the accuracy levels.
Software cost estimation is carried out using three types of techniques: the regression-based model, the analogy-based model and the machine learning model. Each model has a set of techniques for the software cost estimation process. 11 cost estimation techniques under 3 different categories are used in the system. The Attribute-Relation File Format (ARFF) is used to maintain the software product property values. The ARFF file is used as the main input for the system.
The proposed system is designed to perform the clustering and ranking of software cost estimation methods. A non-overlapping clustering technique is enhanced with an optimal centroid estimation mechanism. The system improves the clustering and ranking process accuracy. The system produces
efficient ranking results on software cost estimation methods.
This document discusses the impact of aspect-oriented programming (AOP) on software maintainability based on a literature review and case studies. It summarizes several case studies that measured maintainability metrics like coupling, cohesion, and separation of concerns in object-oriented (OO) systems versus aspect-oriented (AO) systems. The studies found that AO systems generally had less coupling between components, higher separation of concerns, and were more changeable and maintainable than equivalent OO systems. The document also outlines various software metrics that have been used to measure maintainability attributes in AO systems like cohesion, coupling, size, and changeability.
GENETIC-FUZZY PROCESS METRIC MEASUREMENT SYSTEM FOR AN OPERATING SYSTEMijcseit
This document presents a genetic-fuzzy system for measuring the performance of an operating system's processes. It develops a model using 7 key operating system process parameters and fuzzy logic to handle imprecision. A genetic algorithm is used to optimize the generated membership functions. Rules are created relating parameter combinations to performance classifications. The system was tested on sample data and the genetic algorithm was able to optimize the membership functions over 4 generations to best classify performance. The system brings an optimal and precise approach to measuring operating system process performance by combining genetic algorithms and fuzzy logic.
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...ijcsa
This work is about using the Simulated Annealing algorithm for effort estimation model parameter optimization, which can lead to a reduction in the difference between the actual and estimated effort used in model development.
The model has been tested using an OOP dataset obtained from NASA for research purposes. The dataset-based model equation parameters have been found; they consist of two independent variables, viz. Lines of Code (LOC) along with one more attribute, and a dependent variable related to software development effort (DE). The results have been compared with the earlier work done by the author on Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS), and it has been observed that the developed SA-based model is more capable of providing better estimates of software development effort than ANN and ANFIS.
SOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEWijseajournal
Software maintainability is one of the most important quality attributes. To increase the quality of software, to manage software more efficiently and to decrease its cost, maintainability, maintainability estimation and maintainability evaluation models have been proposed. However, the practical use of these models in software engineering tools and practice has remained limited due to their limitations or threats to validity. In this paper, the results of our literature review on maintainability models, maintainability metrics and maintainability estimation are presented. The aim of this paper is to provide a baseline for further research and to serve the needs of developers and customers.
This document analyzes and compares maintainability metrics for aspect-oriented software (AOS) and object-oriented software (OOS) using five projects. It discusses metrics like number of children, depth of inheritance tree, lack of cohesion of methods, weighted methods per class, and lines of code. The results show that for most metrics like NOC, DIT, LCOM, and WMC, the mean values are higher for OOS compared to AOS, indicating that AOS is generally more maintainable based on these metrics. LOC is also lower on average for AOS. The study concludes that an AOP version is more maintainable than an OOP version according to the chosen metrics.
The modern business environment requires organizations to be flexible and open to change if they are to gain and retain their competitive edge. A competitive business environment requires modernizing existing legacy systems into self-adaptive ones. Reengineering presents an approach to transform a legacy system into an evolvable system. Software reengineering is a leading system evolution technique which helps in effective cost control, quality improvement, and time and risk reduction. However, successful improvement of a legacy system through reengineering requires portfolio analysis of the legacy application around various quality and functional parameters, some of which include reliability and modularity of the functions, level of usability and maintainability, as well as policy and standards of software architecture and availability of required documents. Portfolio analysis around these parameters will help to examine the legacy application for quality and functional gaps within the application [1].
SENSITIVITY ANALYSIS OF INFORMATION RETRIEVAL METRICS
International Journal of Computer Science & Information Technology (IJCSIT) Vol 7, No 4, August 2015
DOI: 10.5121/ijcsit.2015.7404
SENSITIVITY ANALYSIS OF INFORMATION
RETRIEVAL METRICS
Marina Marjanović-Jakovljević
Department of Computer Engineering, Singidunum University, Belgrade, Serbia
ABSTRACT
Average Precision, Recall and Precision are the main metrics of Information Retrieval (IR)
systems performance. Using Mathematical and empirical analysis, in this paper, we show the properties of
those metrics. Mathematically, it is demonstrated that all those parameters are very sensitive to relevance
judgment which is not usually very reliable. We show that position shifting downwards of the relevant
document within the ranked list is followed by Average Precision decreasing. The variation of Average
Precision parameter value is highly present in the positions 1 to 10, while from the 10th position on, this
variation is negligible. In addition, we try to estimate the regularity of the Average Precision value
changes, when we assume that we are switching the arbitrary number of relevance judgments within the
existing ranked list, from non-relevant to relevant. Empirically, it is shown that 6 relevant documents at the
end of the 20-document list have approximately the same Average Precision value as a single relevant
document at the beginning of this list, while Recall and Precision values increase linearly, regardless of the
document position in the list. Also, we show that in the case of Serbian-to-English human translation query
followed by English-to-Serbian machine translation, relevance judgment is significantly changed and
therefore, all the parameters for measuring the IR system performance are also subject to change.
KEYWORDS
Information Retrieval (IR) Systems, query, Ranking, Precision, Average Precision
1. INTRODUCTION
Nowadays, following the constant development of information technologies, a significant amount
of information exists and is available to everyone. It has been estimated that 55.5% of the documents on the Web are in English. However, English is not the native language of 71.5% of users [1]. On the other hand, the availability of the Internet has created wide opportunities for transcribing and translating the works of others, which seriously threatens the system of social values [2]. These facts create the need for Information Retrieval (IR) systems capable of effective Cross-Language IR, especially between English and other languages, and for evaluation studies of such systems. Therefore, it is of essential importance to evaluate the effectiveness
of the IR system. To assess the efficiency, it is necessary to establish suitable success criteria that
can be measured in some way. Evaluation is important for designing, development and
maintenance of IR system efficiency.
These may include performance assessment of the IR system. Based on the existing literature, it can be concluded that the evaluation of IR systems has been the subject of research for the last 50 years [3]. This is due to the fact that a feasibility study includes parameters such as customer satisfaction, laboratory results, and the results of operational tests, as discussed in [4] and [5].
The traditional way of measuring the effectiveness of an IR system consists of a user assessment of the relevance of a particular document for a given query. This approach was shaped by the Information Retrieval community through IR algorithm development and the TREC conferences held in
the USA, the third (TREC-3) [6], the fourth (TREC-4) [7] and the sixth (TREC-6) [8]. The algorithms
developed at these events were based on the measurement of efficiency in a controlled
experimental environment. The effectiveness of the IR system is estimated based on the speed of
returning results as well as the necessary memory required to store the index. Measuring the
effectiveness and efficiency of IR is usually carried out in the laboratory with minimal
involvement of users and was based on the assessment of the results of completed algorithms.
Robertson in [5] indicates that customer rating is also necessary in this evaluation in order to
qualitatively estimate IR system performance. This user-oriented approach implies a state of users
and their interaction with the IR system.
In practice, it is common to use different evaluation approaches during the development of IR systems, such as the use of the test collection for development, optimization
algorithms for search, and laboratory experiments that involve the interaction of users who are
involved in improving the user interface.
In Section II, we present a general IR system model. In Section III, we discuss measurement of
the efficiency of the IR test collections that we use for the purpose of this paper as well as the
parameters used to measure the quality of an IR system. In this section, using mathematical analysis, we show some interesting properties of those parameters. In Section IV, the criterion for the evaluation of relevant and irrelevant documents and the evaluation methodology for the experiments are presented. Finally, Section V summarizes the results of this paper and gives some ideas about future work.
2. IR SYSTEM MODEL
According to the model described in [9], the system model flowchart is shown in Figure 1. The set
of queries belongs to one reference corpus, while the set of original documents can belong to the
other reference corpus. In the next stage, query document is divided into subdocuments and
translated into the language of the document collection. Furthermore, in the pre-processing stage, in order to shorten the time of analysis and improve IR system performance, the processes of stemming, stopword removal and Part of Speech (POS) tagging should be applied. Since the query documents for the experiments described in this paper are written in the Serbian language, pre-processing should be adjusted to this language.
The classic stemmer for the Serbian language, where suffix-stripping is performed, is shown in
[10] and [11].
The tagger for the Serbian language is described in [12], where the strategy to unify each separate
case under a general rule was developed.
On the other side, each document from the collection is submitted to indexing, where it receives its identification. Following the identification and pre-processing stages, both documents are included in the Information Retrieval system, which is adapted to the Serbian language in order to deal with the Serbian alphabet and whose task is to estimate the degree to which documents in the collection reflect the information in a user query. Two methods for encoding Serbian characters are recommended in
[13]:
ISO-8859-5, Cyrillic-based and suited for Eastern European languages (Bulgarian, Byelorussian,
Macedonian, Russian, Serbian, and Ukrainian) and ISO-8859-2, suited for European languages
(Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak,
Slovenian, and Serbian).
Figure 1. System Model Flowchart
3. EVALUATION OF IR SYSTEM
For the measurement of ad-hoc information search efficiency in a standard way, we must have a test collection consisting of three things:
a document collection, a test suite of information needs (expressible as queries), and relevance judgments for each query-document pair.
With respect to the information needs of users, each document in the collection should be classified as relevant or not. This decision is of great importance when the parameters for a
quality measure of IR systems are calculated.
3.1. Test collection
In order to measure the efficiency of IR systems, it is necessary to have the appropriate test
collection. There are collections such as TREC-3 [6], TREC-4 [7] for the Spanish language,
TREC-6 [8], for Cross Language French and English, ECLAPA [9] for Cross Language
Portuguese and English. In [14], testing for a query collection written in the Serbian language was done with EBART 3.
However, many IR systems contain different parameters that can be adjusted to tune its
performance. Therefore, it is not correct to report results on a test collection which was obtained
by tuning these parameters in order to maximize performance on one particular set of queries
rather than for a random sample of queries. In this respect, in order to tune the parameters of the system, the correct procedure is to have more than one test collection.
3.2. IR System evaluation measures properties
There are several measures of the operation of an information search system. The measures are based on a collection of documents and queries in which the relevance of the given documents is known. All the usual measures described here assume binary relevance: the document is either relevant or non-relevant. In practice, queries can be poorly posed and there may be different shades of relevance.
Evaluation of the quality of an IR system is based on the calculation of the Precision, Average Precision and Recall parameters [4].
The Precision parameter represents the ratio of the number of retrieved relevant documents ($N_{rel\_ret}$) and the total number of ranked documents ($N_{ranked\_doc.}$), presented as result of the IR system:

$$\mathrm{Precision} = \frac{N_{rel\_ret}}{N_{ranked\_doc.}} \qquad (1)$$

Recall represents the ratio of the number of relevant retrieved documents ($N_{rel\_ret}$) and the total number of relevant documents within the document collection ($N_{rel}$):

$$\mathrm{Recall} = \frac{N_{rel\_ret}}{N_{rel}} \qquad (2)$$
In the case that we want to average over queries, a more complicated joint measure such as Average Precision is required. Unfortunately, Recall and Precision have been rarely used in recent retrieval experiments, since computing these indicators requires all retrieved documents to be ranked and checked, which takes a large amount of computing time.
For a single information need, in [15], Average Precision is defined as the mean of the Precision
scores obtained after each relevant document is retrieved, using zero as the Precision for relevant
documents that are not retrieved.
In order to better understand those metrics and their sensitivity to the relevance judgment, we will
suppose that we need to evaluate IR system by using a test collection which consists of a set of
queries, set of documents and relevance judgments results whether the document from collection
is relevant or not.
For the sake of simplicity, let $x_k$ be a variable that represents binary relevance judgments, i.e.

$$x_k = \begin{cases} 1, & \text{if the } k\text{-th document is relevant} \\ 0, & \text{if the } k\text{-th document is not relevant} \end{cases} \qquad (3)$$
In order to calculate Precision, we will observe the $N_{ranked\_doc.}$ ranked documents in the set of documents. Therefore, from (1), Precision after $N_{ranked\_doc.}$ ranked documents retrieved can be computed as:

$$\mathrm{Precision} = \frac{1}{N_{ranked\_doc.}} \sum_{k=1}^{N_{ranked\_doc.}} x_k \qquad (4)$$
On the other side, using the same notations and considering the definition of the Recall parameter from (2), for the $N_{ranked\_doc.}$ top ranked documents, the value of Recall is computed such that

$$\mathrm{Recall} = \frac{1}{N_{rel}} \sum_{k=1}^{N_{ranked\_doc.}} x_k \qquad (5)$$
Therefore, using our notations, Average Precision can be expressed as

$$\mathrm{AveragePrecision} = \frac{1}{N_{rel\_ret}} \sum_{i=1}^{N_{ranked\_doc.}} x_i \,\mathrm{Precision}(i) = \frac{1}{N_{rel\_ret}} \sum_{i=1}^{N_{ranked\_doc.}} \frac{x_i}{i} \sum_{k=1}^{i} x_k \qquad (6)$$
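To make the definitions above concrete, the following short Python sketch (ours, not part of the original paper; all function and variable names are illustrative) computes Precision, Recall and Average Precision from a binary relevance list. The normalizing constant of Average Precision is passed explicitly, since equation (6) divides by $N_{rel\_ret}$, while the definition quoted from [15] (zero Precision for relevant documents that are not retrieved) corresponds to dividing by $N_{rel}$.

```python
# Illustrative sketch (not from the paper): Precision, Recall and Average Precision
# for a binary relevance list x over the N_ranked_doc. retrieved documents.

def precision(x):
    """Equation (4): fraction of retrieved documents that are relevant."""
    return sum(x) / len(x)

def recall(x, n_rel):
    """Equation (5): fraction of the N_rel relevant documents that were retrieved."""
    return sum(x) / n_rel

def average_precision(x, normalizer):
    """Equation (6): sum of Precision at each relevant rank, divided by `normalizer`."""
    ap, hits = 0.0, 0
    for i, rel in enumerate(x, start=1):   # i is the 1-based rank
        if rel:
            hits += 1
            ap += hits / i                 # Precision(i) at a relevant rank
    return ap / normalizer if normalizer else 0.0

# One relevant document at the top of a 20-document list, N_rel = 10 in the collection
x = [1] + [0] * 19
print(precision(x))                    # 0.05
print(recall(x, n_rel=10))             # 0.1
print(average_precision(x, sum(x)))    # 1.0  (divided by N_rel_ret, as in eq. (6))
print(average_precision(x, 10))        # 0.1  (divided by N_rel, per the definition from [15])
```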
From the previous expressions, it is clear that the value of Average Precision depends on the number of relevant documents retrieved and the position of relevant documents within the ranked list. Supposing that the number of relevant documents in the collection and the number of documents in the list have constant values and that we have only one relevant document, $N_{rel\_ret}=1$, on the $i$-th position within the document set ($1 \le i \le N_{ranked\_doc.}$), the Average Precision parameter becomes:

$$\mathrm{AveragePrecision}(i) = \frac{1}{i} \qquad (7)$$

while the Recall parameter becomes:

$$\mathrm{Recall} = \frac{1}{N_{rel}} = \mathrm{const.} \qquad (8)$$
Therefore, the difference in the value of Average Precision when the relevant document is located on two adjacent positions within the ranked list is

$$\mathrm{dAveragePrecision}(i) = -\frac{1}{i\,(i+1)} \qquad (9)$$

Those two functions are shown in Figure 2, which illustrates that the values of Average Precision and $\mathrm{dAveragePrecision}(i)$ are negligible for $i \ge 10$.
Figure 2. Average Precision (left) and dAveragePrecision (right) vs. position of the relevant document; $N_{rel\_ret}=1$.
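As a quick numeric illustration of equations (7) and (9) (a sketch of ours, not taken from the paper), the following loop prints Average Precision and its change between adjacent positions for a single relevant document; beyond position 10 both quantities become very small, which is what Figure 2 shows.

```python
# Sketch: Average Precision (eq. 7) and its change between adjacent positions (eq. 9)
# for a single relevant document at rank i of the ranked list.
for i in range(1, 21):
    ap = 1.0 / i                     # eq. (7)
    d_ap = -1.0 / (i * (i + 1))      # eq. (9)
    print(f"i={i:2d}  AveragePrecision={ap:.3f}  dAveragePrecision={d_ap:.4f}")
```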
In a further discussion, we will try to estimate the regularity of change of the Average Precision
value when we assume that we switch the relevance judgments of the documents from the ranked
list, from non-relevant to relevant.
There are $\binom{N_{ranked\_doc.}}{k_{switched}}$ different possibilities for switching relevance judgments within the $N_{ranked\_doc.}$ ranked document list, where $k_{switched}$ denotes the number of switched judgments.
In the first case, we assume that $k_{switched}=1$, i.e. only one document within the $N_{ranked\_doc.}$ ranked document list is switched. We assume that the switch is performed on the $j$-th position, where $1 \le j \le N_{ranked\_doc.}$. The difference between the Average Precision value when the $j$-th document is relevant and the Average Precision value when the $j$-th document is not relevant is expressed as

$$\Delta\mathrm{AveragePrecision}(j) = \mathrm{AveragePrecision}(x_j = 1) - \mathrm{AveragePrecision}(x_j = 0) \qquad (10)$$
The Average Precision value in the case when the $j$-th document is not relevant is presented as

$$\mathrm{AveragePrecision}(x_j = 0) = \frac{1}{N_{rel\_ret}}\left(\sum_{i=1}^{j-1} x_i\,\mathrm{Precision}_i + \sum_{i=j+1}^{N_{ranked\_doc.}} x_i\,\mathrm{Precision}_i\right) \qquad (11)$$
On the other side, the Average Precision value for the case where the $j$-th document is relevant is presented such that

$$\mathrm{AveragePrecision}(x_j = 1) = \frac{1}{N_{rel\_ret}+1}\left(\sum_{i=1}^{j-1} x_i\,\mathrm{Precision}_i + \mathrm{Precision}_j + \sum_{i=j+1}^{N_{ranked\_doc.}} x'_i\,\mathrm{Precision}'_i\right) \qquad (12)$$
Considering that $x'_i = x_i$ for $1 \le i = i' \le N_{ranked\_doc.}$ and $N_{rel\_ret} \gg 1$,

$$\Delta\mathrm{AveragePrecision}(j) = \frac{1}{N_{rel\_ret}}\left(\mathrm{Precision}_j + \sum_{i=j+1}^{N_{ranked\_doc.}} x'_i\,\Delta\mathrm{Precision}'_i\right) = \frac{1}{N_{rel\_ret}}\left(\frac{1}{j}\sum_{k=1}^{j} x_k + \sum_{i=j+1}^{N_{ranked\_doc.}} \frac{x'_i}{i}\right) \qquad (13)$$

where $\Delta\mathrm{Precision}'_i = \frac{1}{i}$ represents the change of the Precision on the $i$-th position where $x_i = 1$.
When we want to observe the change in the Average Precision value when $k$ switches in relevance judgment are done, we can conclude that $k$ simultaneous switches provoke the same change in the Average Precision value as $k$ successive single switches. Therefore, the total change in the Average Precision value can be presented as a sum of single changes. Since $\Delta\mathrm{AveragePrecision}(j) > \Delta\mathrm{AveragePrecision}(j+1)$, the value of the change is bigger when the position of the relevant document within the ranked list is closer to the top.
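The claim above, that k successive single switches add up to the change produced by k simultaneous switches, can be checked numerically. The sketch below is ours (the judgment list and the switched positions are arbitrary examples, and Average Precision is normalized by an assumed N_rel = 10); each single-switch change is computed on the list updated by the previous switches, so the sum telescopes to the total change.

```python
# Sketch (ours): the total Average Precision change from k switches equals the sum
# of the k successive single-switch changes, each applied to the updated list.

def average_precision(x, n_rel):
    ap, hits = 0.0, 0
    for i, rel in enumerate(x, start=1):
        if rel:
            hits += 1
            ap += hits / i
    return ap / n_rel

n_rel = 10                                    # assumed relevant documents in the collection
x = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0]            # arbitrary example judgment list
to_switch = [2, 6, 15]                        # 1-based positions switched from 0 to 1

# k simultaneous switches
x_all = list(x)
for j in to_switch:
    x_all[j - 1] = 1
total_change = average_precision(x_all, n_rel) - average_precision(x, n_rel)

# k successive single switches, summed
x_step, sum_of_single = list(x), 0.0
for j in to_switch:
    before = average_precision(x_step, n_rel)
    x_step[j - 1] = 1
    sum_of_single += average_precision(x_step, n_rel) - before

print(round(total_change, 10), round(sum_of_single, 10))   # the two values coincide
```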
4. EXPERIMENTS
In this work, as in [14], testing was done with EBART 3. This corpus is selected as a subset of the 2 GB EBART corpus, a Serbian newspaper article collection and the largest digital media corpus in Serbia. It consists only of articles from the Serbian daily newspaper “Politika”, published from 2003 to 2006. There are 3366 articles in this collection. For the purpose of the first experiment, we used a small sample that was artificially generated, while for the purposes of the second
and third experiments, we use a query collection that contains randomly selected newspaper articles from 2 KB to 6 KB in size. In order to make relevance judgments, we use "hands-on" experience in the process of evaluating information retrieval systems. We run those queries on the Google search engine. When judging pages for relevance, it should be considered that a page is relevant if it is on the appropriate topic, and it is not relevant only because it has a link to a page that has relevant content. If it is not possible to fulfil those requirements, we mark the document as non-relevant. The list of 1's and 0's presented in Table 1 and Table 2 represents relevant (1) and non-relevant (0) documents in a ranked list of 20 documents in response to a query. It is assumed that there are 10 relevant documents in the total collection. Observing all three parameters for
measuring the quality of the IR system (Average Precision, Recall and Precision), with the aim of showing the sensitivity of IR system performance factors to Google ranking, in this paper we describe three experiments.
The first experiment is artificial in order to show the IR performance dependence on the position
of the relevant documents within Google ranked list. We assume that there are 20 queries and for
each query document, Google's results present only one relevant document, starting from the 1st
one through the 20th position. This behaviour was already mathematically demonstrated in the previous section, where it is explained that the parameters for IR system evaluation are very sensitive to
relevance judgment which is not usually very reliable. It is shown that the variation of Average
Precision parameter value is highly present in the positions 1 to 10, while from the 10th position
on, this variation is negligible. The results are presented in Figure 3, and it is shown that as the position of the relevant document shifts downwards, the value of Average Precision decreases. Since all cases have the same scenario and the number of relevant documents within the ranked list is equal and has a value of 1, it is expected that the values of the Precision and Recall parameters are constant, while the values of the Average Precision parameter decline. Based
on the chart, shown in Figure 3, it can be seen that the Average Precision has a more rapid decline
from 1st to 10th position, while from the 10th position, this difference is negligible. The
difference in values for Average Precision parameter between two successive positions is much
higher at the top than at the bottom. For example, the difference in value between the 1st and the 2nd positions is 0.95, while between the 10th and 11th positions it is 0.001.
Figure 3. Dependence of the IR system performance parameters on the position of the relevant documents in the Google ranked list.
Since in the previous experiment we saw that the system performance in terms of Average
Precision value is better when the position of the relevant documents within the ranked list is at
the beginning, the objective of the second experiment is to show that the value of the Average
Precision parameter is approximately equal in those two cases: 1) only 1 relevant document in the
first position of the Google ranked list, and 2) more relevant documents but at the end of this list.
It is interesting to show how many relevant documents at the lowest positions are required in order for the Average Precision to approximately equal its value when the single relevant document is in the first position in Google's results. The first row of Table 1 shows that the first document within the ranked list is marked as relevant. The second row of Table 1 shows a ranked list with the same number of relevant documents, with the difference that the relevant document is in the last position. The values of the parameters show that the Average Precision is 20 times better in the case where the relevant document is in the first position, while the values for Recall and Precision are identical in both cases. The third row of Table 1 shows the relevance judgment where the last two documents are relevant. Average Precision is 6.67 times worse than in the first
case, where the only relevant document was on the top of the list, while the values of the
parameters of Precision and Recall were twice as high. In each subsequent row, the values of all the parameters (Average Precision, Recall and Precision) increase linearly with the increase in the number of relevant documents, while the Average Precision reaches the value of the case when
the relevant document is in the first position in the ranked list only after we have 6 relevant
documents at the end of the list. The chart shown in Figure 4 shows the changes in the values for
each parameter.
The objective of the third experiment is to show that parameters for IR system evaluation are very
sensitive to relevance judgment which can be dramatically changed during machine translation.
This experiment consists of an analysis of the same query document for two different cases. In
the first case, the query document is observed in the original Serbian language. In the second case,
a given query document is observed as in the first case, with the difference that the query
document is translated with the help of human translation first into English and then with the help
of machine translation, re-translated into Serbian. Respecting the rules described above and the
scenario for relevance judgment, in the first case the result obtained shows that the relevant documents in the ranked list of 20 documents are located at the 1st, 2nd, 7th, 9th and 10th positions, while in the case when the query document is translated, the relevant documents are at the 1st and 15th positions within the Google ranked list. For the obtained positions, the parameter values for Average Precision, Precision and Recall are shown in Table 2. From the results obtained, it is evident that under the same scenario, the relevance judgment is different in the sense that in the case of translation, there are fewer relevant documents, and the Google ranking is lower. Because of these facts, it is expected that the overall performance of the IR system is worse.
Table 1. First position vs. last positions of the relevant documents

Query | Relevance judgments     | Avg. Precision | Precision | Recall
1     | 10000 00000 00000 00000 | 0.1            | 0.05      | 0.10
2     | 00000 00000 00000 00001 | 0.005          | 0.05      | 0.10
3     | 00000 00000 00000 00011 | 0.015          | 0.10      | 0.20
4     | 00000 00000 00000 00111 | 0.031          | 0.15      | 0.30
5     | 00000 00000 00000 01111 | 0.053          | 0.20      | 0.40
6     | 00000 00000 00000 11111 | 0.081          | 0.25      | 0.50
7     | 00000 00000 00001 11111 | 0.12           | 0.30      | 0.60
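The Table 1 values can be reproduced with a few lines of code. The sketch below is ours; following the definition from [15] and the assumption of 10 relevant documents in the collection, Average Precision is divided by 10 (non-retrieved relevant documents contribute zero Precision), while Precision and Recall follow equations (1) and (2). The printed values match Table 1 up to rounding.

```python
# Sketch (ours): recomputing the Table 1 rows for a 20-document ranked list,
# assuming 10 relevant documents in the whole collection.

def metrics(judgments, n_rel=10):
    x = [int(c) for c in judgments]
    hits, ap = 0, 0.0
    for i, rel in enumerate(x, start=1):
        if rel:
            hits += 1
            ap += hits / i
    ap /= n_rel                               # zero Precision for missed relevant docs
    return ap, sum(x) / len(x), sum(x) / n_rel

rows = [
    "10000000000000000000",   # row 1: relevant document in the first position
    "00000000000000000001",   # row 2: relevant document in the last position
    "00000000000000000011",
    "00000000000000000111",
    "00000000000000001111",
    "00000000000000011111",
    "00000000000000111111",   # row 7: six relevant documents at the end
]
for q, judgments in enumerate(rows, start=1):
    ap, p, r = metrics(judgments)
    print(f"query {q}: AP={ap:.3f}  Precision={p:.2f}  Recall={r:.2f}")
```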
Figure 4. A comparison of the IR system performance parameters for the case where the single relevant document is at the beginning of the ranked list and the cases where more relevant documents are at the end of the list
Table 2. Experimental results comparison for the queries for the top 20 ranked documents with and without translation

Translated | Relevance judgments     | Avg. Precision | Precision | Recall
No         | 11000 01011 00000 00000 | 0.34           | 0.25      | 0.50
Yes        | 10000 00000 00001 00000 | 0.11           | 0.10      | 0.20
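The same computation applies to Table 2; the short sketch below (ours, with Average Precision again divided by the assumed 10 relevant documents in the collection) recomputes the two rows.

```python
# Sketch (ours): recomputing the Table 2 rows (non-translated vs. translated query).
def metrics(judgments, n_rel=10):
    x = [int(c) for c in judgments]
    hits, ap = 0, 0.0
    for i, rel in enumerate(x, start=1):
        if rel:
            hits += 1
            ap += hits / i
    return ap / n_rel, sum(x) / len(x), sum(x) / n_rel

for label, judgments in [("not translated", "11000010110000000000"),
                         ("translated    ", "10000000000000100000")]:
    ap, p, r = metrics(judgments)
    print(f"{label}  AP={ap:.2f}  Precision={p:.2f}  Recall={r:.2f}")
```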
Figure 5. A comparison of the IR system performance parameters in the cases where the query is translated and not translated
4.1. Google Ranking
According to statistics, more than one billion people use Google search, generating about 115
billion monthly unique searches. In addition, Google occupies 67.6% market share for browsers,
which means that marketers who want a high position in search results have to try hard to fulfil Google's requirements as much as possible. No matter how relevant particular documents are to the user query, they will be ranked according to their importance. A website is significant, for example, if many web pages link to it.
For a typical query, there is a huge number of websites with the requested information. Today, Google relies on more than 200 unique signals to guess what we are looking for [16].
5. CONCLUSIONS
In this paper, we observe three main parameters to measure the quality of the IR system (Average
Precision, Recall and Precision). We show some interesting findings about the properties of those measures, using mathematical and empirical analysis.
Mathematically, it is demonstrated that all those parameters are very sensitive to relevance
judgment which is not usually very reliable. We show that the relevant document position shifting
downwards within the ranked list is followed by Average Precision decreasing. The variation of
Average Precision value is highly present in the positions 1 to 10, while from the 10th position
on, this variation is negligible. Additionally, we try to estimate the regularity of the Average
Precision value change, when we assume that we are switching the arbitrary number of relevance
judgments, from non-relevant to relevant. We demonstrate that, when k switches in relevance judgment are done, k simultaneous switches cause the same change in the Average Precision value as k successive single switches. Therefore, the total change in the Average Precision value can be presented as a sum of single changes, and the change in Average Precision is bigger when the relevance judgment switch is performed on the higher ranking positions.
Empirically, it is shown that 6 relevant documents at the end of the 20-document list have approximately the same Average Precision value as a single relevant document at the beginning of this list, while Recall and Precision values increase linearly, regardless of the document position within the list. In this way, the influence on the score of detecting an unknown relevant document, and of its position within the ranked list, is discussed.
Also, we show that in the case of a Serbian-to-English human translation query followed by English-to-Serbian machine translation, the relevance judgment is significantly changed and therefore, all the
parameters for measuring the IR system performance are also subject to change.
ACKNOWLEDGEMENTS
I would like to express my special thanks and gratitude to the Dean of Singidunum University (Professor Dr. Milovan Stanišić), who gave me the opportunity to do this project, and to all my colleagues at Singidunum University who continually gave me useful advice and moral support.
REFERENCES
[1] Languages used on internet. [Available at
http://en.wikipedia.org/wiki/Languages_used_on_the_Internet]
[2] Marjanović, M., Tomašević, V. & Živković, D. (2015) “Anti-plagiarism software: usage,
effectiveness and issues”, Proceedings of International Scientific Conference of IT and Business
Related Research, SINTEZA 2015, Ed., pp. 119-122.
[3] Clough, P. & Sanderson, M. (2013) “Evaluating the performance of information retrieval systems
using test collections” Information Research, 18(2) paper 582. [Available at
http://InformationR.net/ir/18-2/paper582.html]
[4] Saracevic, T. (1995) “Evaluation of evaluation in information retrieval”, In Edward A. Fox, Peter
Ingwersen, Raya Fidel (Eds.). Proceedings of 18th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, Seattle, Washington, USA.
[5] Robertson, S.E. & Hancock-Beaulieu, M. (1992) “On the evaluation of information retrieval systems
information processing and management”, Ed., pp. 457-466.
[6] Harman, D. (1995) “Overview of the third text retrieval conference (TREC-3)”, Proceedings of the
Third Text Retrieval Conference, Ed., pp. 1-20.
[7] Harman, D. (1995) “Overview of the fourth text retrieval conference (TREC-4)”, Proceedings of the Fourth Text Retrieval Conference, Ed., pp. 1-20.
[8] Douglas, O. & Hackaett, P., (1997) “In proceedings of the sixth retrieval conference (TREC-6)”,
Gaithersburg, MD: National Institute of Standards and Technology, Ed., pp. 687-696.
[9] Pereira, R. C. (2010) “Cross-languages plagiarism detection”, Universidade Federal do Rio Grande do
Sul, Instituto de Informatica, PhD Dissertation. [Available at
https://www.lume.ufrgs.br/bitstream/handle/10183/27652/000763631.pdf?sequence=1.]
[10] Milošević, N., “Stemmer for Serbian language”, Natural language processing. [Available at
http://arxiv.org/ftp/arxiv/papers/1209/1209.4471.pdf.]
[11] Stemmer for Serbian language. [Available at http://www.inspiratron.org/SerbianStemmer.php]
[12] Delić, V., Sečujski, M. & Kupusinac A., “Transformation-based part-of-speech tagging for Serbian
language”, Recent Advances in Computational Intelligence, Man-Machine Systems and Cybernetics.
[Available at http://www.wseas.us/e-library/conferences/2009/tenerife/CIMMACS/CIMMACS-
15.pdf]
[13] Character Encoding Recommendation for Languages. [Available at
http://scratchpad.wikia.com/wiki/Character_Encoding_Recommendation_for_Languages.]
[14] Graovac, J. (2014) “Text categorization using n-gram based language independent technique”,
Natural Language Processing for Serbian - Resources and Applications, Proceedings of the
Conference 35th Anniversary of Computational Linguistics in Serbia, ISBN: 978-86-7589-088- 1,
Ed., pp. 124–135.
[15] Buckley, C. & Voorhees, E.M. (2000) “Evaluating evaluation measure stability”, In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Ed., pp. 33-40.
[16] Dean, B. (2015) “Google’s 200 Ranking Factors: The Complete List”. [Available at http://backlinko.com/google-ranking-factors]
Author
Marina Marjanović-Jakovljević received the Electrical Engineer degree from Belgrade University, Serbia, in 2002. In 2007, she received a Ph.D. degree in Telecommunications at the Signal Processing Group of the Polytechnic University of Madrid (UPM). She has been awarded a Telefonica Moviles Fellowship for the best academic trajectory. She is currently working as an Associate Professor in the Department of Computer Engineering at Singidunum University in Belgrade. Her research interests include Information Retrieval systems, UWB systems, ad hoc networks, and wireless communications.