This document proposes a general active learning framework called expected loss optimization (ELO) for ranking problems. The ELO framework uses an ensemble of ranking models to select examples that are expected to minimize a chosen loss function, such as discounted cumulative gain (DCG) loss. The document presents an algorithm called expected DCG loss optimization (ELO-DCG) that selects examples based on expected DCG loss. It also describes a two-stage algorithm that first selects queries and then documents to address sample dependence. Finally, it discusses how the algorithms can be modified to handle skewed grade distributions in ranking data.
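The core idea above — rank by the ensemble mean, and measure how much DCG that ranking gives up under each ensemble member's predicted relevances — can be illustrated with a minimal Python sketch. This is not the paper's exact ELO-DCG algorithm; the uniform weighting over ensemble members and the function names are illustrative assumptions.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(relevances))

def expected_dcg_loss(ensemble_scores):
    """Estimate the expected DCG loss of ranking by the ensemble mean.

    ensemble_scores: one score list per ensemble member, each giving a
    predicted relevance for every candidate document. Treating each
    member as an equally likely hypothesis, the loss of the mean-score
    ranking is the DCG it gives up versus that member's ideal ordering.
    """
    n_members = len(ensemble_scores)
    n_docs = len(ensemble_scores[0])
    # Rank documents by the mean predicted score across the ensemble.
    mean = [sum(m[d] for m in ensemble_scores) / n_members for d in range(n_docs)]
    ranking = sorted(range(n_docs), key=lambda d: -mean[d])
    loss = 0.0
    for member in ensemble_scores:
        achieved = dcg([member[d] for d in ranking])
        ideal = dcg(sorted(member, reverse=True))
        loss += (ideal - achieved) / n_members
    return loss
```

An active learner would then label the query or document whose candidate set has the highest expected loss: where the ensemble agrees the loss is zero, and where it disagrees the loss is positive, so labels go where they resolve the most disagreement.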
Slides giving a preliminary overview of thesis work, presented at the International Conference on Electronic Learning in the Workplace at Columbia University on June 11, 2010.
On the benefit of logic-based machine learning to learn pairwise comparisons (journal: BEEI)
In recent years, many daily processes such as internet web searching, e-mail filtering, social media services, and e-commerce have benefited from machine learning (ML) techniques. The implementation of ML techniques has been largely focused on black-box methods whose general conclusions are not easily interpretable. Hence, checking the correctness and completeness of the models against other declarative software models is not easy to perform. On the other hand, the emergence of some logic-based machine learning techniques, with the advantage of a white-box approach, has proven well suited to many software engineering tasks. In this paper, we propose the use of a logic-based approach to learn user preferences in the form of pairwise comparisons. APARELL, a novel inductive learning approach, is able to model the user's preferences in a description logic representation. This offers a rich, relational representation which can then be used to produce a set of recommendations. A user study has been performed in our experiment to evaluate the implementation of a pairwise preference recommender system compared to a standard list interface. The results of the experiment show that the pairwise interface was significantly better than the list interface in several respects.
Correlation based feature selection (CFS) technique to predict student perfro... (IJCNC Journal)
Education data mining is an emerging stream which helps in mining academic data for solving various types of problems. One of these problems is the selection of a proper academic track, and the admission of a student to an engineering college depends on many factors. In this paper we have tried to implement a classification technique to assist students in predicting their success in admission to an engineering stream. We have analyzed a data set containing information about students' academic as well as socio-demographic variables, with attributes such as family pressure, interest, gender, XII marks, CET rank in entrance examinations, and historical data of previous batches of students. Feature selection is a process for removing irrelevant and redundant features, which helps improve the predictive accuracy of classifiers. In this paper we first use the feature selection algorithms Chi-square, InfoGain, and GainRatio to identify the relevant features. We then apply a fast correlation-based filter to the selected features. Classification is finally done using NBTree, MultilayerPerceptron, NaiveBayes, and instance-based K-nearest neighbor. Results showed a reduction in computational cost and time and an increase in predictive accuracy for the student model.
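The Chi-square ranking step described above can be sketched in a few lines of pure Python. This is a minimal illustration of scoring categorical features against a class label and keeping the top k, not the paper's implementation; the attribute names in the usage are hypothetical.

```python
from collections import Counter

def chi_square_score(feature_values, labels):
    """Chi-square statistic between a categorical feature and the class.

    Compares observed joint counts with the counts expected under
    independence; higher scores indicate a stronger feature-class
    association, which is the ranking criterion of Chi-square selection.
    """
    n = len(labels)
    f_counts = Counter(feature_values)
    c_counts = Counter(labels)
    joint = Counter(zip(feature_values, labels))
    score = 0.0
    for f, fc in f_counts.items():
        for c, cc in c_counts.items():
            expected = fc * cc / n
            observed = joint.get((f, c), 0)
            score += (observed - expected) ** 2 / expected
    return score

def select_top_k(feature_columns, labels, k):
    """Rank feature columns by Chi-square score and keep the top k indices."""
    scored = [(chi_square_score(col, labels), i)
              for i, col in enumerate(feature_columns)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Toy example: "interest" predicts pass/fail perfectly, "gender" does not.
interest = ["hi", "hi", "lo", "lo"]
gender = ["m", "f", "m", "f"]
outcome = ["pass", "pass", "fail", "fail"]
best = select_top_k([interest, gender], outcome, 1)  # -> [0]
```

InfoGain and GainRatio would slot into `select_top_k` the same way, as alternative scoring functions over the same (feature, label) columns.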
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE (IJDKP)
High prediction accuracy in predicting students' performance is helpful for identifying low-performing students at the beginning of the learning process, and data mining is used to attain this objective. Data mining techniques are used to discover models or patterns in data and are very helpful in decision-making. Boosting is the most popular technique for constructing ensembles of classifiers to improve classification accuracy. Adaptive Boosting (AdaBoost) is a well-known boosting algorithm; it is used for binary classification and is not directly applicable to multiclass classification. The SAMME boosting technique extends AdaBoost to multiclass classification without reducing it to a set of binary sub-classifications. In this paper, a students' performance prediction system using Multi Agent Data Mining is proposed to predict the performance of students from their data with high prediction accuracy and to provide help to low-performing students through optimization rules. The proposed system has been implemented and evaluated by investigating the prediction accuracy of the AdaBoost.M1 and LogitBoost ensemble classifier methods and the C4.5 single-classifier method. The results show that using the SAMME boosting technique improves the prediction accuracy and outperforms the C4.5 single classifier and LogitBoost.
The role of education is critical for the development of any country, so it is the responsibility of everyone to work for the betterment of education. Taking this into consideration, we study the education system, which ranges from basic to higher education. Today the education system generates a lot of data related to students; if we cannot analyze that data properly, it is useless. With the help of data mining techniques we can find hidden information in the data collected in different educational settings, and with that information we can review our educational process or improve our education system. In this article we consider the case of engineering college students and try to predict their final results in advance. The prediction provides timely help to those students who are at risk of failing the final examination. Among the available data mining techniques, we use J48, RandomForest, and ADTree to predict the performance of students in their final examination. On the basis of this prediction we can decide whether a student will be promoted to the next year, and with the help of the result we can improve the performance of students at risk of failing. After the declaration of the students' final results, the results are fed back into the system and analyzed for the next semester. The comparative results show that prediction helps improve the overall results of weaker students.
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo... (ijcsa)
Active learning is a supervised learning method based on the idea that a machine learning algorithm can achieve greater accuracy with fewer labelled training images if it is allowed to choose the images from which it learns. Facial age classification is a technique for classifying face images into one of several predefined age groups. The proposed study applies an active learning approach to facial age classification which allows a classifier to select the data from which it learns. The classifier is initially trained using a small pool of labeled training images; this is achieved using bilateral two-dimensional linear discriminant analysis. Then the most informative unlabeled image is found in the unlabeled pool using the furthest nearest neighbour criterion, labeled by the user, and added to the appropriate class in the training set. Incremental learning is performed using an incremental version of bilateral two-dimensional linear discriminant analysis. This active learning paradigm is applied to the k-nearest-neighbour classifier and the support vector machine classifier, and the performance of the two classifiers is compared.
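The furthest nearest neighbour criterion itself is simple to state: among the unlabeled points, pick the one whose nearest labeled neighbour is furthest away, since that point is least represented by the current training set. A minimal sketch, assuming points are plain feature tuples and Euclidean distance (the paper works in a discriminant subspace; that projection is omitted here):

```python
def furthest_nearest_neighbour(unlabeled, labeled):
    """Return the index of the unlabeled point whose nearest labeled
    neighbour is furthest away (squared Euclidean distance).

    Such a point lies in a region the labeled pool does not yet cover,
    so it is treated as the most informative query for the user.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best_idx, best_gap = None, -1.0
    for i, u in enumerate(unlabeled):
        nearest = min(sq_dist(u, l) for l in labeled)
        if nearest > best_gap:
            best_idx, best_gap = i, nearest
    return best_idx
```

After the user labels the selected point, it joins the labeled pool and the selection repeats, which is the iterative loop the abstract describes.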
In this case study we identify the factors that influence the adoption of a new system in a major company in Saudi Arabia. We develop a theoretical framework to help derive a better understanding of system adoption via socio-technical integration.
We formulate 14 hypotheses that were tested via a survey of 42 system users. Management support and change management were found to be significant factors influencing system adoption; the corresponding null hypotheses were rejected at statistical significance (p-value < 0.05). Discussion and recommendations for future research are provided.
A Formal Machine Learning or Multi Objective Decision Making System for Deter... (Editor IJCATR)
Decision-making typically needs mechanisms to compromise among opposing criteria. Once multiple objectives are involved in machine learning, a vital step is to relate the weights of individual objectives to system-level performance. Determining the weights of multiple objectives is an analysis process, and it has typically been treated as an optimization problem. However, our preliminary investigation has shown that existing methodologies for managing the weights of multiple objectives have obvious limitations: when the determination of weights is treated as a single optimization problem, the result of such an optimization is limited, and it can even be unreliable when knowledge about the multiple objectives is incomplete, for instance because of poor data. The constraints on the weights are also discussed. Variable weights are natural in decision-making processes, so we develop a systematic methodology for determining variable weights of multiple objectives. The roles of weights in multi-objective decision-making or machine learning are analyzed, and the weights are then determined with the help of a standard neural network.
Multi Label Spatial Semi Supervised Classification using Spatial Associative ... (cscpconf)
Multi-label spatial classification based on association rules with multi-objective genetic algorithms (MOGA), enriched by semi-supervised learning, is proposed in this paper to deal with the multiple-class-labels problem. We adapt problem transformation for multi-label classification and use a hybrid evolutionary algorithm for optimization in the generation of spatial association rules, which addresses single labels. MOGA is used to combine the single labels into multi-labels under the conflicting objectives of predictive accuracy and comprehensibility. Semi-supervised learning is done through the process of rule cover clustering. Finally, an associative classifier is built with a sorting mechanism. The algorithm is simulated and the results are compared with a MOGA-based associative classifier; the proposed classifier outperforms the existing one.
Indexing based Genetic Programming Approach to Record Deduplication (idescitation)
In this paper, we present a genetic programming (GP) approach to record deduplication with indexing techniques. Data deduplication is a process in which data are cleaned of duplicate records caused by misspellings, field swaps, or any other mistake or data inconsistency. This process requires identifying objects that are included in more than one list. The problem of detecting and eliminating duplicated data is one of the major problems in the broad area of data cleaning and data quality in data warehouses, so we need an algorithm that can detect and eliminate the maximum number of duplications. GP with indexing is one optimization technique that helps find the maximum number of duplicates in the database. We used a deduplication function that is able to identify whether two or more entries in a repository are replicas. Many industries and systems depend on the accuracy and reliability of databases to carry out operations; therefore, the quality of the information stored in databases can have significant cost implications for a system that relies on that information to function and conduct business. Moreover, clean and replica-free repositories not only allow the retrieval of higher-quality information but also lead to more concise data and to potential savings in computational time and resources to process that data.
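The "indexing" part of the approach — grouping records into small blocks so the deduplication function is only applied within a block instead of to every pair — can be sketched as follows. The blocking key and the record fields here are illustrative assumptions, and the `is_replica` callable stands in for the GP-evolved deduplication function from the paper.

```python
from itertools import combinations

def blocking_key(record):
    """Toy indexing key: first letter of the name plus the year field.

    Only records sharing a key are compared, which is what keeps the
    deduplication from doing a full quadratic pairwise comparison.
    """
    return (record["name"][0].lower(), record["year"])

def find_duplicates(records, is_replica):
    """Group records into blocks by key, then apply the deduplication
    function only within each block; return matching index pairs."""
    blocks = {}
    for i, rec in enumerate(records):
        blocks.setdefault(blocking_key(rec), []).append(i)
    pairs = []
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            if is_replica(records[i], records[j]):
                pairs.append((i, j))
    return pairs

records = [
    {"name": "Jon Smith", "year": 1980},
    {"name": "jon smith", "year": 1980},
    {"name": "Ann Lee", "year": 1975},
]
# Stand-in similarity function; the paper evolves this with GP.
same_name = lambda a, b: a["name"].lower() == b["name"].lower()
dups = find_duplicates(records, same_name)  # -> [(0, 1)]
```

The trade-off is the usual one for blocking: a coarser key catches more true duplicates at the cost of more comparisons, while a finer key can miss replicas that land in different blocks.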
In a world of data explosion, where the rate of data generation and consumption keeps increasing, comes the buzzword: Big Data.
Big Data is the concept of fast-moving, large-volume data in varying dimensions (sources) and from highly unpredictable sources.
The 4Vs of Big Data
● Volume - Scale of Data
● Velocity - Analysis of Streaming Data
● Variety - Different forms of Data
● Veracity - Uncertainty of Data
With increasing data availability, the new trend in the industry demands not just data collection but making ample sense of the acquired data: the concept of Data Analytics.
Taking it a step further to make futuristic predictions and realistic inferences: the concept of Machine Learning.
A blend of both gives a robust analysis of data for the past, the present and the future.
There is a thin line between data analytics and Machine Learning which becomes very obvious when you dig deep.
INCREMENTAL SEMI-SUPERVISED CLUSTERING METHOD USING NEIGHBOURHOOD ASSIGNMENT (IJCSEA Journal)
Semi-supervised clustering aims to improve clustering performance by considering user supervision in the form of pairwise constraints. In this paper, we study the active learning problem of selecting pairwise must-link and cannot-link constraints for semi-supervised clustering. We consider active learning in an iterative manner where, in each iteration, queries are selected based on the current clustering solution and the existing constraint set. We apply a general framework that builds on the concept of neighbourhood, where neighbourhoods contain labeled examples of different clusters according to the pairwise constraints. Our active learning method expands the neighbourhoods by selecting informative points and querying their membership in the neighbourhoods. Under this framework, we build on the classic uncertainty-based principle and present a novel approach for computing the uncertainty associated with each data point. We further introduce a selection criterion that trades off the uncertainty of each data point against the expected number of queries (the cost) needed to resolve this uncertainty. This allows us to select queries that have the highest information rate. We evaluate the proposed method on benchmark data sets, and the results demonstrate consistent and substantial improvements over the current state of the art.
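The selection criterion described above — uncertainty divided by the expected cost of resolving it — can be sketched with entropy as the uncertainty measure. This is a simplified illustration under assumed inputs (per-point neighbourhood membership probabilities and per-point expected query counts), not the paper's exact computation of either quantity.

```python
import math

def assignment_entropy(probs):
    """Uncertainty of a point's neighbourhood assignment, measured as
    the entropy of its membership probabilities over the neighbourhoods."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_query(membership_probs, expected_queries):
    """Pick the point with the highest information rate: entropy per
    expected pairwise query (the cost term of the selection criterion).

    membership_probs: one probability vector per unlabeled point.
    expected_queries: expected number of must-link/cannot-link queries
    needed to resolve each point's neighbourhood assignment.
    """
    rates = [assignment_entropy(p) / q
             for p, q in zip(membership_probs, expected_queries)]
    return max(range(len(rates)), key=rates.__getitem__)
```

A maximally uncertain point that is cheap to resolve wins over an equally uncertain point expected to need many queries, which is exactly the trade-off the criterion encodes.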
INCREMENTAL SEMI-SUPERVISED CLUSTERING METHOD USING NEIGHBOURHOOD ASSIGNMENTIJCSEA Journal
Semi-supervised considering so as to cluster expects to enhance clustering execution client supervision as
pair wise imperatives. In this paper, we contemplate the dynamic learning issue of selecting pair wise
must-connect and can't interface imperatives for semi supervised clustering. We consider dynamic learning
in an iterative way where in every emphasis questions are chosen in light of the current clustering
arrangement and the current requirement set. We apply a general system that expands on the idea of
Neighbourhood, where Neighbourhoods contain "named samples" of distinctive bunches as indicated by
the pair wise imperatives. Our dynamic learning strategy extends the areas by selecting educational
focuses and questioning their association with the areas. Under this system, we expand on the fantastic
vulnerability based rule and present a novel methodology for figuring the instability related with every
information point. We further present a determination foundation that exchanges off the measure of
vulnerability of every information point with the expected number of inquiries (the expense) needed to
determine this instability. This permits us to choose questions that have the most astounding data rate. We
assess the proposed strategy on the benchmark information sets and the outcomes show predictable and
significant upgrades over the current cutting edge.
Document contains some of the questions from the Domingos paper. The overall idea is to understand what machine learning is all about. This paper helps us understand the need for machine learning in our day-to-day lives. I hope you will find this document helpful.
Association rule discovery for student performance prediction using metaheuri...csandit
According to the increase of using data mining techniques in improving educational systems operations, Educational Data Mining has been introduced as a new and fast growing research area. Educational Data Mining aims to analyze data in educational environments in order to solve educational research problems. In this paper a new associative classification technique has been proposed to predict students' final performance. Despite several machine learning approaches such as ANNs, SVMs, etc., associative classifiers maintain interpretability along with high accuracy. In this research work, we have employed Honeybee Colony Optimization and Particle Swarm Optimization to extract association rules for student performance prediction as a multi-objective classification problem. Results indicate that the proposed swarm-based algorithm outperforms well-known classification techniques on the student performance prediction classification problem.
Clustering Students of Computer in Terms of Level of ProgrammingEditor IJCATR
Educational data mining (EDM) is one of the applications of data mining. It has two key domains, the student domain and the faculty domain, and different types of research work have been done in both.
In the existing system, faculty performance is calculated on the basis of two parameters: student feedback and the students' results in that subject. The existing system defines two approaches, a multiple-classifier approach and a single-classifier approach, and compares them for the relative evaluation of faculty performance using data mining techniques. In the multiple-classifier approach, K-nearest neighbor (KNN) is used in the first step and rule-based classification in the second step, while in the single-classifier approach only KNN is used in both steps.
In the proposed system, faculty performance is analysed using four parameters: student complaints about faculty, student review feedback for faculty, student feedback, and student results.
The proposed system will use opinion mining techniques to analyse the performance of faculty and calculate a score for each faculty member.
Incremental learning from unbalanced data with concept class, concept drift a...IJDKP
Recently, stream data mining applications have drawn vital attention from several research communities. Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine learning area has developed learning algorithms that make certain assumptions about the underlying distribution of data, such as that the data follow a predetermined distribution. Such constraints on the problem domain led the way for the development of smart learning algorithms whose performance is theoretically verifiable. Real-world situations differ from this restricted model. Applications usually suffer from problems such as unbalanced data distribution. Additionally, data drawn from non-stationary environments are also usual in real-world applications, resulting in the "concept drift" associated with data stream examples. These issues have been separately addressed by researchers; however, the joint problem of class imbalance and concept drift has received relatively little research. If the final objective of clever machine learning techniques is to be able to address a broad spectrum of real-world applications, then the necessity of a universal framework for learning from, and adapting to, environments where concept drift may occur and unbalanced data distributions are present can hardly be exaggerated. In this paper, we first present an overview of the issues observed in stream data mining scenarios, followed by a complete review of recent research dealing with each of these issues.
Similar to Active learning for ranking through expected loss optimization (20)
Retiming of digital circuits is conventionally based on estimates of propagation delays across different paths in the data-flow graphs (DFGs) obtained by a discrete component timing model, which implicitly assumes that the operation of a node can begin only after the completion of the operation(s) of its preceding node(s), to obey the data dependence requirement. Such a discrete component timing model very often gives much higher estimates of the propagation delays than the actual values, particularly when the computations in the DFG nodes correspond to fixed-point arithmetic operations like additions and multiplications.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
These slides describe the basic concepts of ICT, basics of email, emerging technology, and digital initiatives in education. This presentation aligns with the UGC Paper I syllabus.
Biological screening of herbal drugs: introduction and need for phyto-pharmacological screening; new strategies for evaluating natural products; in vitro evaluation techniques for antioxidant, antimicrobial and anticancer drugs; in vivo evaluation techniques for anti-inflammatory, antiulcer, anticancer, wound healing, antidiabetic, hepatoprotective, cardioprotective, diuretic and antifertility activity; toxicity studies as per OECD guidelines.
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Normal Labour/ Stages of Labour/ Mechanism of LabourWasim Ak
Normal labor is also termed spontaneous labor, defined as the natural physiological process through which the fetus, placenta, and membranes are expelled from the uterus through the birth canal at term (37 to 42 weeks).
SUPPLEMENTARY ENGLISH EXERCISES FOR GLOBAL SUCCESS GRADE 3 - FULL YEAR (WITH AUDIO FILES AND ANSWER...
Active learning for ranking through expected loss optimization
Active Learning for Ranking through
Expected Loss Optimization
Bo Long, Jiang Bian, Olivier Chapelle, Ya Zhang, Yoshiyuki Inagaki, and Yi Chang
Abstract—Learning to rank arises in many data mining applications, ranging from web search engines and online advertising to
recommendation systems. In learning to rank, the performance of a ranking model is strongly affected by the number of labeled
examples in the training set; on the other hand, obtaining labeled examples for training data is very expensive and time-consuming.
This presents a great need for active learning approaches that select the most informative examples for ranking; however, there is
still very limited work in the literature addressing active learning for ranking. In this paper, we propose a general active learning
framework, expected loss optimization (ELO), for ranking. The ELO framework is applicable to a wide range of ranking functions.
Under this framework, we derive a novel algorithm, expected discounted cumulative gain (DCG) loss optimization (ELO-DCG), to
select the most informative examples. Then, we investigate both query-level and document-level active learning for ranking and
propose a two-stage ELO-DCG algorithm which incorporates both query and document selection into active learning. Furthermore,
we show that the algorithm can flexibly handle the skewed grade distribution problem through a modification of the loss function.
Extensive experiments on real-world web search data sets have demonstrated the great potential and effectiveness of the proposed
framework and algorithms.
Index Terms—Active learning, ranking, expected loss optimization
1 INTRODUCTION
RANKING is the core component of many important information retrieval problems, such as web search, recommendation, and computational advertising. Learning to rank
represents an important class of supervised machine learn-
ing tasks with the goal of automatically constructing rank-
ing functions from training data. As many other supervised
machine learning problems, the quality of a ranking func-
tion is highly correlated with the amount of labeled data
used to train the function. Due to the complexity of many
ranking problems, a large amount of labeled training exam-
ples is usually required to learn a high quality ranking func-
tion. However, in most applications, while it is easy to
collect unlabeled samples, it is very expensive and time-
consuming to label the samples.
Active learning comes as a paradigm to reduce the label-
ing effort in supervised learning. It has been mostly studied
in the context of classification tasks [20]. Existing algorithms
for learning to rank may be categorized into three groups:
pointwise approach [8], pairwise approach [26], and listwise
approach [22]. Compared to active learning for classifica-
tion, active learning for ranking faces some unique chal-
lenges. First, there is no notion of classification margin in
ranking. Hence, many of the margin-based active learning
algorithms proposed for classification tasks are not readily
applicable to ranking. Furthermore, even straightforward active learning approaches, such as query-by-committee (QBC), have not been justified for ranking tasks under the regression framework. Second, in most supervised learning settings, each data sample can be treated completely independently of the others. In learning to rank, data examples are not independent, though they are conditionally independent given a query. We need to consider this data
dependence in selecting data and tailor active learning algo-
rithms according to the underlying learning to rank
schemes. Third, ranking problems are often associated with
very skewed data distributions. For example, in the case of
document retrieval, the number of irrelevant documents is orders of magnitude larger than that of relevant documents in the training data. It is desirable to consider the data
skewness when selecting data for ranking. Therefore, there
is a great need for an active learning framework for ranking.
In this paper, we attempt to address those three impor-
tant and challenging aspects of active learning for ranking.
We first propose a general active learning framework,
expected loss optimization (ELO), and apply it to ranking.
The key idea of the proposed framework is that given a loss
function, the samples minimizing the expected loss (EL) are
the most informative ones. Under this framework, we
derive a novel active learning algorithm for ranking, which uses a function ensemble to select the most informative examples under a chosen loss. For the rest of the paper, we
use web search ranking problem as an example to illustrate
B. Long is with LinkedIn Inc., Mountain View, CA 94043.
E-mail: blong@linkedin.com.
J. Bian is with Microsoft Research, Beijing, China 100080.
E-mail: jibian@microsoft.com.
O. Chapelle is with Criteo, Palo Alto, CA 94301.
E-mail: olivier@chapelle.cc.
Y. Zhang is with Shanghai Jiao Tong University, Shanghai, China
200240. E-mail: ya_zhang@sjtu.edu.cn.
Y. Inagaki and Y. Chang are with Yahoo! Labs, Sunnyvale, CA 94089.
E-mail: {inagakiy, yichang}@yahoo-inc.com.
Manuscript received 3 May 2013; revised 22 Dec. 2013; accepted 13 Sept.
2014. Date of publication 29 Oct. 2014; date of current version 27 Mar. 2015.
Recommended for acceptance by I. Davidson.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TKDE.2014.2365785
1180 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 5, MAY 2015
1041-4347 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
the ideas and perform evaluation. But the proposed method is generic and may be applicable to all ranking applications. In the case of web search ranking, we minimize the expected discounted cumulative gain (DCG) loss, one of the most commonly used losses for web search ranking. This algorithm may be easily adapted to other ranking losses such as normalized discounted cumulative gain (NDCG) or average precision. To address the data dependency issue, the proposed algorithm is further extended to a two-stage active learning schema to seamlessly integrate query level and document level data selection. Finally, we extend the proposed algorithms to address the skewed grade distribution problem.
The main contributions of the paper are summarized as
follows.
We propose a general active learning framework
based on expected loss optimization. This frame-
work is applicable to various ranking scenarios with
a wide spectrum of learners and loss functions. We also provide a theoretically justified measure of informativeness.
Under the ELO framework, we derive novel algo-
rithms to select most informative examples by opti-
mizing the expected DCG loss. Those selected
examples represent the ones that the current ranking
model is most uncertain about and they may lead to
a large DCG loss if predicted incorrectly.
We propose a two-stage active learning algorithm for
ranking, which addresses the sample dependence
issue by first performing query level selection and
then document level selection.
We further show how to extend the ELO framework to address the skewed grade distribution problem in ranking. Balanced versions of the ELO algorithms are derived for both query level active learning and two-stage active learning.
2 RELATED WORK
The main motivation for active learning is that it usually
requires time and/or money for the human expert to label
examples and those resources should not be wasted to label
non-informative samples, but be spent on interesting ones.
Optimal experimental design [12] is closely related to
active learning as it attempts to find a set of points such that
the variance of the estimate is minimized. In contrast to this
“batch” formulation, the term active learning often refers to
an incremental strategy [7].
There have been various types of strategies for active learning, which we now review. A comprehensive survey can
be found in [21]. The simplest and maybe most common
strategy is uncertainty sampling [18], where the active learn-
ing algorithm queries points for which the label uncertainty
is the highest. The drawback of this type of approach is that
it often mixes two types of uncertainties, the one stemming
from the noise and the variance. The noise is something
intrinsic to the learning problem which does not depend on
the size of the training set. An active learner should not
spend too much effort in querying points in noisy regions of
the input space. On the other hand, the variance is the
uncertainty in the model parameters resulting from the
finiteness of the training set. Active learning should thus try
to minimize this variance and this was first proposed in [7].
In Bayesian terms, the variance is computed by integrat-
ing over the posterior distribution of the model parameters.
But in practice, it may be difficult or impossible to compute
this posterior distribution. Instead, one can randomly sample models from the posterior distribution [9]. A heuristic way of doing so is to use a bagging type of algorithm [1].
This type of analysis can be seen as an extension of the
Query-By-Committee algorithm [14] which has been derived
in a noise-free classification setting. In that case, the posterior distribution is uniform over the version space (the space of hypotheses consistent with the labeled data) and the
QBC algorithm selects points on which random functions in
the version space have the highest disagreement.
Another fairly common heuristic for active learning is to
select points that once added in the training set are expected
to result in a large model change [21] or a large increase in
the objective function value that is being optimized [4].
Another related field is ranking. In recent years, the ranking problem has frequently been formulated as a supervised machine learning problem [28], [29], [30], [31], [32], [33], [34].
These learning-to-rank approaches are capable of combining
different kinds of features to train ranking functions. The
problem of ranking can be formulated as that of learning a
ranking function from pair-wise preference data. The idea is
to minimize the number of contradicting pairs in training
data. For example, RankSVM [28] uses support vector
machines to learn a ranking function from preference data.
RankNet [29] applies neural network and gradient descent
to obtain a ranking function. RankBoost [30] applies the idea
of boosting to construct an efficient ranking function from a
set of weak ranking functions. The studies reported in [33]
proposed a framework called GBRank using gradient
descent in function spaces, which is able to learn relative
ranking information in the context of web search.
For the main focus of this paper, active learning for ranking, there is still limited work in the literature compared with traditional active learning. Donmez and Carbonell studied the problem of document selection in ranking [11]. Their
algorithm selects the documents which, once added to the
training set, are the most likely to result in a large change in
the model parameters of the ranking function. They apply
their algorithm to RankSVM [17] and RankBoost [13]. Also
in the context of RankSVM, [25] suggests adding the most ambiguous pairs of documents to the training set, that is, documents whose predicted relevance scores are very close
under the current model [35], [36]. Other works based on
pairwise ranking include [6], [10]. In the case of binary relevance, [5] proposed a greedy algorithm which selects documents that are the most likely to differentiate two ranking
systems in terms of average precision. Moreover, an empiri-
cal comparison of document selection strategies for learning
to rank can be found in [2]. Finally, Long et al. [19] proposed
a general active learning framework for ranking. But, it still
suffered from the problem of imbalanced grade in ranking.
There is some related work on query sampling. Yilmaz and Robertson [24] empirically show that having more queries but fewer documents per query is better than having more documents and fewer queries. Yang
et al. propose a greedy query selection algorithm that tries
to maximize a linear combination of query difficulty, query
density and query diversity [23].
3 EXPECTED LOSS OPTIMIZATION
FOR ACTIVE LEARNING
As explained in the previous section, a natural strategy for
active learning is based on variance minimization. The vari-
ance, in the context of regression, stems from the uncer-
tainty in the prediction due to the finiteness of the training
set. Cohn et al. [7] propose to select the next instance to be labeled as the one with the highest variance. However, this approach applies only to regression, and we aim at generalizing it through the Bayesian expected loss [3].
In the rest of the section, we first review Bayesian deci-
sion theory in Section 3.1 and then introduce the expected
loss optimization principle for active learning. In Section 3.2 we show that, in the cases of classification and regression, applying ELO turns out to be equivalent to standard active learning methods. Finally, we present ELO for ranking in
Section 3.3.
3.1 Bayesian Decision Theory
We consider a classical Bayesian framework to learn a function $f : \mathcal{X} \to \mathcal{Y}$ parametrized by a vector $\theta$. The training data $D$ consists of $n$ examples, $(x_1, y_1), \ldots, (x_n, y_n)$. Bayesian learning consists in:
1) Specifying a prior $P(\theta)$ on the parameters and a likelihood function $P(y \mid x, \theta)$.
2) Computing the likelihood of the training data, $P(D \mid \theta) = \prod_{i=1}^{n} P(y_i \mid x_i, \theta)$.
3) Applying Bayes' rule to get the posterior distribution of the model parameters, $P(\theta \mid D) = P(D \mid \theta) P(\theta) / P(D)$.
4) For a test point $x$, computing the predictive distribution $P(y \mid x, D) = \int_{\theta} P(y \mid x, \theta) P(\theta \mid D) \, d\theta$.
Note that in such a Bayesian formalism, the prediction is a distribution instead of an element of the output space $\mathcal{Y}$. In order to know which action to perform (or which element to predict), Bayesian decision theory needs a loss function. Let $\ell(a, y)$ be the loss incurred by performing action $a$ when the true output is $y$. Then the Bayesian expected loss is defined as the expected loss under the predictive distribution:

$$r(a) := \int_{y} \ell(a, y) P(y \mid x, D) \, dy. \quad (1)$$

The best action according to Bayesian decision theory is the one that minimizes that loss: $a^* := \arg\min_a r(a)$. Central to our analysis is the expected loss of that action, $r(a^*)$, or

$$EL(x) := \min_{a} \int_{y} \ell(a, y) P(y \mid x, D) \, dy. \quad (2)$$

This quantity should be understood as follows: given that we have taken the best action $a^*$ for the input $x$, and that the true output is in fact given by $P(y \mid x, D)$, what is, in expectation, the loss to be incurred once the true output is revealed?
The overall generalization error (i.e., the expected error on unseen examples) is the average of the expected loss over the input distribution: $\int_x EL(x) P(x) \, dx$. Thus, in order to minimize this generalization error, our active learning strategy consists in selecting the input instance $x$ that maximizes the expected loss:

$$\arg\max_{x} EL(x).$$
3.2 ELO for Regression and Classification
In this section, we show that the ELO principle for active
learning is equivalent to well-known active learning strategies for classification and regression. In the cases of regression and classification, the "action" $a$ discussed above is simply the prediction of an element of the output space $\mathcal{Y}$.
3.2.1 Regression
The output space for regression is $\mathcal{Y} = \mathbb{R}$ and the loss function is the squared loss $\ell(a, y) = (a - y)^2$. It is well known that the prediction minimizing this squared loss is the mean of the distribution and that the expected loss is the variance:

$$\arg\min_{a} \int_{y} (a - y)^2 P(y \mid x, D) \, dy = \mu$$

and

$$\min_{a} \int_{y} (a - y)^2 P(y \mid x, D) \, dy = \sigma^2,$$

where $\mu$ and $\sigma^2$ are the mean and variance of the predictive distribution. So in the regression case, ELO will choose the point with the highest predictive variance, which is exactly one of the classical strategies for active learning [7].
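As an illustration (this sketch is ours, not the paper's), the variance criterion can be implemented with an ensemble standing in for posterior samples: the candidate whose ensemble predictions disagree the most is selected.

```python
from statistics import pvariance

def select_most_uncertain(candidates, ensemble):
    """Pick the candidate whose ensemble predictions vary the most.

    candidates: unlabeled inputs (plain floats here, for simplicity)
    ensemble:   prediction functions standing in for posterior samples
    """
    def predictive_variance(x):
        # In the regression case, EL(x) is the predictive variance sigma^2.
        return pvariance([f(x) for f in ensemble])
    return max(candidates, key=predictive_variance)

# Toy ensemble: three regressors that agree near x = 0 and diverge as x grows.
ensemble = [lambda x: 1.0 * x, lambda x: 1.2 * x, lambda x: 0.8 * x]
print(select_most_uncertain([0.1, 1.0, 5.0], ensemble))  # prints 5.0
```

Here the largest input wins because the members' slopes fan out, so their predictions spread the most there.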
3.2.2 Classification
The output space for binary classification is $\mathcal{Y} = \{-1, 1\}$ and the loss is the 0/1 loss: $\ell(a, y) = 0$ if $a = y$, 1 otherwise. The optimal prediction is given by $\arg\max_{a \in \mathcal{Y}} P(y = a \mid x, D)$ and the expected loss turns out to be:

$$\min(P(y = 1 \mid x, D), \; P(y = -1 \mid x, D)),$$

which is maximal when $P(y = 1 \mid x, D) = P(y = -1 \mid x, D) = 0.5$, that is, when we are completely uncertain about the class label. This uncertainty-based active learning is the most popular one for classification; it was first proposed in [18].
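A minimal sketch of this reduction (our illustration; the posterior class probabilities are assumed given, e.g. as vote fractions of a committee):

```python
def expected_01_loss(p_pos):
    """Expected 0/1 loss of the Bayes-optimal prediction:
    min(P(y=1|x,D), P(y=-1|x,D))."""
    return min(p_pos, 1.0 - p_pos)

def most_uncertain(p_pos_by_point):
    """ELO for classification reduces to uncertainty sampling: maximize the
    expected loss, i.e. pick the point whose class probability is closest to 0.5."""
    return max(p_pos_by_point, key=lambda k: expected_01_loss(p_pos_by_point[k]))

probs = {"a": 0.95, "b": 0.52, "c": 0.20}
print(most_uncertain(probs))  # prints b
```

Point "a" is confidently positive and "c" fairly confidently negative, so "b", sitting near 0.5, carries the highest expected loss and is queried first.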
3.3 ELO for Ranking
In the case of ranking, the input instance is a query and a set of documents associated with it, while the output is a vector of relevance scores. If the query $q$ has $n$ documents, let us denote by $X_q := (x_1, \ldots, x_n)$ the feature vectors describing these (query, document) pairs and by $Y := (y_1, \ldots, y_n)$ their labels. As before, we have a predictive distribution $P(Y \mid X_q, D)$. Unlike active learning for classification and regression, active learning for ranking can select examples at different levels. One is the query level, which selects a query with all associated documents $X_q$; the other is the document level, which selects documents $x_i$ individually.
3.3.1 Query Level
In the case of ranking, the "action" in the ELO framework is slightly different than before because we are not directly interested in predicting the scores; instead we want to produce a ranking. So the set of actions is the set of permutations of length $n$, and for a given permutation $\pi$, the rank of the $i$th document is $\pi(i)$. The expected loss for a given $\pi$ can thus be written as:

$$\int_{Y} \ell(\pi, Y) P(Y \mid X_q, D) \, dY, \quad (3)$$

where $\ell(\pi, Y)$ quantifies the loss in ranking according to $\pi$ if the true labels are given by $Y$. The next section will detail the computation of the expected loss where $\ell$ is the DCG loss.
As before, the ELO principle for active learning tells us to select the queries with the highest expected losses:

$$EL(q) := \min_{\pi} \int_{Y} \ell(\pi, Y) P(Y \mid X_q, D) \, dY. \quad (4)$$
As an aside, note that the ranking minimizing the loss (3)
is not necessarily the one obtained by sorting the documents
according to their mean predicted scores. This has already
been noted for instance in [27].
3.3.2 Document Level
Selecting the most informative document is a bit more complex because the loss function in ranking is defined at the query level and not at the document level. We can still use the expected loss (4), but only consider the predictive distribution for the document of interest and consider the scores for the other documents fixed. Then we take an expectation over the scores of the other documents. This leads to:

$$EL(q, i) = \int_{Y^{-i}} \min_{\pi} \int_{y_i} \ell(\pi, Y) P(Y \mid X_q, D) \, dy_i \, dY^{-i}, \quad (5)$$

where $EL(q, i)$ is the expected loss for query $q$ associated with the $i$th document and $Y^{-i}$ is the vector $Y$ after removing $y_i$.
4 ALGORITHM DERIVATION
We now provide practical implementation details of the ELO
principle for active learning and in particular specify how to
compute equations (4) and (5) in case of the DCG loss.
The difficulty of implementing the formulations of the previous section lies in the fact that the computation of the posterior distributions $P(y_i \mid x_i, D)$ and the integrals is in general intractable. For this reason, we instead use an
ensemble of learners and replace the integrals by sums over
the ensemble. As in [1], we propose to use bootstrap to con-
struct the ensemble. More precisely, the labeled set is sub-
sampled several times and for each subsample, a relevance
function is learned. The predictive distribution for a docu-
ment is then given by the predicted relevance scores by var-
ious functions in the ensemble. The use of bootstrap to
estimate predictive distributions is not new and there has
been some work investigating whether the two procedures
are equivalent [16].
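As a concrete illustration, the ensemble construction can be sketched in a few lines of Python. This is a toy sketch, not the paper's implementation: a one-dimensional least-squares fit (`fit_linear`) stands in for a real pointwise ranker such as GBDT, and the function names (`bootstrap_ensemble`, `predictive_scores`) are our own:

```python
import random

def fit_linear(sample):
    """Least-squares fit of y ~ a*x + b on a list of (x, y) pairs."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample)
    sxy = sum((x - mx) * (y - my) for x, y in sample)
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def bootstrap_ensemble(labeled, n_models=8, seed=0):
    """Subsample the labeled set with replacement; fit one model per subsample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(labeled) for _ in labeled]  # bootstrap resample
        models.append(fit_linear(sample))
    return models

def predictive_scores(models, x):
    """The ensemble's scores for one document: an empirical predictive distribution."""
    return [m(x) for m in models]
```

The spread of `predictive_scores` over the ensemble is what stands in for the intractable posterior in the algorithms below.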
Finally note that in our framework we need to estimate
the relevance scores; therefore, we concentrate in this paper
on pointwise approaches for learning to rank, since pairwise
and listwise approaches may not provide relevance scores.
However, the proposed algorithm can be applied to situations where pairwise and listwise approaches can generate relevance scores.
4.1 Query Level Active Learning
If the metric of interest is DCG, the associated loss is the difference between the largest attainable DCG and the DCG of that ranking:
\[
\ell(\pi, Y) = \max_{\pi'} DCG(\pi', Y) - DCG(\pi, Y),
\tag{6}
\]
where
\[
DCG(\pi, Y) = \sum_i \frac{2^{y_i} - 1}{\log_2(1 + \pi(i))}.
\tag{7}
\]
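Equations (6) and (7) translate directly into code; a minimal sketch (the function names `dcg` and `best_dcg` are ours), with ranks 1-indexed so that the top document's discount is $\log_2(2) = 1$:

```python
import math

def dcg(labels_in_rank_order):
    """DCG of eq. (7): gain 2^y - 1 discounted by log2(1 + rank)."""
    return sum((2 ** y - 1) / math.log2(1 + rank)
               for rank, y in enumerate(labels_in_rank_order, start=1))

def best_dcg(labels):
    """The max over permutations in eq. (6): sort labels in decreasing order."""
    return dcg(sorted(labels, reverse=True))
```

The DCG loss of a ranking is then simply `best_dcg(labels) - dcg(labels_in_rank_order)`.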
Substituting (6) into (4), the expected loss for a given $q$ is expressed as follows:
\[
\begin{aligned}
EL(q) &= \min_{\pi} \int_Y \Big( \max_{\pi'} DCG(\pi', Y) - DCG(\pi, Y) \Big) P(Y \mid X_q, \mathcal{D})\, dY \\
      &= \int_Y \max_{\pi'} DCG(\pi', Y)\, P(Y \mid X_q, \mathcal{D})\, dY
         - \max_{\pi} \int_Y DCG(\pi, Y)\, P(Y \mid X_q, \mathcal{D})\, dY.
\end{aligned}
\tag{8}
\]
The derivation of the second step uses the fact that the first term in the loss function (6) does not depend on $\pi$.
The maximum in the first component of (8) can easily be found by sorting the documents according to $Y$. We rewrite the integral in the second component as:
\[
\int_Y DCG(\pi, Y)\, P(Y \mid X_q, \mathcal{D})\, dY
= \sum_i \frac{1}{\log_2(1 + \pi(i))}
  \underbrace{\int_{y_i} (2^{y_i} - 1)\, P(y_i \mid x_i, \mathcal{D})\, dy_i}_{=:\, t_i},
\tag{9}
\]
with which the maximum can now be found by sorting the $t_i$.
The pseudo-code for selecting queries based on equation (8) is presented in Algorithm 1. The notations and definitions are as follows:
- $G$ is the gain function, defined as $G(s) = 2^s - 1$.
- The notation $\langle \cdot \rangle$ denotes the average over the ensemble; for instance, $\langle d_i \rangle = \frac{1}{N} \sum_{i=1}^{N} d_i$.
- $BDCG$ is a function which takes as input a set of gain values and returns the corresponding best DCG:
\[
BDCG(\{g_j\}) = \sum_j \frac{g_j}{\log_2(1 + \pi^*(j))},
\]
where $\pi^*$ is the permutation sorting the $g_j$ in decreasing order.
LONG ET AL.: ACTIVE LEARNING FOR RANKING THROUGH EXPECTED LOSS OPTIMIZATION 1183
Algorithm 1. Query Level ELO-DCG Algorithm
Input: labeled set L, unlabeled set U, the number of queries to be selected n_q.
Output: the selected query set Q.
Method:
for i = 1, ..., N do        // N = size of the ensemble
    Subsample L and learn a relevance function;
    s_j^i ← score predicted by that function on the j-th document in U
end for
for q = 1, ..., Q do        // Q = number of queries in U
    I ← documents associated with q
    for i = 1, ..., N do
        d_i ← BDCG({G(s_j^i)}_{j ∈ I})
    end for
    t_j ← ⟨G(s_j^i)⟩        // average over the ensemble, for each j ∈ I
    d ← BDCG({t_j}_{j ∈ I})
    EL(q) ← ⟨d_i⟩ − d
end for
Return Q containing the n_q queries with the highest values of EL(q).
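Algorithm 1's inner computation reduces to a few lines once ensemble scores are available. A hedged Python sketch (the names `query_expected_loss` and `select_queries` are ours; `best_dcg` plays the role of BDCG composed with the gain function $G(s) = 2^s - 1$):

```python
import math

def best_dcg(gains):
    """BDCG: best DCG attainable for a set of gain values."""
    return sum(g / math.log2(1 + r)
               for r, g in enumerate(sorted(gains, reverse=True), start=1))

def query_expected_loss(scores):
    """EL(q) = <d_i> - d from Algorithm 1.

    scores[i][j] is the score of ensemble member i on document j of this query.
    """
    gains = [[2 ** s - 1 for s in member] for member in scores]
    mean_best = sum(best_dcg(g) for g in gains) / len(gains)   # <d_i>
    mean_gains = [sum(col) / len(col) for col in zip(*gains)]  # t_j
    return mean_best - best_dcg(mean_gains)                    # <d_i> - d

def select_queries(scores_by_query, n_q):
    """Return the n_q query ids with the highest expected loss EL(q)."""
    ranked = sorted(scores_by_query,
                    key=lambda q: query_expected_loss(scores_by_query[q]),
                    reverse=True)
    return ranked[:n_q]
```

Note that EL(q) is zero when the ensemble members agree on the ranking and grows with their disagreement, since the mean of per-member best DCGs always dominates the best DCG of the mean gains.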
4.2 Document Level Active Learning
Substituting (6) into (5), the expected loss of the $i$-th document is expressed as:
\[
\begin{aligned}
EL(q, i) &= \int_{Y_{\neg i}} \min_{\pi} \int_{y_i} \Big( \max_{\pi'} DCG(\pi', Y) - DCG(\pi, Y) \Big) P(Y \mid X_q, \mathcal{D})\, dy_i\, dY_{\neg i} \\
         &= \int_{Y_{\neg i}} \bigg[ \int_{y_i} \max_{\pi'} DCG(\pi', Y)\, P(Y \mid X_q, \mathcal{D})\, dy_i
            - \max_{\pi} \int_{y_i} DCG(\pi, Y)\, P(Y \mid X_q, \mathcal{D})\, dy_i \bigg] dY_{\neg i},
\end{aligned}
\tag{10}
\]
which is similar to equation (8) except that the uncertainty is on $y_i$ instead of the entire vector $Y$, and that there is an outer expectation over the relevance values of the other documents.
The corresponding pseudo-code is provided in Algorithm 2.
Algorithm 2. Document Level ELO-DCG Algorithm
Input: labeled set L, unlabeled set U, the number of documents to be selected n_u.
Output: the selected document set D.
Method:
for i = 1, ..., N do        // N = size of the ensemble
    Subsample L and learn a relevance function;
    s_j^i ← score predicted by that function on the j-th document in U
end for
for all j ∈ U do
    for i = 1, ..., N do
        t_k ← s_k^i, ∀k ≠ j
        for p = 1, ..., N do
            t_j ← s_j^p
            d_p ← BDCG({G(t_k)})
        end for
        g_k ← G(s_k^i), ∀k ≠ j
        g_j ← ⟨G(s_j^i)⟩
        EL(j) ← ⟨d_p⟩ − BDCG({g_k})
    end for
end for
Return D containing the n_u documents with the highest values of EL(j).
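Algorithm 2's per-document loop can be sketched under the same toy conventions as before (the function name `document_expected_loss` is ours; `scores[i][k]` is the score $s_k^i$ of ensemble member $i$ on document $k$ of a single query):

```python
import math

def best_dcg(gains):
    """BDCG: best DCG attainable for a set of gain values."""
    return sum(g / math.log2(1 + r)
               for r, g in enumerate(sorted(gains, reverse=True), start=1))

def document_expected_loss(scores, j):
    """Expected DCG loss of document j within one query (Algorithm 2).

    scores[i][k] is the score of ensemble member i on document k.
    """
    n = len(scores)
    el = 0.0
    for member in scores:                      # expectation over the other docs
        d = []
        for p in range(n):                     # vary only document j's score
            t = [2 ** s - 1 for s in member]
            t[j] = 2 ** scores[p][j] - 1
            d.append(best_dcg(t))
        g = [2 ** s - 1 for s in member]
        g[j] = sum(2 ** scores[p][j] - 1 for p in range(n)) / n  # mean gain of j
        el += sum(d) / n - best_dcg(g)         # <d_p> - BDCG({g_k})
    return el / n
```

As intended, the score is zero when the ensemble agrees on document j (even if it disagrees elsewhere) and positive when members disagree on j.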
4.3 Two-Stage Active Learning
Both query level and document level active learning have their own drawbacks. Since query level active learning selects all documents associated with a query, it tends to include non-informative documents when a large number of documents are associated with each query. For example, in Web search applications, a large number of Web documents are associated with each query, and most of them are non-informative, since the quality of a ranking function is mainly measured by its ranking output on a small number of top ranked Web documents. On the other hand, document level active learning selects documents individually. This selection process implies the unrealistic assumption that documents are independent, which leads to undesirable results. For example, an informative query could be missed if none of its documents is selected, or only one document may be selected for a query, which is not a good training example for learning to rank.
Algorithm 3. Two-Stage ELO-DCG Algorithm
Input: labeled set L, unlabeled set U, the number of queries to be selected n_q, the number of documents to be selected for each query n_u.
Output: the selected query-document set D.
Method:
for i = 1, ..., N do        // N = size of the ensemble
    Subsample L and learn a relevance function;
    s_j^i ← score predicted by that function on the j-th document in U
end for
for q = 1, ..., Q do        // Q = number of queries in U
    I ← documents associated with q
    for i = 1, ..., N do
        d_i ← BDCG({G(s_j^i)}_{j ∈ I})
    end for
    t_j ← ⟨G(s_j^i)⟩
    d ← BDCG({t_j}_{j ∈ I})
    EL(q) ← ⟨d_i⟩ − d
end for
Select the n_q queries with the highest values of EL(q) into Q.
for all q ∈ Q do
    I ← documents associated with q
    for all j ∈ I do
        for i = 1, ..., N do
            t_k ← s_k^i, ∀k ≠ j
            for p = 1, ..., N do
                t_j ← s_j^p
                d_p ← BDCG({G(t_k)})
            end for
            g_k ← G(s_k^i), ∀k ≠ j
            g_j ← ⟨G(s_j^i)⟩
            EL(j) ← ⟨d_p⟩ − BDCG({g_k})
        end for
    end for
    Select the n_u documents with the highest values of EL(j) into D_q
    D ← D ∪ D_q
end for
Return D.
1184 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 5, MAY 2015
Therefore, it is natural to combine the query level and the document level into two-stage active learning. A realistic assumption for ranking data is that queries are independent and that documents are independent given a query. Based on this assumption, we propose the two-stage ELO-DCG algorithm summarized in Algorithm 3, which first selects the most informative queries and then selects the most informative documents for each selected query.
4.4 Balanced ELO Active Learning
In this section, we show how the ELO framework can easily be adapted to address the grade imbalance problem for ranking.
In many ranking applications, especially Web search ranking, the distribution of relevance grades is very skewed. Under the five-grade scheme "Perfect-4", "Excellent-3", "Good-2", "Fair-1", and "Bad-0", there are usually far fewer Perfect and Excellent examples in a randomly selected Web search data set from a commercial search engine. Fig. 1 shows the grade distribution of such a data set: the Excellent examples account for less than 10 percent and the Perfect examples for less than 2 percent.
On the other hand, users usually only care about the top ranked documents, which are often Perfect or Excellent documents, rather than the entire collection of documents that match a query. Intuitively, if a training set has very few Perfect or Excellent examples, it will be very difficult for the ranking learner to learn decision boundaries that identify Perfect and Excellent documents. As a result, the learned ranking function cannot be expected to perform well. In fact, [2] shows that selecting more relevant documents into a training set can improve ranking models.
The ELO framework can easily be extended to address the skewed grade distribution problem by re-designing the expected loss function. For the ELO-DCG algorithms, we propose the following expected loss function for document selection, which selects examples with high expected DCG loss and at the same time puts more preference on highly relevant documents, so as to obtain more balanced training data:
\[
EL(q, i) = \int_{Y_{\neg i}} \min_{\pi} \int_{y_i} s(y_i)\, \ell(\pi, Y)\, P(Y \mid X_q, \mathcal{D})\, dy_i\, dY_{\neg i},
\tag{11}
\]
where $s(y_i)$ is a function that incorporates prior knowledge to balance the grade distribution. For example,
\[
s(y) = y
\tag{12}
\]
puts preference on highly relevant documents; similarly,
\[
s(y) = y^2
\tag{13}
\]
puts even more preference on highly relevant documents; and
\[
s(y) = (y - E(y))^2,
\tag{14}
\]
where $E$ denotes expectation, puts preference on both highly relevant and highly irrelevant documents.
In this paper, we simply adopt the identity function of Eq. (12) to show how to deal with the grade skew problem illustrated in Fig. 1 under the ELO framework. The approach can easily be extended to other situations that call for a balanced grade distribution.
Substituting (6) and (12) into (11), the expected loss of the $i$-th document is expressed as:
\[
\begin{aligned}
EL(q, i) &= \int_{Y_{\neg i}} \min_{\pi} \int_{y_i} y_i \Big( \max_{\pi'} DCG(\pi', Y) - DCG(\pi, Y) \Big) P(Y \mid X_q, \mathcal{D})\, dy_i\, dY_{\neg i} \\
         &= \int_{Y_{\neg i}} \bigg[ \int_{y_i} y_i \max_{\pi'} DCG(\pi', Y)\, P(Y \mid X_q, \mathcal{D})\, dy_i
            - \max_{\pi} \int_{y_i} y_i\, DCG(\pi, Y)\, P(Y \mid X_q, \mathcal{D})\, dy_i \bigg] dY_{\neg i},
\end{aligned}
\tag{15}
\]
which is similar to equation (10) except that $y_i$ weights the loss function to put more weight on highly relevant documents.
When using (15) to select documents, we obtain the balanced ELO-DCG algorithms. The balanced two-stage ELO-DCG algorithm is summarized in Algorithm 4. It first selects the most informative queries, as in the first stage of Algorithm 3; then it selects the documents for each query by considering both the DCG loss and the grade distribution, i.e., it selects the documents with the most potential to improve DCG while putting more preference on highly relevant documents, so as to mitigate the skewed grade distribution problem.
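The reweighting itself amounts to one multiplication per document. A sketch covering the three choices of $s(y)$ in equations (12)-(14) (the function `balanced_el` and the scheme names are ours; `grade_mean` stands for $E(y)$, and the ensemble-mean predicted grade stands in for the unknown label):

```python
def mean(xs):
    return sum(xs) / len(xs)

def balanced_el(el_j, member_scores_j, scheme="identity", grade_mean=2.0):
    """Reweight a document's expected loss EL(j) by s(y), eqs. (12)-(14).

    member_scores_j holds the ensemble's predicted grades for document j;
    its mean plays the role of the (unknown) grade y_j.
    """
    s_j = mean(member_scores_j)
    if scheme == "identity":          # eq. (12): favor high grades
        w = s_j
    elif scheme == "squared":         # eq. (13): favor high grades more strongly
        w = s_j ** 2
    else:                             # eq. (14): favor both extremes
        w = (s_j - grade_mean) ** 2
    return w * el_j
```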
5 EXPERIMENTAL EVALUATION
As a general active learning algorithm for ranking, ELO-DCG can be applied to a wide range of ranking applications. In this section, we apply different versions of the ELO-DCG algorithm to web search ranking to demonstrate its properties and effectiveness. We denote the query level, document level, and two-stage ELO-DCG algorithms as ELO-DCG-Q, ELO-DCG-D, and ELO-DCG-QD, respectively.
Fig. 1. Grade distribution for a randomly selected web search data set.
Algorithm 4. Balanced Two-Stage ELO-DCG Algorithm
Input: labeled set L, unlabeled set U, the number of queries to be selected n_q, the number of documents to be selected for each query n_u.
Output: the selected query-document set D.
Method:
for i = 1, ..., N do        // N = size of the ensemble
    Subsample L and learn a relevance function;
    s_j^i ← score predicted by that function on the j-th document in U
end for
for q = 1, ..., Q do        // Q = number of queries in U
    I ← documents associated with q
    for i = 1, ..., N do
        d_i ← BDCG({G(s_j^i)}_{j ∈ I})
    end for
    t_j ← ⟨G(s_j^i)⟩
    d ← BDCG({t_j}_{j ∈ I})
    EL(q) ← ⟨d_i⟩ − d
end for
Select the n_q queries with the highest values of EL(q) into Q.
for all q ∈ Q do
    I ← documents associated with q
    for all j ∈ I do
        for i = 1, ..., N do
            t_k ← s_k^i, ∀k ≠ j
            for p = 1, ..., N do
                t_j ← s_j^p
                d_p ← BDCG({G(t_k)})
            end for
            g_k ← G(s_k^i), ∀k ≠ j
            g_j ← ⟨G(s_j^i)⟩
            s_j ← ⟨s_j^p⟩
            EL(j) ← s_j (⟨d_p⟩ − BDCG({g_k}))
        end for
    end for
    Select the n_u documents with the highest values of EL(j) into D_q
    D ← D ∪ D_q
end for
Return D.
5.1 Data Sets
We use web search data from a commercial search engine. The data set consists of a random sample of about 10,000 queries with about half a million web documents. The query-document pairs are labeled using a five-grade labeling scheme: {Bad, Fair, Good, Excellent, Perfect}.
For a query-document pair (q; d), a feature vector x is
generated and the features generally fall into the following
three categories. Query features comprise features depen-
dent on the query q only and have constant values across all
the documents, for example, the number of terms in the
query, whether or not the query is a person name, etc. Docu-
ment features comprise features dependent on the docu-
ment d only and have constant values across all the queries,
for example, the number of inbound links pointing to the
document, the amount of anchor-texts in bytes for the docu-
ment, and the language identity of the document, etc.
Query-document features comprise features dependent on
the relation of the query q with respect to the document d,
for example, the number of times each term in the query q
appears in the document d, the number of times each term
in the query q appears in the anchor-texts of the document
d, etc. We selected about five hundred features in total.
We randomly divide this data set into three subsets: a base training set, an active learning set, and a test set. From the base training set, we randomly sample three small data sets to simulate small labeled data sets L. The active learning set is used as a large unlabeled data set U from which the active learning algorithms select the most informative examples. The true labels from the active learning set are not revealed to the ranking learners unless the examples are selected by active learning. The test set is used to evaluate the ranking functions trained with the selected examples plus the base set examples. We kept the test set large to allow rigorous evaluation of the ranking functions. The sizes of these five data sets are summarized in Table 1.
5.2 Experimental Setting
For the learner, we use gradient boosted decision trees (GBDT) [15].
The input to the ELO-DCG algorithm is a base data set L and the AL data set U. The size of the function ensemble is set to eight for all experiments. The ELO-DCG algorithm selects the top m informative examples; those m examples are then added to the base set to train a new ranking function, whose performance is then evaluated on the test set. Each algorithm with each base set is tested with 14 different values of m: 500, 1,000, 2,000, 4,000, 6,000, 8,000, 10,000, 12,000, 16,000, 24,000, 32,000, 48,000, 64,000, and 80,000. For every experimental setting, 10 runs are repeated, and in each run the base set is re-sampled to generate a new function ensemble.
As the performance measure for ranking models, we use DCG-k, since users of a search engine are only interested in the top-k results of a query rather than a sorted order of the entire document collection. In this study, we set k to 10. The average DCG over the 10 runs is reported for each experimental setting.
5.3 Document Level Active Learning
We first investigate document level active learning, since
documents correspond to basic elements to be selected in
the traditional active learning framework. We compare doc-
ument level ELO-DCG algorithm with random selection
(denoted by Random-D) and a classical active learning
approach based on variance reduction (VR) [7], which
selects document examples with largest variance on the pre-
diction scores. Concretely, VR based approach first random
sample data to train multiple models; then, it applies
TABLE 1
Sizes of the Five Data Sets

Data set      Number of examples
base set 2k   ≈ 2,000
base set 4k   ≈ 4,000
base set 8k   ≈ 8,000
AL set        ≈ 160,000
test set      ≈ 180,000
these models to the candidate document examples to obtain multiple prediction scores; finally, it selects the documents with the largest variance in those scores.
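For comparison, the VR baseline reduces to ranking candidate documents by the variance of their ensemble scores; a minimal sketch (the function name `select_by_variance` is ours):

```python
from statistics import pvariance

def select_by_variance(scores_by_doc, n):
    """VR-style selection: scores_by_doc maps a doc id to its list of
    ensemble prediction scores; return the n documents whose predictions
    disagree the most (largest population variance)."""
    ranked = sorted(scores_by_doc,
                    key=lambda d: pvariance(scores_by_doc[d]),
                    reverse=True)
    return ranked[:n]
```

Unlike ELO-DCG, this criterion ignores where a document would land in the ranking, which is one intuition for why it can trail a DCG-driven criterion.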
Fig. 2 compares the three document level active learning methods in terms of the DCG-10 of the resulting ranking functions on the test set. Those ranking functions are trained with the base data set plus the selected examples. The x-axis denotes the number of examples selected by the active learning algorithm. For all three methods, the DCG increases with the number of added examples. This agrees with the intuition that the quality of a ranking function is positively correlated with the number of examples in the training set. ELO-DCG consistently outperforms the other two methods. A possible explanation is that ELO-DCG optimizes the expected DCG loss, which is directly related to the objective function DCG-10 used to evaluate ranking quality; the prediction score variance used by VR, on the other hand, is not directly related to a DCG driven objective. In fact, VR performs even worse than random document selection when the number of selected examples is small. An advantage of the ELO-DCG algorithm is its ability to optimize directly against the ultimate loss function used to measure ranking quality.
5.4 Query Level Active Learning
In this section, we show that the query level ELO-DCG algorithm effectively selects informative queries to improve learning to rank performance. Since traditional active learning approaches cannot be directly applied to query selection in ranking, we compare it with the random query selection (denoted by Random-Q) used in practice.
Fig. 3 shows the DCG comparison results. We observe that for all three base sets, ELO-DCG performs better than random selection at all sample sizes (from 500 to 80,000) added to the base sets. Moreover, ELO-DCG converges much faster than random selection, i.e., ELO-DCG attains the best DCG that the whole AL data set can attain with far fewer examples added to the training data.
5.5 Two-Stage Active Learning
In this section, we compare the two-stage ELO-DCG algorithm with two other two-stage active learning algorithms. One is two-stage random selection, i.e., random query selection followed by random document selection for each query. The other is an approach widely used in practice, which first
Fig. 2. DCG comparison of document level ELO-DCG, variance reduction based document selection, and random document selection with base sets of sizes 2, 4, and 8k shows that the ELO-DCG algorithm outperforms the other two document selection methods at various sizes of selected examples.
Fig. 3. DCG comparisons of query level ELO-DCG and random query selection with base sets of sizes 2, 4, and 8k show that the ELO-DCG algorithm outperforms random selection at various sizes of selected examples.
randomly selects queries and then selects the top k relevant documents for each query based on the current ranking functions (such as the top k web sites returned by the current search engine) [24]. In our experimental setting, this approach corresponds to random query selection followed by selecting the k documents with the highest mean relevance scores within each selected query. We denote this approach as top-K. In all three two-stage algorithms, we simply fix the number of documents per query at 15, based on the results from [24].
Fig. 4 shows the DCG comparison results for two-stage active learning. We observe that for all three base sets, ELO-DCG performs best and top-K performs second best. This result demonstrates that two-stage ELO-DCG effectively selects the most informative documents for the most informative queries. A possible reason that top-K performs better than random selection is that top-K selects more Perfect and Excellent examples; those examples contribute more to DCG than Bad and Fair examples.
We have observed that the ELO-DCG algorithms perform best in all three active learning scenarios: query level, document level, and two-stage active learning. Next, we compare the three versions of ELO-DCG with each other.
Fig. 5 shows DCG comparisons of two-stage ELO-DCG, query level ELO-DCG, and document level ELO-DCG. We observe that for all three base sets, two-stage ELO-DCG performs best. The reason may be rooted in its reasonable assumption about the ranking data: queries are independent, and documents are conditionally independent given a query. By contrast, the document level algorithm makes the incorrect assumption that documents are independent and may miss informative information at the query level, while the query level algorithm selects all documents associated with a query, not all of which are informative.
From the above results, we observe that, compared with the top-K approach widely used in practice, the two-stage ELO-DCG algorithm can significantly reduce the labeling cost. Table 2 summarizes the cost reduction of the two-stage algorithm compared with the top-K approach.
In Table 2, the saturated size means that when examples of this size are selected and added to the base set to train a ranking function, the performance of the learned ranking function is equivalent to that of a ranking function trained with all of the active learning data. From the first row of
Fig. 4. DCG comparisons of two-stage ELO-DCG, two-stage random selection, and top-K selection with base sets of sizes 2, 4, and 8k show that the ELO-DCG algorithm performs best.
Fig. 5. DCG comparisons of two-stage ELO-DCG, query level ELO-DCG, and document level ELO-DCG with base sets of sizes 2, 4, and 8k show that the two-stage ELO-DCG algorithm performs best.
Table 2, we observe that for the base set of size 2k, 64k is the saturated size for the two-stage ELO-DCG algorithm and 80k is the saturated size for the top-K approach; hence, 64k selected examples from the two-stage ELO-DCG algorithm are equivalent to 80k selected examples from the top-K approach. This means that the two-stage ELO-DCG algorithm reduces the labeling cost by 16k, which is 20 percent of 80k. The largest percentage of cost reduced, 64 percent, is obtained with base set 8k.
5.6 Balanced Two-Stage Active Learning
In this section, we compare the balanced two-stage ELO-DCG algorithm (denoted as B-ELO-DCG) with the normal two-stage ELO-DCG algorithm.
Fig. 6 shows DCG comparisons of balanced two-stage ELO-DCG and normal two-stage ELO-DCG. We observe that for all three base sets, balanced two-stage ELO-DCG performs better. A grade distribution analysis of the examples selected by the two algorithms shows that, on average, the examples selected by the balanced ELO-DCG algorithm contain 10.1 percent more Excellent documents and 5.2 percent more Perfect documents, i.e., the data sets from balanced ELO-DCG have grade distributions more balanced toward highly relevant documents. The balanced ELO-DCG algorithm selects examples with both high potential to improve DCG and a more balanced grade distribution, which helps the learning algorithm learn better decision boundaries for identifying highly relevant documents; hence, it leads to better DCG performance.
6 CONCLUSIONS AND FUTURE WORK
We propose a general expected loss optimization framework for ranking, which is applicable to active learning scenarios for various ranking learners. Under the ELO framework, we derive novel algorithms, query level ELO-DCG and document level ELO-DCG, to select the most informative examples to minimize the expected DCG loss. We propose a two-stage active learning algorithm to select the most effective examples for the most effective queries. We further extend the proposed algorithm to deal with the skewed grade distribution problem that is typical in learning to rank. Extensive experiments on real-world web search data sets have demonstrated the great potential and effectiveness of the proposed framework and algorithms.
In the future, we will investigate how to fuse the query level and document level selection steps in order to produce a more robust query selection strategy. We will also evaluate our active learning method on different types of data.
ACKNOWLEDGMENTS
This research was partially supported by National Natural
Science Foundation of China (No. 61003107; No. 61221001)
and the High Technology Research and Development Pro-
gram of China (No. 2012AA011702).
REFERENCES
[1] N. Abe and H. Mamitsuka, “Query learning strategies using boost-
ing and bagging,” in Proc. 15th Int. Conf. Mach. Learn., 1998, pp. 1–9.
[2] J. A. Aslam, E. Kanoulas, V. Pavlu, S. Savev, and E. Yilmaz,
“Document selection methodologies for efficient and effective
learning-to-rank,” in Proc. 32nd Int. ACM SIGIR Conf. Res. Develop.
Inform. Retrieval, 2009, pp. 468–475.
[3] J. Berger, Statistical Decision Theory and Bayesian Analysis. New
York, NY, USA: Springer, 1985.
[4] C. Campbell, N. Cristianini, and A. Smola, “Query learning with
large margin classifiers,” in Proc. 17th Int. Conf. Mach. Learn., 2000,
pp. 111–118.
TABLE 2
Cost Reduction of Two-Stage Algorithm Compared with Top-K Approach

Base set size   Saturated size for ELO-DCG   Saturated size for top-K   Percentage of cost reduced
2k              64k                          80k                        20%
4k              48k                          80k                        40%
8k              32k                          80k                        64%
Fig. 6. DCG comparisons of two-stage ELO-DCG and balanced two-stage ELO-DCG, with base sets of sizes 2, 4, and 8k, show that the balanced two-stage ELO-DCG algorithm performs better.
LONG ET AL.: ACTIVE LEARNING FOR RANKING THROUGH EXPECTED LOSS OPTIMIZATION 1189
[5] B. Carterette, J. Allan, and R. Sitaraman, “Minimal test collections
for retrieval evaluation,” in Proc. 29th Annu. Int. ACM SIGIR Conf.
Res. Develop. Inform. Retrieval, 2006, pp. 268–275.
[6] W. Chu and Z. Ghahramani, “Extensions of Gaussian processes
for ranking: Semi-supervised and active learning,” in Proc. Nips
Workshop Learn. Rank, 2005, pp. 33–38.
[7] D. A. Cohn, Z. Ghahramani, and M. I. Jordan, “Active learning
with statistical models,” in Proc. Adv. Neural Inf. Process. Syst.,
1995, vol. 7, pp. 705–712.
[8] D. Cossock and T. Zhang, “Subset ranking using regression,”
in Proc. 19th Annu. Conf. Learn. Theory, 2006, pp. 605–619.
[9] I. Dagan and S. P. Engelson, “Committee-based sampling for
training probabilistic classifiers,” in Proc. 12th Int. Conf. Mach.
Learn., 1995, pp. 150–157.
[10] P. Donmez and J. Carbonell, “Active sampling for rank learning
via optimizing the area under the ROC curve,” in Proc. 31th Eur.
Conf. IR Res. Adv. Inform. Retrieval, 2009, pp. 78–89.
[11] P. Donmez and J. G. Carbonell, “Optimizing estimated loss reduc-
tion for active sampling in rank learning,” in Proc. 25th Int. Conf.
Mach. learn., 2008, pp. 248–255.
[12] V. Fedorov, Theory of Optimal Experiments. New York, NY, USA:
Academic, 1972.
[13] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, “An efficient
boosting algorithm for combining preferences,” J. Mach. Learn.
Res., vol. 4, pp. 933–969, 2003.
[14] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby, “Selective sam-
pling using the query by committee algorithm,” Mach. Learn.,
vol. 28, nos. 2–3, pp. 133–168, 1997.
[15] J. Friedman, “Greedy function approximation: A gradient boost-
ing machine,” Ann. Statist., vol. 29, pp. 1189–1232, 2001.
[16] T. Fushiki, “Bootstrap prediction and Bayesian prediction under
misspecified models,” Bernoulli, vol. 11, no. 4, pp. 747–758, 2005.
[17] D. D. Lewis and W. A. Gale, “A sequential algorithm for training text classifiers,” in Proc. 17th Annu. Int. ACM SIGIR Conf. Res. Develop. Inform. Retrieval, 1994, pp. 3–12.
[18] D. Lewis and W. Gale, “Training text classifiers by uncertainty
sampling,” in Proc. 17th Annu. Int. ACM SIGIR Conf. Res. Develop.
Inform. Retrieval, 1994, pp. 3–12.
[19] B. Long, O. Chapelle, Y. Zhang, Y. Chang, Z. Zheng, and B. Tseng, “Active learning for ranking through expected loss optimization,” in Proc. 33rd Annu. ACM SIGIR Conf., 2010, pp. 267–274.
[20] A. McCallum and K. Nigam, “Employing EM and pool-based
active learning for text classification,” in Proc. 5th Int. Conf. Mach.
Learn., 1998, pp. 359–367.
[21] B. Settles, “Active learning literature survey,” Comput. Sci. Dept.,
Univ. of Wisconsin, Madison, WI, USA, Tech. Rep. 1648, 2009.
[22] F. Xia, T.-Y. Liu, J. Wang, W. Zhang, and H. Li, “Listwise
approach to learning to rank: theory and algorithm,” in Proc. 25th
Int. Conf. Mach. Learn., 2008, pp. 1192–1199.
[23] L. Yang, L. Wang, B. Geng, and X.-S. Hua, “Query sampling for
ranking learning in web search,” in Proc. 32nd Int. ACM SIGIR
Conf. Res. Develop. Inform. Retrieval, 2009, pp. 754–755.
[24] E. Yilmaz and S. Robertson, “Deep versus shallow judgments in
learning to rank,” in Proc. 32nd Int. ACM SIGIR Conf. Res. Develop.
Inform. Retrieval, 2009, pp. 662–663.
[25] H. Yu, “SVM selective sampling for ranking with application to
data retrieval,” in Proc. 11th ACM SIGKDD Int. Conf. Knowl. Dis-
covery. Data Mining, 2005, pp. 354–363.
[26] Z. Zheng, H. Zha, T. Zhang, O. Chapelle, K. Chen, and G. Sun, “A
general boosting method and its application to learning ranking
functions for web search,” in Proc. Adv. Neural Inf. Process. Syst.
20, 2008, pp. 1697–1704.
[27] O. Zoeter, N. Craswell, M. Taylor, J. Guiver, and E. Snelson, “A
decision theoretic framework for implicit relevance feedback,”
in Proc. NIPS Workshop Mach. Learn. Web Search, 2007, pp. 133–142.
[28] T. Joachims, “Optimizing search engines using clickthrough
data,” in Proc. 8th ACM SIGKDD Int. Conf. Knowl. Discovery Data
Mining, 2002, pp. 133–142.
[29] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamil-
ton, and G. Hullender, “Learning to rank using gradient descent,”
in Proc. 22nd Int. Conf. Mach. Learn., 2005, pp. 89–96.
[30] Y. Freund, R. Iyer, R. Schapire, and Y. Singer, “An efficient boost-
ing algorithm for combining preferences,” in Proc. 15th Int. Conf.
Mach. Learn., 1998, pp. 391–398.
[31] J. Xu and H. Li, “Adarank: A boosting algorithm for information
retrieval,” in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop.
Inform. Retrieval, 2007, pp. 391–398.
[32] C. Cortes, M. Mohri, and A. Rastogi, “Magnitude-preserving
ranking algorithms,” in Proc. 24th Int. Conf. Mach. Learn., 2007,
pp. 169–176.
[33] Z. Zheng, K. Chen, G. Sun, and H. Zha, “A regression framework
for learning ranking functions using relative relevance
judgments,” in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop.
Inform. Retrieval, 2007, pp. 287–294.
[34] J. Guiver and E. Snelson, “Learning to rank with SoftRank and
Gaussian processes,” in Proc. 31st Annu. Int. ACM SIGIR Conf. Res.
Develop. Inform. Retrieval, 2008, pp. 259–266.
[35] S. Rodrigo, M. A. Gonçalves, and A. Veloso, “Rule-based active sampling for learning to rank,” in Proc. Eur. Conf. Mach. Learn. Knowl. Discovery Databases, 2011, pp. 240–255.
[36] B. Qian, H. Li, J. Wang, X. Wang, and I. Davidson, “Active learning to rank using pairwise supervision,” in Proc. 13th SIAM Int. Conf. Data Mining, 2013, pp. 297–305.
Bo Long is a staff applied researcher at LinkedIn
Inc, and was formerly a senior research scientist
at Yahoo! Labs. His research interests include
data mining and machine learning with applica-
tions to web search, recommendation, and social
network analysis. He holds eight innovations and
has published peer-reviewed papers in top con-
ferences and journals including ICML, KDD,
ICDM, AAAI, SDM, CIKM, and KAIS. He has
served as a reviewer, workshops co-organizer,
conference organizer committee member, and
area chair for multiple conferences, including KDD, NIPS, SIGIR, ICML,
SDM, CIKM, JSM, etc.
Jiang Bian received the PhD degree in computer
science from the Georgia Institute of Technology,
Atlanta GA, in 2010. He is a researcher at Micro-
soft Research. Prior to that, he was a scientist at
Yahoo! Labs. His research interests include
machine learning, information retrieval, data min-
ing, social network analysis, and computational
advertising. More details of his research and
background can be found at https://sites.google.
com/site/jiangbianhome/.
Olivier Chapelle graduated in theoretical computer science from the École Normale Supérieure de Lyon in 1999 and received the PhD degree from the University of Paris in 2002. He is a principal research scientist at Criteo, where he works on machine learning for display advertising. Prior to that, he was part of the machine learning group of Yahoo! Research, and before that he worked at the Max Planck Institute in Tübingen. His main research interests include kernel machines, semi-supervised learning, ranking, and large scale learning. He has published more than 80 publications and has been granted more than 10 patents. He has served as an associate editor for the Machine Learning Journal and the IEEE Transactions on Pattern Analysis and Machine Intelligence.
Ya Zhang received the BS degree from Tsinghua University and the PhD degree in information sciences and technology from the Pennsylvania State University. Since March 2010, she has been a professor in the Institute of Image Communication and Networking at Shanghai Jiao Tong University. Prior to that, she worked at Lawrence Berkeley National Laboratory, the University of Kansas, and Yahoo! Labs. Her research interests include data mining and machine learning, with applications to information retrieval, web mining, and multimedia analysis.
1190 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 5, MAY 2015
Yoshiyuki Inagaki received the PhD degree in social sciences, specializing in mathematical behavioral science. He is a research scientist at Yahoo! Labs. Most of his research centers on machine learned ranking. Before joining Yahoo! Labs, he worked on medical image analysis systems at the University of California, Irvine. His current research interests are machine learning, deep learning, topic modeling, and artificial intelligence.
Yi Chang is a director of science at Yahoo Labs, where he manages the search and anti-abuse science team. His research interests include information retrieval, applied machine learning, natural language processing, and social media mining. He has published more than 70 refereed journal and conference publications, and he is a co-author of Relevance Ranking for Vertical Search Engines, the first book in this area.
For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.
LONG ET AL.: ACTIVE LEARNING FOR RANKING THROUGH EXPECTED LOSS OPTIMIZATION 1191