As prototypes of data glasses having both data augmentation and gaze tracking capabilities are becoming available, it is now possible to develop proactive gaze-controlled user interfaces to display information about objects, people, and other entities in real-world setups. In order to decide which objects the augmented information should be about, and how saliently to augment, the system needs an estimate of the importance or relevance of the objects of the scene for the user at a given time. The estimates will be used to minimize distraction of the user, and for providing efficient spatial management of the augmented items. This work is a feasibility study on inferring the relevance of objects in dynamic scenes from gaze. We collected gaze data from subjects watching a video for a pre-defined task. The results show that a simple ordinal logistic regression model gives relevance rankings of scene objects with a promising accuracy.
APPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGE cscpconf
Advances in technology have brought about extensive research in the field of image fusion.
Image fusion is one of the most researched challenges of Face Recognition. Face Recognition
(FR) is the process by which the brain and mind understand, interpret and identify or verify
human faces. Image fusion is the combination of two or more source images which vary in
resolution, instrument modality, or image capture technique into a single composite
representation. Thus, the source images are complementary in many ways, with no one input
image being an adequate data representation of the scene. Therefore, the goal of an image
fusion algorithm is to integrate the redundant and complementary information obtained from
the source images in order to form a new image which provides a better description of the scene
for human or machine perception. In this paper we propose a novel approach to pixel-level
image fusion using PCA that removes the blur in two images and
reconstructs a new de-blurred fused image. The proposed approach is based on the calculation
of Eigenfaces with Principal Component Analysis (PCA). PCA
has been the most widely used method for dimensionality reduction and feature extraction.
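As a rough illustration of pixel-level PCA fusion, the Python sketch below weights two registered grayscale images by the components of the dominant eigenvector of their 2x2 covariance matrix. This is a common textbook formulation and an assumption here: the abstract does not spell out the paper's exact procedure, and the function name pca_fusion is hypothetical.

```python
import numpy as np

def pca_fusion(img_a, img_b):
    """Fuse two same-sized grayscale images with PCA-derived weights (sketch)."""
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("source images must have the same shape")
    # Treat each image as one variable with one observation per pixel.
    data = np.stack([a.ravel(), b.ravel()])
    cov = np.cov(data)                       # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    # Dominant eigenvector; abs() removes the arbitrary sign (an assumption
    # that is only meaningful when the sources are positively correlated).
    principal = np.abs(eigvecs[:, -1])
    weights = principal / principal.sum()    # normalise so the weights sum to 1
    fused = weights[0] * a + weights[1] * b
    return fused, weights
```

The source with the larger share of the common variance receives the larger weight; two identical inputs get equal weights and are returned unchanged.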
Image segmentation by modified map ml estimations ijesajournal
Though numerous algorithms exist to perform image segmentation, there are several issues
related to the execution time of these algorithms. Image segmentation is a label relabeling
problem under a probability framework. To estimate the label configuration, an iterative
optimization scheme is implemented to alternately carry out the maximum a posteriori (MAP)
estimation and the maximum likelihood (ML) estimation. In this paper this technique is
modified so that it performs segmentation within a stipulated time period.
Extensive experiments show that the results obtained are comparable with existing algorithms.
The algorithm executes faster than the existing algorithm and gives automatic
segmentation without any human intervention. Its results match image edges very closely to
human perception.
Soft computing is likely to play a progressively important role in many applications, including image enhancement. The paradigm for soft computing is the human mind. The soft computing critique has been particularly strong with fuzzy logic. Fuzzy logic is fact representation as a
rule for management of uncertainty. In this paper the multi-dimensional optimization problem is addressed by discussing optimal thresholding using fuzzy entropy for image enhancement. This technique is compared with bi-level and multi-level thresholding, and optimal
thresholding values are obtained for different levels of speckle noise and for low-contrast images. The fuzzy entropy method produced better results than the bi-level and multi-level thresholding techniques.
PERFORMANCE ANALYSIS OF CLUSTERING BASED IMAGE SEGMENTATION AND OPTIMIZATION ... cscpconf
Partitioning of an image into several constituent components is called image segmentation.
Myriad algorithms using different methods have been proposed for image segmentation. Many
clustering algorithms and optimization techniques are also being used for segmentation of
images. A major challenge in segmentation evaluation comes from the fundamental conflict
between generality and objectivity. As there is a glut of image segmentation techniques
available today, the customer, who is the real user of these techniques, may get confused. In this
paper, to address this problem, some image segmentation techniques are evaluated based on their consistency in different applications. Different clustering algorithms are quantified based on the parameters used.
Finding Relationships between the Our-NIR Cluster Results CSCJournals
The problem of evaluating node importance in clustering is an active research area, and many methods have been developed. Most clustering algorithms deal with general similarity measures. However, in real situations the data usually changes over time. Clustering this type of data not only decreases the quality of the clusters but also disregards the expectations of users, who usually require recent clustering results. In this regard we proposed the Our-NIR method, which node-importance results show to be better than the method proposed by Ming-Syan Chen. Calculating node importance is very useful in clustering categorical data, but the method still has deficiencies in data labeling and outlier detection. In this paper we modify the Our-NIR method for evaluating node importance by introducing a probability distribution, and we compare the results to show the improvement.
Geometric Correction for Braille Document Images csandit
Image processing is an important research area in computer vision. Clustering is an unsupervised
study. Clustering can also be used for image segmentation. There exist many methods for image
segmentation. Image segmentation plays an important role in image analysis. It is one of the first
and most important tasks in image analysis and computer vision. The proposed system
presents a variation of the fuzzy c-means algorithm that provides image clustering. The kernel fuzzy
c-means clustering algorithm (KFCM) is derived from the fuzzy c-means clustering
algorithm (FCM). The KFCM algorithm provides image clustering and improves accuracy
significantly compared with the classical fuzzy c-means algorithm. The new algorithm is called the
Gaussian kernel based fuzzy c-means clustering algorithm (GKFCM). The major characteristic of
GKFCM is the use of a fuzzy clustering approach, aiming to guarantee noise insensitiveness and
image detail preservation. The objective of the work is to cluster the low-intensity inhomogeneity
area from the noisy images using the clustering method, and to segment that portion separately using
a content level set approach. The purpose of designing this system is to produce better segmentation
results for images corrupted by noise, so that it can be useful in various fields like medical image
analysis, such as tumor detection, study of anatomical structure, and treatment planning.
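To make the kernel idea concrete, here is a minimal Python sketch of a Gaussian-kernel fuzzy c-means loop on pixel intensities. It follows a widely used KFCM formulation in which the distance is replaced by 1 - K(x, v); the abstract does not give its update rules, so the formulas, the function name gaussian_kfcm, and the parameters sigma, m and n_iter are all assumptions.

```python
import numpy as np

def gaussian_kfcm(values, n_clusters=2, m=2.0, sigma=50.0, n_iter=50):
    """Cluster 1-D intensities with a Gaussian-kernel fuzzy c-means sketch."""
    x = np.asarray(values, dtype=float).ravel()
    # Deterministic start: spread the centres evenly over the intensity range.
    centres = np.linspace(x.min(), x.max(), n_clusters)
    for _ in range(n_iter):
        # Gaussian kernel between every centre (rows) and every sample (columns).
        k = np.exp(-((x[None, :] - centres[:, None]) ** 2) / sigma ** 2)
        # Kernel-induced distance 1 - K, clipped to avoid division by zero.
        dist = np.maximum(1.0 - k, 1e-12)
        # Fuzzy memberships: each column sums to 1 across clusters.
        u = dist ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=0, keepdims=True)
        # Centre update weighted by both membership and kernel value.
        w = (u ** m) * k
        centres = (w * x[None, :]).sum(axis=1) / w.sum(axis=1)
    return centres, u
```

Because far-away samples receive kernel values near zero, they contribute little to a centre update, which is the usual argument for the noise insensitivity of kernel variants. A hard segmentation can be read off with u.argmax(axis=0).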
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...IJERD Editor
This paper presents a blind steganalysis technique to effectively attack JPEG steganographic
schemes, i.e. Jsteg, F5, Outguess and DWT-based. The proposed method exploits the correlations between
block-DCT coefficients from intra-block and inter-block relations, and the statistical moments of the characteristic
functions of the test image are selected as features. The features are extracted from the BDCT JPEG 2-array.
A Support Vector Machine with cross-validation is implemented for the classification. The proposed scheme gives
improved results in attacking these schemes.
Comparative analysis and implementation of structured edge active contour IJECEIAES
This paper proposes a modified Chan-Vese model for image segmentation. The approach uses a Linear Structure Tensor (LST) as input to the variant model. The structure tensor is a matrix representation of partial derivative information. In the proposed model, the original image is considered as an information channel for computing the structure tensor. A Difference of Gaussians (DoG) step is used as an enhancement that yields a less blurred image than the original. In this paper the LST is modified by adding intensity information to enhance orientation information. Finally, an Active Contour Model (ACM) is used to segment the images. The proposed algorithm is tested on various images, including some with intensity inhomogeneity, and the results are shown. The results are also compared with other algorithms such as Chan-Vese, Bhattacharya, Gabor-based Chan-Vese and the Novel structure tensor based model. The accuracy of the proposed model is found to be the best. The biggest advantage of the proposed model is clear edge enhancement.
Review and comparison of tasks scheduling in cloud computing ijfcstjournal
Recently, there has been a dramatic increase in the popularity of cloud computing systems that rent
computing resources on-demand, bill on a pay-as-you-go basis, and multiplex many users on the same
physical infrastructure. A cloud is a virtual pool of resources which are provided to users via the Internet. It gives
users virtually unlimited pay-per-use computing resources without the burden of managing the underlying
infrastructure. One of the goals is to use the resources efficiently and gain maximum profit. Scheduling is a
critical problem in cloud computing, because a cloud provider has to serve many users in a cloud
computing system. So scheduling is the major issue in establishing cloud computing systems.
Scheduling algorithms should order the jobs in a way that balances improving the performance
and quality of service with maintaining efficiency and fairness among the jobs. This
paper introduces and explores some of the scheduling methods proposed for cloud computing.
Finally, the waiting time and the time to run some of the proposed algorithms are evaluated.
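As a small illustration of how job ordering changes waiting time, the Python sketch below compares first-come-first-served with shortest-job-first on a single resource. Both policies and the burst times are textbook assumptions chosen for illustration; the abstract does not say which algorithms the paper evaluates.

```python
def average_waiting_time(burst_times):
    """Average time jobs wait before starting when run back to back in order."""
    waited = 0
    clock = 0
    for burst in burst_times:
        waited += clock   # this job waits for everything scheduled before it
        clock += burst    # the resource is busy until this job finishes
    return waited / len(burst_times)

# Hypothetical burst times (arbitrary time units), all jobs arriving at time 0.
jobs = [6, 8, 7, 3]

fcfs = average_waiting_time(jobs)          # first-come-first-served: arrival order
sjf = average_waiting_time(sorted(jobs))   # shortest-job-first: ascending burst time

print(f"FCFS average wait: {fcfs}")  # 10.25
print(f"SJF average wait:  {sjf}")   # 7.0
```

Running short jobs first lowers the average wait, but long jobs can be delayed indefinitely if short ones keep arriving, which is one form of the performance versus fairness trade-off the abstract mentions.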
Graph Theory Based Approach For Image Segmentation Using Wavelet Transform CSCJournals
This paper presents an image segmentation approach based on graph theory and thresholding. Amongst the various segmentation approaches, graph theoretic approaches make the formulation of the problem more flexible and the computation more resourceful. The problem is modeled as partitioning a graph into several sub-graphs such that each of them represents a meaningful region in the image. The segmentation problem is then solved in a spatially discrete space by the well-organized tools of graph theory. After the literature review, the problem is formulated in terms of the graph representation of the image and a threshold function. The boundaries between the regions are determined according to the segmentation criteria, and the segmented regions are labeled with random colors. In the presented approach, the image is preprocessed by a discrete wavelet transform and a coherence filter before graph segmentation. The experiments are carried out on a number of natural images taken from the Berkeley Image Database as well as synthetic images from online resources. The experiments are performed using the Haar, DB2, DB4, DB6 and DB8 wavelets. The results are evaluated and compared using performance evaluation parameters such as execution time, Performance Ratio, Peak Signal to Noise Ratio, Precision and Recall, and the obtained results are encouraging.
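The wavelet preprocessing step can be pictured with a single-level 2-D Haar transform, the simplest of the wavelets listed above. The Python sketch below is an illustration under assumptions: the function name haar_dwt2 is hypothetical, the sub-band naming follows one common convention, and the paper's actual pipeline (including the coherence filter) is not reproduced.

```python
import numpy as np

def haar_dwt2(image):
    """Single-level orthonormal 2-D Haar transform of an even-sized image."""
    x = np.asarray(image, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("both image dimensions must be even")
    s = np.sqrt(2.0)
    # Horizontal pass: sums and differences of adjacent column pairs.
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Vertical pass: sums and differences of adjacent row pairs.
    ll = (lo[0::2] + lo[1::2]) / s   # approximation (smoothed, half resolution)
    lh = (lo[0::2] - lo[1::2]) / s   # detail from differences between rows
    hl = (hi[0::2] + hi[1::2]) / s   # detail from differences between columns
    hh = (hi[0::2] - hi[1::2]) / s   # diagonal detail
    return ll, lh, hl, hh
```

The approximation band ll is a half-resolution, smoothed copy of the image, so a later graph segmentation stage has a quarter as many pixels to process.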
MINIMIZING DISTORTION IN STEGANOGRAPHY BASED ON IMAGE FEATURE ijcsit
There are two defects in WOW. One is that image features are not considered when hiding information through the minimal distortion path, which leads to high total distortion. The other is that total distortion grows too rapidly as hidden capacity increases, which leads to poor anti-detection when the hidden capacity is large. To solve these two problems, a new algorithm named MDIS is proposed. MDIS is also based on the minimizing additive distortion framework of STC and has the same distortion function as WOW. MDIS uses the fact that a large number of pixels have the same value as one of their eight neighbouring pixels, together with the mechanism of secret sharing, to reduce the total distortion, improve anti-detection and increase the PSNR value. Experimental results showed that MDIS has better invisibility, smaller distortion and stronger anti-detection than WOW.
A Study on Youth Violence and Aggression using DEMATEL with FCM Methods ijdmtaiir
The DEMATEL method is a good technique for
making decisions. In this paper we analyze the risk factors of
youth violence and what makes young people more aggressive. Since
youth violence has many risk factors, relating them to one
another makes it more complex to construct and analyze the FCM.
Moreover, the data is unsupervised, obtained from
a survey as well as interviews. Hence fuzzy methods alone have the
capacity to analyze these concepts.
FUZZY SET THEORETIC APPROACH TO IMAGE THRESHOLDING IJCSEA Journal
Thresholding is a fast, popular and computationally inexpensive segmentation technique that is critical and decisive in some image processing applications. The result of image thresholding is not always satisfactory because of the presence of noise and of vagueness and ambiguity among the classes. Since the theory of fuzzy sets is a generalization of classical set theory, it has greater flexibility to capture faithfully the various aspects of incompleteness or imperfection in the information about a situation. To overcome this problem, in this paper we propose a two-stage fuzzy set theoretic approach to image thresholding that uses a measure of fuzziness to evaluate the fuzziness of an image and to determine an adequate threshold value. First, images are preprocessed by fuzzy rule-based filtering to reduce noise without any loss of image details; then, in the final stage, a suitable threshold is determined with the help of a fuzziness measure as a criterion function. Experimental results on test images have demonstrated the effectiveness of this method.
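The second stage, choosing a threshold by minimising a measure of fuzziness, can be sketched as follows in Python. The membership function and the Shannon-entropy fuzziness measure used here follow a well-known formulation from the thresholding literature and are assumptions; the abstract does not state which measure the paper uses, and the fuzzy rule-based filtering stage is omitted. The function name fuzzy_threshold is hypothetical.

```python
import numpy as np

def fuzzy_threshold(image, levels=256):
    """Pick the grey level that minimises an entropy-based fuzziness measure."""
    img = np.asarray(image).astype(int).ravel()
    hist = np.bincount(img, minlength=levels).astype(float)
    grey = np.arange(levels)
    occupied = np.nonzero(hist)[0]
    g_min, g_max = int(occupied[0]), int(occupied[-1])
    if g_min == g_max:
        return g_min                      # flat image: nothing to separate
    c = float(g_max - g_min)              # keeps memberships within [0.5, 1]
    best_t, best_f = g_min, np.inf
    for t in range(g_min, g_max):
        lo, hi = hist[:t + 1], hist[t + 1:]
        mu0 = (grey[:t + 1] * lo).sum() / lo.sum()   # mean of the dark class
        mu1 = (grey[t + 1:] * hi).sum() / hi.sum()   # mean of the bright class
        class_mean = np.where(grey <= t, mu0, mu1)
        # Membership is high when a grey level is close to its class mean.
        member = 1.0 / (1.0 + np.abs(grey - class_mean) / c)
        member = np.clip(member, 1e-12, 1.0 - 1e-12)  # avoid log(0)
        entropy = -member * np.log(member) - (1 - member) * np.log(1 - member)
        fuzziness = (entropy * hist).sum() / hist.sum()
        if fuzziness < best_f:
            best_t, best_f = t, fuzziness
    return best_t
```

Pixels with grey level at or below the returned value would be assigned to the dark class and the rest to the bright class.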
International Journal of Engineering Research and Development IJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Analytical study of feature extraction techniques in opinion mining csandit
Although opinion mining is in a nascent stage of development, the ground is set for
dense growth of research in the field. One of the important activities of opinion mining is to
extract opinions of people based on characteristics of the object under study. Feature extraction
in opinion mining can be done in various ways, such as clustering, support vector machines,
etc. This paper is an attempt to appraise the various techniques of feature extraction. The first
part discusses various techniques and the second part makes a detailed appraisal of the major
techniques used for feature extraction.
Original PowerPoint retrieved from http://www.mrsshirley.net/powerpoint/realidades/vocabulary/real1vocab/real1vocab.htm. Educational use granted if credit given to author.
ASP.NET Web API is a framework that makes it easy to build HTTP services that reach a broad range of clients, including browsers and mobile devices. ASP.NET Web API is an ideal platform for building RESTful applications on the .NET Framework.
Video surveillance of humans and vehicles is an active research topic in
computer vision, and it is widely used. Multiple images generated using a fixed
camera contain various objects captured under different
variations and illumination changes; the object's identity
and orientation are then provided to the user. This scheme is used to
represent individual images as well as various object classes in a
single scale- and rotation-invariant model. The objective is to
improve object recognition accuracy for surveillance purposes and
to detect multiple objects with a sufficient level of scale
invariance. Multiple-object detection and recognition is important
in the analysis of video data and in higher-level security systems. This
method can efficiently detect objects in query images as
well as in videos by extracting frames one by one. Given a
query image at runtime, the method generates the set of query features
and finds the best match among the other sets within the database.
The SURF algorithm is used to find the database object with the best
feature matching; if one is found, that object is present in the query image.
Enhancing the Design pattern Framework of Robots Object Selection Mechanism -...INFOGAIN PUBLICATION
In order to enable a computer to construct and display a three-dimensional array, solid objects from a single two-dimensional photograph, the rules and assumptions of depth perception have been carefully analyzed and mechanized. It is assumed that a photograph is a perspective projection of a set of objects which can be constructed from transformations of known three-dimensional models, and that the objects are supported by other visible objects or by a ground plane. These assumptions enable a computer to obtain a reasonable, three-dimensional description from the edge information in a photograph by means of a topological, mathematical process. A computer program has been written which can process a photograph into a line drawing .transform the line drawing into a three-dimensional representation and, finally, display the three-dimensional structure with all the hidden lines removed, from any point of view. The 2-D to 3-D construction and 3-D to 2-D display processes are sufficiently general to handle most collections of planar-surfaced objects and provide a valuable starting point for future investigation of computer-aided three-dimensional systems.
Development of Human Tracking System For Video Surveillancecscpconf
Visual surveillance in dynamic scenes, especially for human and some objects is one of the
most active research areas. An attempt has been made to this issue in this work. It has wide
spectrum of promising application including human identification to detect the suspicious
behavior, crowd flux statistics, and congestion analysis using multiple cameras.
In this paper deals with the problem of detecting and tracking multiple moving people in a static
background. Detection of foreground object is done by background subtraction. Detected
objects are identified and analyzed through different blobs. Then tracking is performed by
matching corresponding features of blob. An algorithm has been developed in this perspective
using Angular Deviation of Center of Gravity (ADCG), which gives a satisfying result for segmentation of human object.
The study evaluates three background subtraction techniques. The techniques ranges from very basic
algorithm to state of the art published techniques categorized based on speed, memory requirements and
accuracy. Such a review can effectively guide the designer to select the most suitable method for a given
application in a principled way. The algorithms used in the study ranges from varying levels of accuracy
and computational complexity. Few of them can also deal with real time challenges like rain, snow, hails,
swaying branches, objects overlapping, varying light intensity or slow moving objects.
MULTI-LEVEL FEATURE FUSION BASED TRANSFER LEARNING FOR PERSON RE-IDENTIFICATIONijaia
Most of the currently known methods treat person re-identification task as classification problem and used commonly neural networks. However, these methods used only high-level convolutional feature or to express the feature representation of pedestrians. Moreover, the current data sets for person reidentification is relatively small. Under the limitation of the number of training set, deep convolutional networks are difficult to train adequately. Therefore, it is very worthwhile to introduce auxiliary data sets to help training. In order to solve this problem, this paper propose a novel method of deep transfer learning, and combines the comparison model with the classification model and multi-level fusion of the convolution features on the basis of transfer learning. In a multi-layers convolutional network, the characteristics of each layer of network are the dimensionality reduction of the previous layer of results, but the information of multi-level features is not only inclusive, but also has certain complementarity. We can using the information gap of different layers of convolutional neural networks to extract a better feature expression. Finally, the algorithm proposed in this paper is fully tested on four data sets (VIPeR, CUHK01, GRID and PRID450S). The obtained re-identification results prove the effectiveness of the algorithm.
Algorithmic Analysis to Video Object Tracking and Background Segmentation and...Editor IJCATR
Video object tracking and segmentation are the fundamental building blocks for smart surveillance
system. Various algorithms like partial least square analysis, Markov model, Temporal differencing,
background subtraction algorithm, adaptive background updating have been proposed but each were having
drawbacks like object tracking problem, multibackground congestion, illumination changes, occlusion etc.
The background segmentation worked on to principled object tracking by using two models Gaussian mixture
model and level centre model. Wavelet transforms have been one of the important signal processing
developments, especially for the applications such as time-frequency analysis, data compression,
segmentation and vision. The key idea of the wavelet transform approach is to represents any arbitrary
function f (t) as a superposition of a set of such wavelets or basis functions. Results show that algorithm
performs well to remove occlusion and multibackground congestion as well as algorithm worked with
removal of noise in the signals
This article aims at a new algorithm for tracking moving objects in the long term. We have tried to overcome some potential difficulties, first by a comparative study of the measuring methods of the difference and the similarity between the template and the source image. In the second part, an improvement of the best method allows us to follow the target in a robust way. This method also allows us to effectively overcome the problems of geometric deformation, partial occlusion and recovery after the target leaves the field of vision. The originality of our algorithm is based on a new model, which does not depend on a probabilistic process and does not require a data based detection in advance. Experimental results on several difficult video sequences have proven performance advantages over many recent trackers. The developed algorithm can be employed in several applications such as video surveillance, active vision or industrial visual servoing.
Automated Traffic sign board classification system is one of the key technologies of Intelligent
Transportation Systems (ITS). Traffic Surveillance System is being more and important with improving
urban scale and increasing number of vehicles. This Paper presents an intelligent sign board
classification method based on blob analysis in traffic surveillance. Processing is done by three main
steps: moving object segmentation, blob analysis, and classifying. A Sign board is modelled as a
rectangular patch and classified via blob analysis. By processing the blob of sign boards, the meaningful
features are extracted. Tracking moving targets is achieved by comparing the extracted features with
training data. After classifying the sign boards the system will intimate to user in the form of alarms,
sound waves. The experimental results show that the proposed system can provide real-time and useful
information for traffic surveillance.
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...IAEME Publication
Multiple human tracking based on object detection has been a challenge due to its
complexity. Errors in object detection would be propagated to tracking errors. In this
paper, we propose a tracking method that minimizes the error produced by object
detector. We use RetinaNet as object detector and Hungarian algorithm for tracking.
The cost matrix for Hungarian algorithm is calculated using the RetinaNet features,
bounding box center distances, and intersection of unions of bounding boxes. We
interpolate the missing detections in the last step. The proposed method yield 43.2
MOTA for MOT16 benchmark
With these components in place, we present the Data
Science Machine — an automated system for generating
predictive models from raw data. It starts with a relational
database and automatically generates features to be used
for predictive modeling.
A Novel Approach for Moving Object Detection from Dynamic BackgroundIJERA Editor
In computer vision application, moving object detection is the key technology for intelligent video monitoring
system. Performance of an automated visual surveillance system considerably depends on its ability to detect
moving objects in thermodynamic environment. A subsequent action, such as tracking, analyzing the motion or
identifying objects, requires an accurate extraction of the foreground objects, making moving object detection a
crucial part of the system. The aim of this paper is to detect real moving objects from un-stationary background
regions (such as branches and leafs of a tree or a flag waving in the wind), limiting false negatives (objects
pixels that are not detected) as much as possible. In addition, it is assumed that the models of the target objects
and their motion are unknown, so as to achieve maximum application independence (i.e. algorithm works under
the non-prior training).
gaze patterns [Hardoon et al. 2007]. The same principle has been
used for image retrieval as well [Klami et al. 2008], recently also
coupled dynamically to a retrieval engine in an interactive zooming
interface [Kozma et al. 2009]. Gaze has additionally been used as
a means of proactive interaction, but not information retrieval, in a
desktop application by assigning a relevance function to the entities
on a synthetic 2D map [Qvarfordt and Zhai 2005].
To test the feasibility of relevance ranking from gaze in dynamic
real-world setups, we prepared a stimulus video and collected gaze
data from subjects watching that video. We then asked the subjects
for their true relevance rankings in several frames. We trained an
ordinal logistic regression model and measured its accuracy in the
relevance prediction task on the left-out data.
2 Measurement Setup
We shot a video from the first-person view of a subject visiting three
indoor scenes. We then postprocessed this video by augmenting some of
the objects with additional textual information in an attached box.
The video was shown to 4 subjects while their gaze data was collected.
Right after the viewing session, the subjects ranked the scene objects
in order of relevance for a subset of the video frames. This ranking
was treated as the ground truth for learning and evaluating the
models. The modelling task is to predict the user-given rank of an
object from the gaze-tracking data in a window immediately preceding
the ranked frame.
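As a minimal sketch of the windowing step, assuming gaze samples arrive as (timestamp, x, y) tuples and that the window length is a free parameter (both are illustrative assumptions, not details given in this section), the samples preceding a ranked frame could be selected as follows:

```python
from typing import List, Tuple

# Hypothetical sample format: (timestamp in seconds, x, y) in screen coordinates.
GazeSample = Tuple[float, float, float]


def samples_before_frame(samples: List[GazeSample],
                         frame_time: float,
                         window: float) -> List[GazeSample]:
    """Return the gaze samples that fall in [frame_time - window, frame_time)."""
    start = frame_time - window
    return [s for s in samples if start <= s[0] < frame_time]


# Toy data: four samples, one ranked frame at t = 2.0 s, a 1-second window.
gaze = [(0.5, 10.0, 20.0), (1.2, 11.0, 21.0), (1.9, 12.0, 22.0), (2.4, 13.0, 23.0)]
selected = samples_before_frame(gaze, frame_time=2.0, window=1.0)
```

Features for each visible object would then be computed from `selected` only, so that nothing observed after the ranked frame leaks into the prediction.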
3 Model for Inferring Relevance
Let us index the stimulus slices preceding each relevance judgement
from 1 to $N$. We extract a feature vector (details in the Experiments
section) for each scene object $i$ at time slice $t$ to obtain a single
unlabelled data point:
$f_i^{(t)} = \{f_{i1}^{(t)}, f_{i2}^{(t)}, \cdots, f_{id}^{(t)}\}$,
where $d$ is the number of features. If we also attach the ground truth
relevance ranking $r_i^{(t)}$, we get a labelled data point
$(f_i^{(t)}, r_i^{(t)})$. Let us denote the set of data points, one for
each object, related to time slice $t$ as a data subset
$\Lambda^{(t)} = \{(f_1^{(t)}, r_1^{(t)}), \cdots, (f_{m_t}^{(t)}, r_{m_t}^{(t)})\}$,
where $m_t$ is the number of visible objects at time slice $t$. Let
us denote the data subset without labels by $\bar{\Lambda}^{(t)}$, and
the maximum number of visible objects by $L = \max(\{m_1, \cdots, m_N\})$.
For notational convenience, we define the most relevant object to
have rank $L$, and the rank decreases as relevance decreases. The
whole labelled data set consists of the union of all data subsets
$\Delta = \{\Lambda^{(1)}, \Lambda^{(2)}, \cdots, \Lambda^{(N)}\}$.
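The notation above maps onto a simple nested data structure. The following is a hedged sketch only; the class and function names are hypothetical, and the toy feature values are made up for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelledPoint:
    features: List[float]  # feature vector of one object at one time slice, length d
    rank: int              # ground-truth rank; a larger value means more relevant


# One data subset per time slice; each holds one point per visible object.
DataSubset = List[LabelledPoint]


def max_visible_objects(dataset: List[DataSubset]) -> int:
    """L = max(m_1, ..., m_N): the largest number of visible objects in any slice."""
    return max(len(subset) for subset in dataset)


def strip_labels(subset: DataSubset) -> List[List[float]]:
    """Unlabelled version of a data subset: the feature vectors only."""
    return [point.features for point in subset]


# Toy data set with N = 2 time slices and d = 2 features (values are invented).
dataset = [
    [LabelledPoint([0.2, 1.0], 3), LabelledPoint([0.7, 0.0], 2)],
    [LabelledPoint([0.1, 0.5], 3), LabelledPoint([0.9, 0.3], 2),
     LabelledPoint([0.4, 0.4], 1)],
]
L = max_visible_objects(dataset)
```

In the toy data the most relevant object in each slice carries rank `L`, following the convention stated above.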
We search for a mapping from the feature space to the space of
relevances, which is conventionally [0, 1]. Such a mapping can be
obtained directly using ordinal logistic regression [McCullagh and
Nelder 1989] if we assume that the relevance of an object depends
only on its features and is independent of the relevance of the
other visible objects. We use the standard approach, described
briefly below.
Let us denote the probability of the object rank being $k$ as
$P(r_i^{(t)} = k \mid f_i^{(t)}) = \phi_k(f_i^{(t)})$. Then we can
define the log odds such that the problem reduces to a batch of
$L - 1$ binary regression problems, one for each $k = 1, 2, \cdots, L - 1$:

$$M_k = \log \frac{P(r_i^{(t)} \le k \mid f_i^{(t)})}{1 - P(r_i^{(t)} \le k \mid f_i^{(t)})}
= \log \frac{\phi_0(f_i^{(t)}) + \phi_1(f_i^{(t)}) + \cdots + \phi_k(f_i^{(t)})}{\phi_{k+1}(f_i^{(t)}) + \phi_{k+2}(f_i^{(t)}) + \cdots + \phi_L(f_i^{(t)})}
= w_0^{(k)} + w f_i^{(t)}$$

where a linear model is assumed. By taking the exponent of both
sides we get the CDF of the rank distribution for object $i$ at time $t$:

$$P(r_i^{(t)} \le k \mid f_i^{(t)}) = \frac{\exp(w_0^{(k)} + w f_i^{(t)})}{1 + \exp(w_0^{(k)} + w f_i^{(t)})}.$$
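To make the step from the CDF to the individual rank probabilities concrete, here is a minimal sketch in plain Python. It assumes the intercepts are given in increasing order of $k$ and are non-decreasing (otherwise the differences can become negative); the parameter values and the function names are invented for illustration and are not taken from the experiments.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cumulative_probs(intercepts: Sequence[float],
                     w: Sequence[float],
                     f: Sequence[float]) -> List[float]:
    """P(r <= k | f) for each intercept, followed by 1.0 for the top rank."""
    score = sum(wj * fj for wj, fj in zip(w, f))  # shared slope term w . f
    return [sigmoid(b + score) for b in intercepts] + [1.0]


def rank_probs(intercepts: Sequence[float],
               w: Sequence[float],
               f: Sequence[float]) -> List[float]:
    """Per-rank probabilities as differences of consecutive CDF values."""
    cdf = cumulative_probs(intercepts, w, f)
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


# Illustrative parameters: three intercepts give four ordered rank categories.
intercepts = [-1.0, 0.5, 2.0]
w = [0.8, -0.3]
f = [1.0, 2.0]
probs = rank_probs(intercepts, w, f)
```

Because all thresholds share the slope vector, a change in the features shifts the whole CDF at once; only the intercepts separate the rank categories.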
Notice that we adopted the standard approach and used common
slope coefficients $w = [w_1, \cdots, w_d]$ for all logit models but different
intercepts $w_0^{(k)}$. In the training phase, we calculate the maximum
likelihood estimates for the parameters $\theta$ of this model
($\theta = \{w_0^{(1)}, \cdots, w_0^{(L-1)}, w_1, \cdots, w_d\}$) using the Newton-Raphson technique.
Given an unlabelled data subset $\tilde{\Lambda}^{(t)}$ at time $t$, the object
with relevance rank $k$ is predicted to be the one that has the highest
probability for that rank: $\arg\max_i \phi_k(f_i^{(t)})$.
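The prediction rule can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the example weights, and the feature values are all hypothetical, and training (the Newton-Raphson fit) is not shown. Rank probabilities are obtained by differencing the logistic CDF at successive intercepts, and `rank_index` is a position in the returned list (0 for the lowest rank), not a rank label.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def rank_probabilities(features, intercepts, slopes):
    """Return rank probabilities ordered from lowest to highest rank.

    `intercepts` holds one intercept per threshold in increasing order and
    `slopes` is the shared weight vector w.
    """
    score = sum(w * f for w, f in zip(slopes, features))
    cdf = [sigmoid(w0 + score) for w0 in intercepts] + [1.0]
    probs, previous = [], 0.0
    for value in cdf:
        probs.append(value - previous)
        previous = value
    return probs


def predict_object_for_rank(objects, rank_index, intercepts, slopes):
    """Pick the object whose probability for the given rank is highest."""
    return max(
        objects,
        key=lambda name: rank_probabilities(objects[name], intercepts, slopes)[rank_index],
    )


# Hypothetical model with two features and three rank levels.
intercepts = [-1.0, 1.0]  # increasing, so the CDF is monotone in k
slopes = [-2.0, 0.5]
scene = {"face": [1.2, 0.1], "poster": [0.1, 0.3]}
top = predict_object_for_rank(scene, rank_index=2, intercepts=intercepts, slopes=slopes)
```

Because each rank is assigned independently by an arg max over objects, the same object could in principle be selected for more than one rank; the sketch does not try to resolve such ties.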
4 Experiments
4.1 Stimulus Preparation
We shot a video clip, 4 minutes and 17 seconds long, from the first-person
view of a subject, using a see-through head-mounted display
device. In the scenario of the clip, a visitor coming to our laboratory
is informed of our research project. The scenario consists of three
consecutive scenes:
1. A short presentation in a meeting room: A researcher in-
troduces the project with a block diagram drawn on the white-
board (Figure 1) in a meeting room. People present are asking
questions. The visitor follows the presentation.
2. A walk in the lab corridor: The visitor walks through the
laboratory taking a look at posters on the wall, and zooms
into some of the name tags on office doors.
3. Demo of data collection devices: The host introduces how
eye tracking experiments are made. He demonstrates a mon-
itor with eye tracking capabilities and the head-mounted dis-
play device.
Next, we augmented the video by attaching information boxes to
objects such as faces, the whiteboard, name tags, posters, and devices
related to the project. These were considered to be the objects
potentially the most interesting to the visitor. Short snippets of textual
information relevant to the objects were displayed inside the
boxes. At most one information box was attached to any one object
at a time. We displayed boxes for all visible objects. There were
from 0 to 4 objects in the scene at a time; the average number of scene
objects was 2.017 with a standard deviation of 1.36. The frame rate of
the postprocessed video was 12 fps.
4.2 Data Collection
We collected gaze data from 4 subjects while they were watching
the stimulus video with the task of getting as much information as they
could about the research project. After the viewing session, the subjects were
shown 154 screenshots from the video in temporal order, each of
which represents a 1.66-second slot (20 frames). The subjects were
asked to select the objects that were relevant to them at that moment,
and also to rank the selected subset of objects according to
their relevance. We defined relevance as the interest in seeing augmented
information about an object in the scene at that particular
time. All subjects confirmed, after ranking, that they were able to
remember the correct ranks for almost all the frames. The subjects
were graduate and postgraduate researchers not working on
the project related to the study we present in this paper.
4.3 The Eye Tracker
We collected the gaze data with a Tobii 1750 eye tracker with a 50 Hz
sampling rate. The tracker has an infra-red stereo camera on a standard
flat-screen monitor. The device performs tracking by detecting
the pupil centers and measuring the reflection from the cornea.
Successive gaze points located within an area of 30 pixels are
considered a single fixation. This corresponds to approximately
0.6 degrees of deflection at a normal viewing distance to a 17-inch
monitor with 1280 × 1024 pixel resolution. Test subjects
were sitting 60 cm away from the monitor.
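The grouping of successive gaze points into fixations can be sketched as a simple dispersion check. This is a hedged illustration and not the tracker's actual algorithm: the function name is hypothetical, the "area of 30 pixels" is interpreted here as a bounding box at most 30 pixels wide and tall, and the 20 ms sample period is derived from the 50 Hz sampling rate.

```python
def group_fixations(samples, max_extent_px=30.0, sample_period_ms=20.0):
    """Group consecutive (x, y) gaze samples into fixations.

    A sample joins the current group only while the group's bounding box
    stays within `max_extent_px` in both x and y. Returns a list of
    (center_x, center_y, duration_ms) tuples.
    """

    def summarise(points):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        return (sum(xs) / len(xs), sum(ys) / len(ys), len(points) * sample_period_ms)

    fixations, group = [], []
    for point in samples:
        candidate = group + [point]
        xs = [p[0] for p in candidate]
        ys = [p[1] for p in candidate]
        if max(xs) - min(xs) <= max_extent_px and max(ys) - min(ys) <= max_extent_px:
            group = candidate
        else:
            fixations.append(summarise(group))
            group = [point]
    if group:
        fixations.append(summarise(group))
    return fixations


# Hypothetical gaze samples in screen pixels: two clusters far apart.
samples = [(100, 100), (105, 102), (110, 98), (300, 300), (302, 305)]
fixations = group_fixations(samples)
```

Note that this sketch also reports single-sample groups as fixations; a practical detector would usually add a minimum-duration threshold.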
4.4 Feature Extraction
We extracted from the gaze and video data a set of features cor-
responding to each visible object. This was done at every time
slice for which the labelled object ranks were available (i.e., for
one frame in every 20 consecutive frames). Each of these features
summarises a particular aspect in the temporal context (recent past).
We define the context at time t to be a slot from time point t − W
to t − 1 where W is a predetermined window size. We used the
following 11 features:
1. mean area of the bounding box of the object
2. mean area of the information box attached to the object
3. mean distance between the centers of the object bounding box and the attached
information box
4. total duration of fixations inside the bounding box of the object
5. total duration of fixations inside the information box attached to the object
6. mean duration of fixations inside the bounding box of the object
7. mean duration of fixations inside the information box attached to the object
8. mean distance of all fixations to the center of the object bounding box
9. mean distance of all fixations to the center of the information box
10. mean length of saccades that ended up with fixations inside the bounding box
of the object
11. mean length of saccades that ended up with fixations inside the information box
attached to the object
We marked the bounding boxes of the objects manually frame by
frame.
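As an illustration of how such context-window features might be computed, the sketch below derives features 4, 6 and 8 for one object. It is an assumption-laden example rather than the authors' pipeline: the function names and the tuple layouts are hypothetical, the bounding box is held fixed over the window for brevity (in the study it was marked frame by frame), and returning 0.0 when no fixation is available is a choice made here.

```python
import math


def inside(box, x, y):
    """True if (x, y) lies in box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def gaze_features(fixations, box, t, window):
    """Compute three object features over frames t - window .. t - 1.

    `fixations` is a list of (frame, x, y, duration_ms) tuples. Returns
    (total duration inside the box, mean duration inside the box, mean
    distance of all fixations in the window to the box center).
    """
    recent = [f for f in fixations if t - window <= f[0] <= t - 1]
    durations = [d for _, x, y, d in recent if inside(box, x, y)]
    total = sum(durations)
    mean_duration = total / len(durations) if durations else 0.0
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    distances = [math.hypot(x - cx, y - cy) for _, x, y, _ in recent]
    mean_distance = sum(distances) / len(distances) if distances else 0.0
    return total, mean_duration, mean_distance


# Hypothetical fixations; the last one falls outside the 20-frame window.
fixations = [(90, 50, 50, 200), (95, 53, 54, 100), (99, 200, 50, 300), (10, 50, 50, 500)]
total, mean_duration, mean_distance = gaze_features(
    fixations, box=(0, 0, 100, 100), t=100, window=20
)
```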
4.5 Evaluation
We evaluated the accuracy of the model with respect to the propor-
tion of times the most relevant object was predicted correctly. We
compared the model performance with five baseline methods. The
first one is random guessing, in which at each time slice, scene
objects are ranked uniformly at random. The second one is an
attention-based method that assigns a relevance proportional to the
total fixation duration on the object and on the augmented content.
This estimate of object relevance is referred to as gaze intensity
[Qvarfordt and Zhai 2005]. This baseline reveals whether intricate
gaze patterns contribute to relevance prediction beyond the mere visual
attention measured by gaze intensity. In the third baseline model
we used the ordinal logistic regression model with only the features that
are not related to gaze, that is, the first three features. Thus we investigated
the effect of gaze-based features on prediction accuracy. We
defined two more baseline models that depend on Itti et al.’s bottom-
up visual attention model [Itti et al. 1998] in order to observe how
useful such plain attention modelling is in our problem setup, and
to test if our model provides better accuracy. We computed the Itti-
Koch saliency map of the labelled frames. Then we calculated the
relevance of an object as the maximum saliency inside its bounding
box for one baseline model, and as the average saliency inside the
bounding box for the other one.
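The two saliency baselines can be sketched as follows. The saliency map itself is taken as given: `saliency_map` is a hypothetical precomputed grid (the Itti-Koch computation is not shown), and the function names and box layout are assumptions made for this example.

```python
def saliency_baselines(saliency_map, box):
    """Return (max, mean) saliency inside box = (left, top, right, bottom).

    `saliency_map` is a list of rows; the box bounds are inclusive pixel
    indices and are assumed to lie inside the map.
    """
    left, top, right, bottom = box
    values = [
        saliency_map[row][col]
        for row in range(top, bottom + 1)
        for col in range(left, right + 1)
    ]
    return max(values), sum(values) / len(values)


def most_relevant(scores):
    """Return the object with the highest baseline score."""
    return max(scores, key=scores.get)


# Hypothetical 3 x 4 saliency map and two object bounding boxes.
saliency = [
    [0.1, 0.2, 0.0, 0.0],
    [0.3, 0.9, 0.0, 0.5],
    [0.0, 0.1, 0.0, 0.5],
]
boxes = {"face": (0, 0, 1, 1), "poster": (3, 1, 3, 2)}
by_max = most_relevant({k: saliency_baselines(saliency, b)[0] for k, b in boxes.items()})
by_mean = most_relevant({k: saliency_baselines(saliency, b)[1] for k, b in boxes.items()})
```

In this toy example the two baselines disagree: the maximum favours the box containing a single strong peak, while the average favours the uniformly salient box.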
We trained separate models for the user-specific and user-independent
cases. In the user-specific case, we trained and tested the model on
the data of the same subject. We split the dataset into training
and test sets by random selection without replacement: we
randomly selected 2/3 of the dataset for training and left out the
remainder for testing. We repeated this process 50 times and measured
the mean prediction accuracy. We computed the accuracy for
several window sizes, starting from 50 frames and increasing to
750 frames in 25-frame steps. Our model outperformed all the
baseline methods for all subjects and all window sizes (Figure
2). The significance of the difference was tested for each subject
separately using the Wilcoxon signed-rank test with α=0.05. We
tested our model against the three best-performing baselines:
the logit model without gaze features and the two saliency-based
models. We selected the window sizes for our model and the
logit model without gaze features with respect to average prediction
accuracy on the training data.
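The repeated random-split protocol described above might look roughly like this. The sketch is not the authors' evaluation code: `fit` and `predict_top` are hypothetical caller-supplied callables standing in for model training and most-relevant-object prediction, the data layout is assumed, and the fixed seed is only there to make the example repeatable.

```python
import random


def repeated_split_accuracy(slices, fit, predict_top, repeats=50,
                            train_fraction=2 / 3, seed=0):
    """Average top-object accuracy over repeated random train/test splits.

    `slices` is a list of (features_by_object, true_top_object) pairs,
    one per labelled time slice.
    """
    rng = random.Random(seed)
    n_train = round(len(slices) * train_fraction)
    accuracies = []
    for _ in range(repeats):
        shuffled = rng.sample(slices, len(slices))  # permutation, no replacement
        train, test = shuffled[:n_train], shuffled[n_train:]
        model = fit(train)
        hits = sum(predict_top(model, feats) == truth for feats, truth in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)


# Context window sizes evaluated in the text: 50, 75, ..., 750 frames.
window_sizes = list(range(50, 751, 25))
```

One such accuracy would be computed per window size and per method, and the per-split accuracies of two methods could then be compared with a paired significance test.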
Figure 2: User-specific model accuracy for one user. Sub-images
show the accuracy (proportion of correct predictions) as a function
of the context window size (in frames, x-axis). Red diamonds:
our proposed model, blue circles: baseline model using only the
video features (not gaze), green reversed triangles: attention-only
model, cyan squares: random guessing, black triangles: maximum
saliency inside object, pink crosses: average saliency inside object.
In the user-independent case, we left out one user and trained the
model with the whole datasets of the other users. Then we evaluated
the accuracy on the data of the left-out user. This procedure
was repeated for all users. The results led to the same conclusions
as in the user-specific case, although with some decrease in
accuracy for all the metrics, and the improvement over the baselines
was not statistically significant for some test subjects. This is
probably due to the increased uncertainty originating from the
subjectivity of top-down cognitive processes. A single common model
may therefore be inadequate to handle the variability of gaze
patterns across subjects. This issue needs to be investigated further.
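The leave-one-user-out procedure can be written as a small generator. This is a minimal sketch under assumed names: `data_by_user` is a hypothetical mapping from a subject identifier to that subject's labelled time slices, and the placeholder strings stand in for real data points.

```python
def leave_one_user_out(data_by_user):
    """Yield (held_out_user, training_slices, test_slices) for every user.

    Training data is pooled from all users except the held-out one.
    """
    for held_out in data_by_user:
        train = [
            item
            for user, items in data_by_user.items()
            if user != held_out
            for item in items
        ]
        yield held_out, train, list(data_by_user[held_out])


# Hypothetical data: four subjects with placeholder slices.
data = {"s1": ["a", "b"], "s2": ["c"], "s3": ["d", "e"], "s4": ["f"]}
folds = list(leave_one_user_out(data))
```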
The box plot in Figure 3 (a) shows the learned regressor weights
for a subject in the user-specific case. The small variance of the weights
indicates that the model is stable across different splits. Both the
magnitude and the ordering of the weights in the user-independent case
were very similar to the user-specific case.
The best accuracy is achieved at relatively long window sizes (i.e.,
525 frames in the user-specific case, and 300 frames in the user-independent
case for test subject 1). This supports the claim that
the context does contain information related to object relevance.
The decrease in accuracy as the window size increases further is
small, and in particular the proposed model seems to
be insensitive to window size.
The feature with the strongest positive influence on relevance is
the mean distance between the object center and the fixations within
the context (w8). Intuitively, the relevance of an object increases
as the fixations within the context get closer to the center of that
object. The feature with the strongest negative influence is the
mean distance between the object and its information box. This means that
the closer the information box is placed to the object, the more
interest it attracts. Some of the weights are harder to interpret, and we will
study them further in our subsequent research.
Figure 3: Variance of the regressor weights for each of the features
among different bootstrap trials in the user-specific model. The
features are numbered in Section 4.4.
5 Discussion
In this work, we assessed the feasibility of a gaze-based object
relevance predictor in real-world scenes where the scene objects
were augmented with additional information. For this, we applied
a rather simple ordinal logistic regression model over a set of gaze
pattern and visual content features. The prominent increase in accuracy
when the gaze pattern features are added to the feature set
reveals that gaze statistics and visual features make mutually
complementary contributions to relevance inference. The optimal way of
combining these two sources of information should be studied further.
The advantage of our model over the bottom-up attention
models in predicting the most relevant object can be attributed to the
fact that bottom-up models are incapable of reflecting the task-dependent
control of attention.
Better performance can probably be achieved by enriching the
feature set and using a more complex model that fits the
data better. Generalisation of the model to other real-world scenes also
needs to be investigated further. This can be done by plugging the
model into a wearable information access device and assessing its
performance during online use. Such an assessment of our model is
currently in progress.
6 Acknowledgements
Melih Kandemir and Samuel Kaski belong to the Finnish Center
of Excellence in Adaptive Informatics and Helsinki Institute for
Information Technology (HIIT). Samuel Kaski also belongs to the
PASCAL2 EU Network of Excellence. This study is funded by the TKK
MIDE project UI-ART.
References
HARDOON, D., SHAWE-TAYLOR, J., AJANKI, A., PUOLAMÄKI,
K., AND KASKI, S. 2007. Information retrieval by inferring implicit
queries from eye movements. In International Conference
on Artificial Intelligence and Statistics (AISTATS ’07).
HENDERSON, J. M. 2003. Human gaze control during real-world
scene perception. Trends in Cognitive Sciences 7, 11, 498 – 504.
HYRSKYKARI, A., MAJARANTA, P., AALTONEN, A., AND
RÄIHÄ, K.-J. 2000. Design issues of ’iDict’: A gaze-assisted
translation aid. In Proceedings of ETRA 2000, Eye Tracking Research
and Applications Symposium, ACM Press, 9–14.
ITTI, L., KOCH, C., AND NIEBUR, E. 1998. A model of saliency-
based visual attention for rapid scene analysis. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 20, 11,
1254–1259.
KANDEMIR, M., SAARINEN, V.-M., AND KASKI, S. 2010. In-
ferring object relevance from gaze in dynamic scenes. In To Ap-
pear in Short Paper Proceedings of ETRA 2000, Eye Tracking
Research and Applications Symposium.
KLAMI, A., SAUNDERS, C., DE CAMPOS, T. E., AND KASKI, S.
2008. Can relevance of images be inferred from eye movements?
In MIR ’08: Proceeding of the 1st ACM international confer-
ence on Multimedia information retrieval, ACM, New York, NY,
USA, 134–140.
KOZMA, L., KLAMI, A., AND KASKI, S. 2009. GaZIR: Gaze-
based zooming interface for image retrieval. In Proc. ICMI-
MLMI 2009, The Eleventh International Conference on Multi-
modal Interfaces and The Sixth Workshop on Machine Learning
for Multimodal Interaction, ACM, New York, NY, USA, 305–
312.
MCCULLAGH, P., AND NELDER, J. 1989. Generalized Linear
Models. Chapman & Hall/CRC.
QVARFORDT, P., AND ZHAI, S. 2005. Conversing with the user
based on eye-gaze patterns. In CHI ’05: Proceedings of the
SIGCHI conference on Human factors in computing systems,
ACM, New York, NY, USA, 221–230.
TORRALBA, A., OLIVA, A., CASTELHANO, M. S., AND HEN-
DERSON, J. M. 2006. Contextual guidance of eye movements
and attention in real-world scenes: the role of global features in
object search. Psychological Review 113, 4, 766–786.
WARD, D. J., AND MACKAY, D. J. C. 2002. Fast hands-free
writing by gaze direction. Nature 418, 6900, 838.
ZHANG, L., TONG, M. H., MARKS, T. K., SHAN, H., AND COTTRELL,
G. W. 2008. SUN: A Bayesian framework for saliency
using natural statistics. Journal of Vision 8, 7 (12), 1–20.