This document explores different interaction modes for image retrieval. It describes a framework for multimodal interaction using techniques such as eye tracking, voice recognition, and multi-touch. An experiment compared the usability of different interaction methods for query-by-example image retrieval: nine participants used four methods (anchor, gaze, mouse, and touch) to select regions in images, and metrics such as accuracy, precision, and completion time were measured. Preliminary results showed that touch interaction had the most consistent performance and the shortest completion times.
I have completed my Postgraduate Diploma in Design (Design for Digital Experience) from the National Institute of Design. I am keenly interested in projects related to user experience design, research methodology and development, product usability, service design, and interaction and interface design. I would also like to pursue research and design work on innovative e-learning systems, tools, and techniques.
I am passionate about reading articles and materials on the ancient civilizations, cultures, peoples, and arts and crafts of different countries.
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DETECTION (IJCSEIT Journal)
A video fingerprint is an identifier derived from a piece of video content. Video fingerprinting methods extract unique features that differentiate one video clip from another, with the aim of identifying whether a query video segment is a copy of a video in the database based on the video's signature. It is difficult to decide whether a video is a copy or merely similar, since the content features of different videos can be very alike. The main focus of this paper is to detect whether the query video is present in the video database, with robustness that depends on the video content, and to do so through fast fingerprint search. The Fingerprint Extraction Algorithm and Fast Search Algorithm are adopted to achieve robust, fast, efficient, and accurate video copy detection. As a first step, the Fingerprint Extraction Algorithm extracts a fingerprint from features of the video's image content; the images are represented as Temporally Informative Representative Images (TIRIs). The second step determines whether a copy of the query video is present in the video database by searching for a close match of its fingerprint in the corresponding fingerprint database using an inverted-file-based method. The proposed system is tested against various attacks such as noise, brightness and contrast changes, rotation, and frame drops. On average, it achieves a high true-positive rate of 98% and a low false-positive rate of 1.3% across these attacks.
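A TIRI, as described in the related literature, blends consecutive frames into one representative image before a binary fingerprint is extracted. The sketch below is a hedged illustration only: the blending weight `gamma`, the block size, and all function names are assumptions, not the paper's exact algorithm.

```python
# Hypothetical sketch of TIRI-style fingerprint extraction. A TIRI blends
# consecutive greyscale frames with exponentially decaying weights, and the
# fingerprint thresholds each block's mean luminance against the image mean.

def make_tiri(frames, gamma=0.65):
    """Blend equally sized greyscale frames (2-D lists) into one
    Temporally Informative Representative Image."""
    h, w = len(frames[0]), len(frames[0][0])
    weights = [gamma ** k for k in range(len(frames))]
    total = sum(weights)
    return [[sum(wt * f[y][x] for wt, f in zip(weights, frames)) / total
             for x in range(w)] for y in range(h)]

def fingerprint(tiri, block=2):
    """Binary fingerprint: 1 where a block's mean exceeds the image mean."""
    h, w = len(tiri), len(tiri[0])
    img_mean = sum(map(sum, tiri)) / (h * w)
    bits = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            vals = [tiri[j][i] for j in range(y, min(y + block, h))
                               for i in range(x, min(x + block, w))]
            bits.append(1 if sum(vals) / len(vals) > img_mean else 0)
    return bits

# Toy example: two 4x4 "frames" with a bright lower-right region.
f1 = [[10]*4, [10]*4, [10, 10, 200, 200], [10, 10, 200, 200]]
f2 = [[12]*4, [12]*4, [12, 12, 190, 190], [12, 12, 190, 190]]
fp = fingerprint(make_tiri([f1, f2]))
print(fp)  # → [0, 0, 0, 1]: only the bright block exceeds the mean
```

Because the fingerprint depends on block means relative to the whole image rather than absolute pixel values, mild brightness or contrast changes tend to leave the bits unchanged, which is the kind of robustness the abstract claims.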
Following the user's interests in mobile context-aware recommender systems (Bouneffouf Djallel)
The wide development of mobile applications provides a considerable amount of data of all types (images, texts, sounds, videos, etc.). In this context, Mobile Context-aware Recommender Systems (MCRS) suggest suitable information to the user depending on his/her situation and interests. Two key questions have to be considered: 1) how do we recommend information that follows the evolution of the user's interests? 2) how do we model the user's situation and its related interests? To the best of our knowledge, no existing MCRS tries to answer both questions as we do. This paper describes ongoing work on the implementation of an MCRS based on the hybrid ε-greedy algorithm we propose, which combines the standard ε-greedy algorithm with both content-based filtering and case-based reasoning techniques.
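The standard ε-greedy step that the proposed hybrid builds on can be sketched as follows. Note this shows only the exploration/exploitation core, not the authors' content-based or case-based components; the item names and click-through estimates are invented.

```python
import random

# Standard ε-greedy: with probability ε pick a random item (exploration),
# otherwise pick the item with the highest estimated reward (exploitation).

def epsilon_greedy(estimated_ctr, epsilon=0.1, rng=random):
    if rng.random() < epsilon:
        return rng.choice(list(estimated_ctr))        # exploration
    return max(estimated_ctr, key=estimated_ctr.get)  # exploitation

# Hypothetical estimated click-through rates per recommendable item.
ctr = {"news": 0.12, "weather": 0.30, "sport": 0.05}
random.seed(0)
picks = [epsilon_greedy(ctr, epsilon=0.1) for _ in range(100)]
print(picks.count("weather"))  # exploitation dominates, so "weather" is picked most
```

In the paper's hybrid variant, the exploration branch would draw from content-based or case-based candidates relevant to the user's current situation rather than uniformly at random.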
Performance Comparison of Digital Image Watermarking Techniques: A Survey (Editor, IJCATR)
Digital watermarking is the process of embedding information into a digital signal. A watermark is a secondary image that is overlaid on the host image and provides a means of protecting it. To obtain a high-quality watermarked image, the watermark should be imperceptible. This paper presents different digital image watermarking techniques in the spatial and frequency domains, and shows that the spatial-domain technique provides security, successful recovery of the watermark image, and a higher PSNR value than the frequency-domain techniques.
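A minimal example of the spatial-domain family the survey covers is least-significant-bit (LSB) embedding, shown below together with the PSNR metric mentioned above. This is a toy stand-in, assuming 8-bit greyscale pixels stored as flat lists, not any specific surveyed scheme.

```python
import math

# Toy spatial-domain (LSB) watermarking: hide one watermark bit in the
# least significant bit of each host pixel, then measure distortion via PSNR.

def embed_lsb(host, bits):
    return [(p & ~1) | b for p, b in zip(host, bits)]

def extract_lsb(marked):
    return [p & 1 for p in marked]

def psnr(orig, marked):
    mse = sum((a - b) ** 2 for a, b in zip(orig, marked)) / len(orig)
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)

host = [120, 121, 122, 123, 124, 125, 126, 127]  # hypothetical pixel values
wm   = [1, 0, 1, 1, 0, 0, 1, 0]                  # watermark bits
marked = embed_lsb(host, wm)
assert extract_lsb(marked) == wm   # successful recovery, as the survey notes
print(round(psnr(host, marked), 1))  # → 49.4 (high PSNR: imperceptible change)
```

Each pixel changes by at most 1, which is why LSB-style spatial methods score high PSNR; their weakness (not shown here) is fragility under compression and filtering, which motivates frequency-domain alternatives.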
Video Data Visualization System: Semantic Classification and Personalization (ijcga)
We present an intelligent video data visualization tool, based on semantic classification, for retrieving and exploring a large-scale corpus of videos. Our work builds on the classes obtained from semantic analysis of the videos, which are projected into the visualization space. The result is a graph of nodes and edges: the nodes are the keyframes of the video documents, and the edges are the relations between documents and document classes. Finally, we construct a user profile, based on interaction with the system, to adapt the system to the user's preferences.
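The node/edge structure described above can be sketched as a plain adjacency list. The class names, video ids, and keyframe ids below are invented for illustration.

```python
# Sketch of the visualization graph: keyframes are nodes; edges connect
# keyframes to their documents and documents to their semantic classes.

classes = {"sports": ["video1", "video2"], "news": ["video3"]}  # hypothetical
keyframes = {"video1": ["v1_kf0", "v1_kf1"],
             "video2": ["v2_kf0"],
             "video3": ["v3_kf0"]}

nodes, edges = [], []
for cls, docs in classes.items():
    for doc in docs:
        edges.append((doc, cls))      # document-to-class relation
        for kf in keyframes[doc]:
            nodes.append(kf)          # keyframes are the graph's nodes
            edges.append((kf, doc))   # keyframe-to-document relation

print(len(nodes), len(edges))  # → 4 7
```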
A Pattern Language for semi-automatic generation of Digital Animation through hand-drawn Storyboards
Pedro Henrique Braga*, UPM; Ismar Silveira, UPM
Presented at Workshop of Works in Progress at SIBGRAPI 2015 - Salvador, BA - Brazil
Kinnunen: Towards Task-Independent Person Authentication Using Eye Movement Signals (Kalle)
We propose a person authentication system using eye movement signals. In security scenarios, eye tracking has previously been used for gaze-based password entry, and a few authors have also used physical features of eye movement signals for authentication in a task-dependent scenario with matched training and test samples. We propose and implement a task-independent scenario in which the training and test samples can be arbitrary. We use short-term eye gaze direction to construct feature vectors, which are modeled using Gaussian mixtures. The results suggest that there are person-specific features in eye movements that can be modeled in a task-independent manner. The range of possible applications extends beyond security-oriented authentication to proactive and user-convenience systems.
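The modeling idea can be illustrated with a deliberately simplified scoring scheme: the paper uses Gaussian mixture models, but the sketch below collapses each user's model to a single Gaussian per feature dimension to stay dependency-free. All gaze-feature samples are invented.

```python
import math
import statistics

# Simplified stand-in for GMM-based gaze authentication: fit one Gaussian
# per feature dimension per user, then accept the user whose model gives
# the probe vector the higher log-likelihood.

def fit(samples):
    """Per-dimension (mean, stdev) of a user's gaze-feature vectors."""
    return [(statistics.mean(d), statistics.stdev(d)) for d in zip(*samples)]

def log_likelihood(model, x):
    ll = 0.0
    for (mu, sd), v in zip(model, x):
        ll += -0.5 * math.log(2 * math.pi * sd * sd) - (v - mu) ** 2 / (2 * sd * sd)
    return ll

# Hypothetical short-term gaze-direction features for two enrolled users.
alice = fit([[0.1, 0.9], [0.2, 1.0], [0.15, 0.95], [0.12, 0.92]])
bob   = fit([[0.8, 0.2], [0.9, 0.1], [0.85, 0.25], [0.82, 0.18]])

probe = [0.14, 0.93]  # unseen test vector, closer to Alice's distribution
accepted = "alice" if log_likelihood(alice, probe) > log_likelihood(bob, probe) else "bob"
print(accepted)  # → alice
```

A real system would use a full mixture (several weighted Gaussians per user) so that multimodal gaze behavior, such as distinct fixation and saccade regimes, is captured.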
Inverted File Based Search Technique for Video Copy Retrieval (ijcsa)
A video copy detection system is a content-based search engine focusing on spatio-temporal features. It aims to determine whether a query video segment is a copy of a video in the database based on the video's signature. It is hard to decide whether a video is a copy or merely similar, since the content features of different videos can be very alike. The main focus is to detect whether the query video is present in the video database, with robustness that depends on the video content, and to do so through fast fingerprint search. The Fingerprint Extraction Algorithm and Fast Search Algorithm are adopted to achieve robust, fast, efficient, and accurate video copy detection. As a first step, the Fingerprint Extraction Algorithm extracts a fingerprint from features of the video's image content; the images are represented as Temporally Informative Representative Images (TIRIs). The next step determines whether a copy of the query video is present in the video database by searching for a close match of its fingerprint in the corresponding fingerprint database using an inverted-file-based method.
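The inverted-file lookup step can be sketched as follows. This is a hedged illustration, assuming fingerprints are split into short fixed-position "words" that key the index; the word size and toy bit strings are invented, not the paper's parameters.

```python
from collections import defaultdict

# Inverted-file fingerprint search: the index maps (position, word) pairs to
# the database videos containing that word, so a query only touches the few
# candidates that share at least one word, instead of scanning everything.

def words(fp, size=4):
    return [(i, tuple(fp[i:i + size])) for i in range(0, len(fp), size)]

def build_index(db):
    index = defaultdict(set)
    for name, fp in db.items():
        for key in words(fp):
            index[key].add(name)
    return index

def lookup(index, query_fp):
    votes = defaultdict(int)
    for key in words(query_fp):
        for name in index.get(key, ()):
            votes[name] += 1   # each shared word is a vote for a candidate
    return max(votes, key=votes.get) if votes else None

db = {"clipA": [0, 1, 1, 0, 1, 1, 0, 0],   # toy 8-bit fingerprints
      "clipB": [1, 0, 0, 1, 0, 0, 1, 1]}
index = build_index(db)
match = lookup(index, [0, 1, 1, 0, 1, 1, 0, 1])  # near-copy of clipA (1 bit off)
print(match)  # → clipA
```

A full system would verify the top candidates with an exact Hamming-distance comparison; the inverted file only narrows the search.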
Computers help us handle and process enormous amounts of data. Most of the time this data is so dense that it is nearly impossible to understand by looking at raw numbers alone. Some of it can be analyzed automatically, but usually a real, thinking person must interpret the data and draw conclusions from it in order to analyze it and make decisions. Scientific visualization is about converting numbers into a representation of reality, something more graphic, so that a human being can understand and communicate it.
A Framework for Human Action Detection via Extraction of Multimodal Features (CSCJournals)
This work discusses the application of an Artificial Intelligence technique called data extraction, together with a process-based ontology, in constructing experimental qualitative models for video retrieval and detection. We present a framework architecture that uses multimodal features as the knowledge representation scheme to model the behaviors of a number of human actions in video scenes. The main focus of this paper is the design of two main components, a model classifier and an inference engine, for a tool abbreviated VASD (Video Action Scene Detector) for retrieving and detecting human actions from video scenes. The discussion starts by presenting the workflow of the retrieval and detection process and the automated model classifier construction logic. We then demonstrate how the constructed classifiers can be used with multimodal features to detect human actions. Finally, behavioral explanation manifestation is discussed. The simulator is implemented in multiple languages: MATLAB and C++ at the backend supply data and models, while Java handles the front-end GUI and action pattern updating. To evaluate the usefulness of the proposed framework, several experiments were conducted; the results were 77.89% precision and 72.10% recall using visual features only, 62.52% precision and 48.93% recall using audio features only, and 90.35% precision and 90.65% recall using combined audiovisual features.
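The precision and recall figures above follow the standard definitions, which can be made concrete with a short sketch. The detection counts below are hypothetical, chosen only to land near the paper's audiovisual result.

```python
# Standard precision/recall over detection counts:
#   precision = TP / (TP + FP)   "of what we detected, how much was right"
#   recall    = TP / (TP + FN)   "of what was there, how much we detected"

def precision_recall(true_pos, false_pos, false_neg):
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical counts: 90 correct detections, 10 spurious, 9 missed actions.
p, r = precision_recall(90, 10, 9)
print(round(p, 3), round(r, 3))  # → 0.9 0.909
```

This also explains why the combined audiovisual result beats either modality alone: adding a second modality can simultaneously reduce false positives (raising precision) and false negatives (raising recall).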
Advanced Fuzzy Logic Based Image Watermarking Technique for Medical Images (IJARIIT)
Segmentation algorithms vary with the type of medical image, such as MRI, CT, or ultrasound. The current work can be extended with a GUI-tool-based approach for separating the region of interest (ROI), and a new technique for separating the ROI from the original image that is applicable to all types of medical images could be developed. The separated ROI can be stored with its xmin, xmax, ymin, and ymax values so that, at the end of the embedding process and before the watermarked image is transmitted, the segmented ROI can be reattached to it. Any medical image watermarking approach is suitable if the ROI is segmented from the medical image with these four values; the watermark can then be embedded in the whole medical image. In this paper we work with different scans, such as CT scans and brain scans, and our results are significantly higher than those of other approaches.
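The ROI bookkeeping described above, cropping a region by its four bounds and reattaching it after embedding, can be sketched as follows. The image is a toy 2-D list of pixel values; function names are illustrative, not from the paper.

```python
# Store the ROI with its xmin/xmax/ymin/ymax so it can be cut out before
# watermark embedding and pasted back before transmission.

def crop_roi(img, xmin, xmax, ymin, ymax):
    """Copy the inclusive rectangle [ymin..ymax] x [xmin..xmax]."""
    return [row[xmin:xmax + 1] for row in img[ymin:ymax + 1]]

def reattach_roi(img, roi, xmin, ymin):
    """Paste a previously cropped ROI back at its recorded origin."""
    for dy, row in enumerate(roi):
        for dx, v in enumerate(row):
            img[ymin + dy][xmin + dx] = v
    return img

image = [[0] * 4 for _ in range(4)]
image[1][1], image[1][2], image[2][1], image[2][2] = 5, 6, 7, 8  # "diagnostic" region

roi = crop_roi(image, xmin=1, xmax=2, ymin=1, ymax=2)
restored = reattach_roi([[0] * 4 for _ in range(4)], roi, xmin=1, ymin=1)
print(roi)                 # → [[5, 6], [7, 8]]
print(restored == image)   # → True: the ROI survives the round trip intact
```

Keeping the ROI byte-identical through this round trip is the point: the diagnostically relevant region is never altered by the embedding step.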
Content Based Video Retrieval Using Integrated Feature Extraction and Persona... (IJERD Editor)
Traditional video retrieval methods fail to meet the technical challenges posed by the large and rapid growth of multimedia data, which demands effective retrieval systems. In the last decade, Content-Based Video Retrieval (CBVR) has become more and more popular. The amount of lecture video data on the World Wide Web (WWW) is growing rapidly, so a more efficient method for video retrieval on the WWW or within large lecture video archives is urgently needed. This paper presents an implementation of automated video indexing and video search in a large video database. First, we apply automatic video segmentation and key-frame detection to extract frames from the video. Next, we extract textual keywords by applying Optical Character Recognition (OCR) to the key-frames and Automatic Speech Recognition (ASR) to the audio tracks. We also extract colour, texture, and edge features using different methods. Finally, we integrate all the keywords and features extracted by these techniques for search: a similarity measure is applied, and the best-matching videos are retrieved from the database and presented as output. Additionally, we provide re-ranking of the results according to the user's interest in the original result set.
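The integrated search-and-re-rank step can be sketched as below. This is an assumed combination scheme (Jaccard keyword overlap plus a Euclidean feature distance, with invented weights and data), not the paper's actual similarity measure.

```python
# Combine keyword overlap (from OCR/ASR) with a visual feature distance into
# one similarity score, then re-rank by hypothetical user-interest weights.

def similarity(query, video, w_text=0.5, w_feat=0.5):
    kq, kv = set(query["keywords"]), set(video["keywords"])
    text = len(kq & kv) / len(kq | kv) if kq | kv else 0.0  # Jaccard overlap
    dist = sum((a - b) ** 2 for a, b in zip(query["feat"], video["feat"])) ** 0.5
    return w_text * text + w_feat / (1.0 + dist)  # closer features score higher

db = [
    {"id": "lec1", "keywords": {"fourier", "signal"}, "feat": [0.9, 0.1]},
    {"id": "lec2", "keywords": {"fourier", "matrix"}, "feat": [0.2, 0.8]},
]
query = {"keywords": {"fourier", "signal"}, "feat": [0.85, 0.15]}

ranked = sorted(db, key=lambda v: similarity(query, v), reverse=True)
print([v["id"] for v in ranked])  # → ['lec1', 'lec2']

# Re-ranking: boost items this user has shown interest in (weights invented).
interest = {"lec1": 1.0, "lec2": 3.0}
reranked = sorted(db, key=lambda v: similarity(query, v) * interest[v["id"]],
                  reverse=True)
print([v["id"] for v in reranked])  # → ['lec2', 'lec1']
```

The second sort shows the personalization effect: a strong interest weight can override the raw content similarity ordering.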
Istance: Designing Gaze Gestures for Gaming, An Investigation of Performance (Kalle)
To enable people with motor impairments to use gaze control to play online games and take part in virtual communities, new interaction techniques are needed that overcome the limitations of dwell clicking on icons in the game interface. We have investigated gaze gestures as a means of achieving this. We report the results of an experiment with 24 participants that examined performance differences between different gestures; we were able to predict the effect on performance of the number of legs in a gesture and of the primary direction of eye movement in a gesture. We also report the outcomes of user trials in which 12 experienced gamers used the gaze gesture interface to play World of Warcraft. All participants were able to move around and engage other characters in fighting episodes successfully. Gestures were good for issuing specific commands such as spell casting, but less good for continuous control of movement compared with other gaze interaction techniques we have developed.
Skovsgaard: Small Target Selection With Gaze Alone (Kalle)
Accessing the smallest targets in mainstream interfaces using gaze alone is difficult, but interface tools that effectively increase the size of selectable objects can help. In this paper, we propose a conceptual framework to organize existing tools and guide the development of new ones. We designed a discrete zoom tool and conducted a proof-of-concept experiment to test the potential of the framework and the tool. Our tool was as fast as, and more accurate than, the currently available two-step magnification tool. Our framework shows potential to guide the design, development, and testing of zoom tools that make mainstream interfaces more accessible to gaze users.
Researcher Profiling based on Semantic Analysis in Social Networks (Laurens De Vocht)
We propose a framework to address an important challenge in the context of the ongoing adoption of the “Web 2.0” in science and research, often referred to as “Research 2.0”. Microblogging is one of the trends with increasing leverage. The challenge in this thesis is to connect users of microblogging services such as Twitter based on specific common entities that are representative of and truly matter to them. We investigated the possibilities of using social data to locate an expert who shares a very specific research topic. To enrich and verify this social data, we link such content to existing open data provided by the online community. We use semantic technologies (RDF, SPARQL), common ontologies (SIOC, FOAF, Dublin Core, SWRC), and Linked Data (DBpedia, GeoNames, CoLinDa) to extract and mine data about scientific conferences from the context of microblogs. We identify users related to each other based on entities such as topics (tags), events, times, locations, and persons (mentions). As a proof of concept we explain, implement, and evaluate such a researcher profiling use case. It involves the development of a framework that recommends researchers based on the topics and conferences they have in common. This framework provides an API that allows quick access to the analyzed information. A demonstration application, “Researcher Affinity Browser”, shows how the API supports developers in building rich internet applications for Research 2.0. This application also introduces the concept of “affinity”, which exposes the implicit proximity between entities and users based on the content users produced. The usability of the demonstration application and the usefulness of the framework itself were investigated with an explicit evaluation questionnaire. This user feedback led to important conclusions about successful achievements and opportunities to further improve this effort.
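The "affinity" concept above can be sketched as overlap between the entity sets (tags, events, mentions) two researchers produce. The user names and entity sets below are invented; Jaccard overlap is an assumed stand-in for whatever proximity measure the framework actually uses.

```python
# Toy affinity measure: Jaccard overlap between the entities extracted from
# two users' microblog posts. Higher overlap = higher implicit proximity.

def affinity(a, b):
    return len(a & b) / len(a | b)

# Hypothetical entity sets mined from each researcher's posts.
entities = {
    "alice": {"#iswc", "linkeddata", "sparql", "ghent"},
    "bob":   {"#iswc", "linkeddata", "rdf"},
    "carol": {"usability", "eyetracking"},
}

scores = {u: affinity(entities["alice"], entities[u])
          for u in entities if u != "alice"}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # → bob 0.4
```

In the described framework these entity sets would first be disambiguated and enriched against Linked Data sources (DBpedia, GeoNames, CoLinDa) so that, for example, a hashtag and the conference it refers to count as the same entity.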
Five Best and Five Worst Practices for SIEM by Dr. Anton ChuvakinAnton Chuvakin
End-User Case Study: Five Best and Five Worst Practices for SIEM
Implementing SIEM sounds straightforward, but reality sometimes begs to differ. In this session, Dr. Anton Chuvakin will share the five best and worst practices for implementing SIEM as part of security monitoring and intelligence. Understanding how to avoid pitfalls and create a successful SIEM implementation will help maximize security and compliance value, and avoid costly obstacles, inefficiencies, and risks.
Ultrasonic velocity and allied parameters of tetrahexylammonium iodide in bina...ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Inspirational Storytelling (On Public Speaking)Montecarlo -
Slides of my speech at Imagine 2014: an attempt to bring some light and insights to the twelve "dreamers" participating in this year's event.
Dreamers: keep dreaming, but make your dreams come true!
ACA has created new reporting requirements under IRC Sections 6055 and 6056. Under the new reporting rules, certain employers are required to provide the IRS with information about the health plan coverage they offer to their employees.
The reporting is mandated to provide government with critical information for administering ACA mandates such as individual penalty as well as large employer shared responsibility penalties.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of Engineering, Science and Technology, including new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Using Evolutionary Prototypes To Formalize Product RequirementsArnold Rudorfer
Boundary objects are artifacts that facilitate communication and interaction between people or groups functioning in different domains. Software engineers, user interface designers and usability specialists have different domain knowledge, different terminologies, and shared terms with different, distinct meanings. Boundary objects can assist the process of designing software by providing a common interface for communication between professionals in different domains. The Software Engineering department and User Interface Design Center at Siemens Corporate Research used an evolutionary prototype as a boundary object to help elicit product requirements from their client, Siemens Medical Solutions. This enhanced communication with the client and between groups at SCR. This paper describes how the evolutionary prototype functioned as a boundary object and how it allowed software engineering processes and human-computer interaction methods to proceed concurrently without the need for well-defined interaction points.
FACE EXPRESSION IDENTIFICATION USING IMAGE FEATURE CLUSTERING AND QUERY SCHEME...Editor IJMTER
Web mining techniques are used to analyze web page contents and usage details. Human facial images are shared on the internet and tagged with additional information. Auto face annotation techniques are used to annotate facial images automatically. Annotations are used in online photo search and management. Classification techniques are used to assign the facial annotation. Supervised or semi-supervised machine learning techniques are used to train the classification models. Facial images with labels are used in the training process. Noisy and incomplete labels are referred to as weak labels. Search-based face annotation (SBFA) is performed by mining weakly labeled facial images available on the World Wide Web (WWW). An unsupervised label refinement (ULR) approach is used for refining the labels of web facial images with machine learning techniques. The ULR scheme enhances label quality using a graph-based, low-rank learning approach. The training phase is designed with facial image collection, facial feature extraction, feature indexing and label refinement learning steps. Similar face retrieval and voting-based face annotation tasks are carried out in the testing phase. A Clustering-Based Approximation (CBA) algorithm is applied to improve scalability. A bisecting K-means clustering based algorithm (BCBA) and a divisive clustering based algorithm (DCBA) are used to group the facial images. A multi-step gradient algorithm is used for the label refinement process. The web face annotation scheme is enhanced to improve label quality with low refinement overhead. A noise reduction method is integrated with the label refinement process. A duplicate name removal process is integrated with the system. The indexing scheme is enhanced with weight values for the labels. Social contextual information is used to manage query facial image relevancy issues.
Can “Feature” be used to Model the Changing Access Control Policies? IJORCS
Access control policies (ACPs) regulate access to data and resources in information systems. These ACPs are framed from the functional requirements and the organizational security and privacy policies. Including ACPs in the early phases of software development has been found beneficial, leading to secure development of information systems. Many approaches are available for including ACPs in the requirements and design phases; they have relied on UML artifacts, Aspects and also Features for this purpose. But the earlier modeling approaches are limited in expressing evolving ACPs due to organizational policy changes and business process modifications. In this paper, we analyze whether a “Feature”, defined as an increment in program functionality, can be used as a modeling entity to represent evolving access control requirements. We discuss the two prominent approaches that use Features in modeling ACPs. We also present a comparative analysis to find the suitability of Features in the context of changing ACPs. We conclude with our findings and provide directions for further research.
Faro Visual Attention For Implicit Relevance Feedback In A Content Based Imag...Kalle
In this paper we propose an implicit relevance feedback method that aims to improve the performance of known Content Based Image Retrieval (CBIR) systems by re-ranking the retrieved images according to users’ eye gaze data. This represents a new mechanism for implicit relevance feedback; usually the sources taken into account for image retrieval are based on the natural behavior of the user in his/her environment, estimated by analyzing mouse and keyboard interactions. In detail, after the retrieval of images by querying CBIRs with a keyword, our system computes the most salient regions (where users look with greater interest) of the retrieved images by gathering data from an unobtrusive eye tracker, such as the Tobii T60. According to the features, in terms of color and texture, of these relevant regions our system is able to re-rank the images initially retrieved by the CBIR. Performance evaluation, carried out on a set of 30 users using Google Images and the keyword “pyramid”, shows that about 87% of the users are more satisfied with the output images when the re-ranking is applied.
Eye(I) Still Know! – An App for the Blind Built using Web and AIDr. Amarjeet Singh
This paper proposes eye(I) still know!, a voice control solution for visually impaired people. The main purpose is that even though the blind cannot see, they can still know where to go and what to do! Nearly 60% of the total blind population across the world is present in India. In a time where no one likes to rely on anyone, this is a small effort to make the blind independent individuals. This can be achieved using wireless communication, voice recognition and image scanning. Using object identification, the application will inform the user in advance about barriers in the path.
The software will use the camera of the device and scan all the obstacles with their corresponding distances from the user. This will be followed by audio instructions through audio output of the device.
This will efficiently direct the user through his/her way.
Similar to Engelman.2011.exploring interaction modes for image retrieval
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference, 30.5.2024. We discuss what testing is, what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Engelman.2011.exploring interaction modes for image retrieval
Exploring Interaction Modes for Image Retrieval
Corey Engelman¹, Rui Li¹, Jeff Pelz², Pengcheng Shi¹, Anne Haake¹
ABSTRACT
The number of digital images in use is growing at an increasing rate across a wide array of application domains. That being said, there is an ever-growing need for innovative ways to help end-users gain access to these images quickly and effectively. Moreover, it is becoming increasingly more difficult to manually annotate these images, for example with text labels, to generate useful metadata. One such method for helping users gain access to digital images is content-based image retrieval (CBIR). Practical use of CBIR systems has been limited by several “gaps”, including the well-known semantic gap and usability gaps [1]. Innovative designs are needed to bring end users into the loop to bridge these gaps. Our human-centered approaches integrate human perception and multimodal interaction to facilitate more usable and effective image retrieval. Here we show that multi-touch interaction is more usable than gaze-based interaction for explicit image region selection.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces – Graphical user interfaces, input devices and strategies, prototyping, user-centered design, voice I/O, interaction styles.

General Terms
Measurement, Performance, Design, Experimentation, Human Factors

Keywords
Multimodal, eye tracking, image retrieval, human-centered computing

¹ B. Thomas Golisano College of Computing and Information Sciences, Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY 14623-5603, {cde7825, rxl5604, spcast, arhics}@rit.edu
² College of Imaging Arts and Sciences, Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY 14623-5603, {jbppph}@rit.edu

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NGCA '11, May 26-27 2011, Karlskrona, Sweden
Copyright 2011 ACM 978-1-4503-0680-5/11/05…$10.00.

1. INTRODUCTION
Research in CBIR has shown that image content is more expressive of users’ perception than is textual annotation. A semantic gap occurs, however, when low-level image features, such as color or texture, are insufficient in completely representing an image in a way that reflects human perception. One possible way to bridge the semantic gap is to take a “human-centered” approach in system design. This is particularly important in knowledge-rich domains, such as biomedical applications, where information about the images can be extracted from experts and utilized. Major questions remain as to how best to bring users “into the loop” [2,3].

Multimodal user interfaces are promising as the interactive component of CBIR systems because different modes are best suited to expressing different kinds of information. Recent research efforts have been focused on developing and studying usability for multimodal interaction [4,5,6]. Designing natural, usable interaction will require an understanding of which user interactions should be explicit and which implicit. Consider query by example (QBE), which requires users to select a representative image and often a region of that image. It is the usual paradigm in CBIR but users have difficulty forming such queries. There is a need for innovative new methods to support QBE. Beyond QBE, more effective methods are needed for gaining input from the user for relevance feedback to refine the results of a search. For example, this could be done explicitly, by actually having the user directly specify which images were close to what they were looking for, or implicitly, by simply making note of which images they looked at with interest (e.g. via gaze). Finally, better organization of the images returned from a query is as important as the underlying retrieval system itself, in that it allows the user to quickly scan the results and find what they are looking for.

Our approach to overcoming the interactivity challenges of CBIR is largely based on bringing the user into the process by combining traditional modes of input, such as the keyboard and mouse, with interaction styles that may be more natural, such as gaze input (eye tracking), voice recognition, and multi-touch interaction. A software framework was developed for such a system using existing graphical user interface (GUI) libraries and then designing several subcomponents that allow for interaction via the new methods within a GUI. With the implementation of this basic framework for multimodal interface design it is now possible to quickly develop and test prototypes for different interface layouts and even prototypes for different modes of interaction using one or more of the input modes (mouse, keyboard, gaze, voice, touch).

A series of studies will be performed to determine which of these prototypes are most efficient and usable across a range of image types and among varied end user groups. The first of these, described here, involves study of modes of interaction for performing QBE through explicit region-of-interest selection. The main goal is to effectively compare the efficiency of different interaction methods, as well as user preference, ease-of-use, and ease-of-learning.

2. Methods
2.1 Design And Implementation
The best approach to developing a multimodal user interface such as the one described here is an evolutionary approach. This means breaking the overall large goal of building a multimodal user interface into smaller obtainable goals, and designing, implementing, testing, and integrating these smaller portions. In this way, the developer can ensure that separate components are not dependent on one another, because one builds stand-alone subsystems, and then integrates them.
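The evolutionary decomposition described in Section 2.1 — stand-alone subsystems integrated behind a common interface — can be sketched as below. This is an illustrative sketch, not the authors' code: the interface and class names (InputMode, GazeInput, VoiceInput, TouchInput, activate) are assumptions made for the example. Each mode can be built and tested in isolation, and the integrating UI depends only on the shared contract.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the stand-alone-subsystem design from Section 2.1.
public class ModularInputSketch {

    // Common interface: the integrating UI depends only on this contract,
    // so each input mode is a stand-alone subsystem that can be developed,
    // tested, and integrated independently.
    interface InputMode {
        String name();
        void start();   // begin delivering input events to the UI
        void stop();
    }

    static class GazeInput implements InputMode {
        public String name() { return "gaze"; }
        public void start() { /* e.g. signal the eye tracker to start streaming */ }
        public void stop()  { /* e.g. send the stop signal */ }
    }

    static class VoiceInput implements InputMode {
        public String name() { return "voice"; }
        public void start() { /* e.g. load a command grammar, attach an engine */ }
        public void stop()  { /* e.g. release the recognition engine */ }
    }

    static class TouchInput implements InputMode {
        public String name() { return "touch"; }
        public void start() { /* e.g. register multi-touch gesture processors */ }
        public void stop()  { /* e.g. unregister gesture processors */ }
    }

    // Integration step: activate whichever subsystems are present, without
    // the UI knowing anything about their internals.
    static List<String> activate(InputMode... modes) {
        List<String> active = new ArrayList<>();
        for (InputMode m : modes) {
            m.start();
            active.add(m.name());
        }
        return active;
    }

    public static void main(String[] args) {
        System.out.println(activate(new GazeInput(), new VoiceInput(), new TouchInput()));
        // prints: [gaze, voice, touch]
    }
}
```

Because the UI only sees InputMode, a new prototype mode can be integrated (or an existing one removed) without touching the core system — the property the paper attributes to building and integrating stand-alone subsystems.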
2.1.1 Eye Tracking
The SensoMotoric Instruments (SMI) RED 250 Hz eye-tracking device was used to track the position of the user's gaze on the monitor. SMI's iViewX software was used to run the eye tracker during use, and SMI's Experiment Center was used to perform a calibration prior to use. Our custom software, written in Java, communicates with the device using the User Datagram Protocol (UDP) to send signals to the eye tracker to start and stop recording. Once the eye tracker receives the start signal, it begins streaming screen coordinates to the program. A separate program thread can then repeatedly get the new coordinates and update respective variables corresponding to the user's gaze. Because the human eye is naturally jittery, it is necessary to implement an algorithm for smoothing/filtering the data coming from the eye tracker. Because the system is developed in an object-oriented programming (OOP) language, implementing such functionality is as simple as creating an abstract Filter class and then creating several instances of that abstract Filter. This allows multiple different filtering algorithms to be created easily. Even this functionality affords a vast array of possibilities for how the eye input data can be used for interaction. For example, eye tracking could be used to replace mouse/keyboard scrolling and panning [7].

2.1.2 Voice Recognition
Java defines the Java Speech Application Programming Interface (JSAPI), implemented by several open source libraries. Any implementation of the JSAPI is a suitable choice, as they all perform the functionality specified by Java. For our system, we chose Cloud Garden JSAPI (http://www.cloudgarden.com). Beyond a suitable library that implements the JSAPI, a speech recognition engine is required on the computer running the multimodal system. For our system, we have used Windows Speech Recognition, because it is included in the Windows operating system (Windows 7). A custom “grammar” can be written to specify which commands the system will accept. Then a simple controller can be implemented to receive commands, interpret them, and pass them on to the proper event handler. Voice recognition has the potential to greatly increase the efficiency of interaction between system and user. Furthermore, it is simple to include basic functions such as a speech lock, so that the user can easily turn voice recognition on/off.

2.1.3 Multi-Touch Interaction
For multi-touch, an open source library called MT4J (http://www.mt4j.org) was used. This library allows the Windows 7 touch screen commands to be used within a Java application. From here, it is possible to implement custom gesture processors, or use a number of predefined processors. Touch interaction can be applied to QBE, and a number of other interactions with the user. Beyond this, the library allows creation of custom multi-touch user interface components. Another benefit is that it is simple to create stand-alone multi-touch applications and then embed them in the system. This follows the previously mentioned evolutionary prototyping engineering methodology, because it easily allows simple standalone prototypes to be developed, then integrated into the existing system. For our experiment, a Dell […]

[…] window (JFrame) and the LayoutManager class for managing placement of components within the window. Furthermore, a system for allowing rapid prototyping of UI layouts can be put in place to facilitate development. This involves creating an abstract class called PrototypeUI that inherits from Java's JFrame class. Any number of prototype UI layouts can be created and tested without changing the code for core functionality of the system or for the previously mentioned subcomponents that are handling different modes of input.

2.2 Experimental Design
To evaluate prototype interaction styles for QBE, we recruited 9 undergraduate and graduate students at Rochester Institute of Technology as study participants. Participants were given an explanation of the CBIR paradigm and of QBE, and then were given a brief tutorial on each prototype mode they would be using. For the study they were shown a set of ten images, four separate times, in randomized order. Each of the four times they were shown the ten images, their task was to perform QBE by explicit region-of-interest selection using one of the four prototype methods of interaction. Because we are not concerned in this study about regions of interest within objects, but rather whether the user can effectively select an object, we instructed the user to select a specific object from each image (e.g. select the eight ball from an image of billiard balls on a pool table; see Figure 1C).

2.2.1 Image Selection
When choosing the images to use for the study, there were two main considerations. First, because we specified what to select, there was a requirement for obvious, discrete objects in the image to eliminate ambiguity. Next, we wanted to test our four prototypes across a variety of images, and so we defined categories of images. These categories (simple, intermediate, and complex) were based on the complexity of the object the user was to select. For the simple category, we photographed billiard balls in different configurations. This covers both criteria, because the shape is simply a circle, and it allows us to instruct the user to select the eight ball. For the intermediate category, we used dice. This allowed us to construct a number of intermediate-complexity shapes. We considered them to be intermediate because the edges were always straight and, in a 2D image, the shapes formed by the dice are essentially polygons. Finally, for the complex images, we chose to use images of horses. This is obviously a more complex shape than the previous examples, and it still allows for easy instruction of what to select, because each of the images contained a brown pony and a larger whitish/greyish horse.

2.2.2 Prototype Interaction Methods
2.2.2.1 The Anchor Method
The anchor method combines interaction styles of gaze, voice and either the mouse or touch screen. The user looks at the center of the object they want to select, then says the command “set anchor”. This places a small selection circle on screen where the user was looking. Next to this selection circle is a slider object which can slide left to decrease the radius of the selection circle, or right to increase the radius of the selection circle. The slider […]
SX2210T Touch Screen Monitor was used can be adjusted using either mouse or touch, depending on the
user’s preference.
2.1.4 Traditional GUI Components
Because the subcomponents of the multimodal user interface were 2.2.2.2 Gaze Interaction
developed in Java, the Swing GUI libraries can be used to create Unlike the anchor method, this method uses eye tracking almost
traditional visual components and handle input from the mouse exclusively. The user finds the object to select, then clicks a
and keyboard. This also makes developing the basic framework button using either mouse or touch screen to begin eye tracking.
for the user interface (i.e windowing and layout structure) very Once turned on, the program begins painting over the area to
simple, because Java’s Swing library includes classes for a UI provide feedback, as the user glances over the object. When
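The gaze-painting feedback just described can be sketched as a small Swing component. This is a minimal illustration only: the `Fixation` record, the dwell-time-to-radius scaling, and all names are our own assumptions, not the study's actual code.

```java
import java.awt.*;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JPanel;

/** Minimal sketch of gaze-painting feedback: each fixation reported by the
 *  eye tracker is drawn as a translucent circle; saccades are not drawn.
 *  The Fixation record and the scaling constants are illustrative. */
public class GazePaintPanel extends JPanel {

    /** One fixation from the eye tracker (hypothetical data format). */
    public record Fixation(int x, int y, long durationMs) {}

    private final List<Fixation> fixations = new ArrayList<>();

    /** Longer fixations get larger circles; the radius is clamped so that
     *  brief glances stay visible and long dwells do not flood the screen. */
    static int radiusFor(long durationMs) {
        int r = (int) (durationMs / 20);          // 20 ms of dwell per pixel
        return Math.max(5, Math.min(r, 60));
    }

    /** Called whenever the tracker reports that a fixation has ended. */
    public void addFixation(Fixation f) {
        fixations.add(f);
        repaint();
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        Graphics2D g2 = (Graphics2D) g.create();
        g2.setColor(new Color(255, 0, 0, 60));    // translucent overlay
        for (Fixation f : fixations) {
            int r = radiusFor(f.durationMs());
            g2.fillOval(f.x() - r, f.y() - r, 2 * r, 2 * r);
        }
        g2.dispose();
    }

    /** Clearing the overlay corresponds to stopping the tracker and resetting. */
    public void clearFixations() {
        fixations.clear();
        repaint();
    }
}
```

One design choice this sketch makes is to keep all fixations until the overlay is cleared; fading older fixations over time would be an equally plausible alternative.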
When finished, the user presses the same button to stop the eye tracker. Alternatively, eye tracking can be started by saying the command "start eye tracking" and stopped by saying "stop eye tracking". While painting, saccades are not drawn; rather, fixations are visualized by placing translucent circles on the screen. The radius of each circle is determined by the fixation duration (i.e. a longer fixation duration means a larger radius).

2.2.2.3 Mouse Selection
For this method, the user finds the object of interest and then presses and holds the mouse button to begin drawing a selection window. The selection auto-completes by always drawing a straight line from the point of the initial click to where the mouse is currently located. When the user finishes their selection, they simply release the mouse button.

2.2.2.4 Touch Selection
This method works similarly to mouse selection, except that rather than pointing and clicking with the mouse, the user traces the object with a finger to form the selection window. The window auto-completes in the same fashion as for mouse selection.

Figure 1. From left to right, images from the intermediate, complex, and simple categories. The first is a selection made using the touch screen, the second uses gaze interaction, and the third uses the anchor method.

2.2.3 Metrics
To evaluate the usability attributes of efficiency and usefulness for each style of interaction, we defined several metrics. Accuracy was measured by calculating the area of the object in the image (in pixels) prior to selection using the GNU Image Manipulation Program (GIMP), then calculating the area of the object in a given selection, to determine the percentage of the object the user missed. Precision was determined by calculating how much of the user's selection was outside of the object: the amount of excess selection (in pixels) was divided by the total selection (in pixels) to give a relative excess value for the user's selection. Efficiency of the different modes was determined by measuring the time (in seconds) to complete a selection. We also asked the users to rate each of the prototypes in three categories on a scale from one to five: ease-of-use, ease-of-learning, and how natural the method felt. Finally, we counted the number of times the user had to use the undo function. These measurements reflect the usability of a prototype rather than its efficiency and accuracy.

3. Data Analysis
3.1 Data Collection
Camtasia Studio (TechSmith) was used to record the screen during the study. Data were extracted from the images captured from the video. These images showed the participants' selections for each of the ten images four separate times (one for each method). The data extracted included the area (in pixels) that they selected within the object and the area that was excess selection. Again, the values were measured using GIMP. Viewing the data suggested that the best way to show the comparison of the four prototypes would be a measure of accuracy, displaying the percentage of the object that the participants missed; a measure of precision, showing excess selection as the percentage of the user's total selection that was not the object; and a measure of efficiency, showing the time to complete the image.

3.2 Efficiency of Interaction Methods
Descriptive statistical analysis of the data was performed to determine the efficiency of the different prototypes in terms of accuracy, precision, and time to complete. Box plots were constructed to show the comparison of the different prototypes.

Figure 2.a
Figure 2.b
Figure 2.c
Figures 2.a-2.c show a comparison of box plots of the data collected from the nine participants on all four interaction methods for one of the images of horses. Figure 2.a shows the percentage of the selection that was excess, 2.b shows the percentage of the object missed by the user, and 2.c shows the time taken to complete the selection.

In all three plots, the touch screen method has the most consistent results (smallest box). The touch screen also has the lowest median value for percentage of the object missed and for time taken to complete. For percentage of excess selection, the mouse has the lowest median, but the touch screen still had a more consistent set of values, in which the bulk of the values were lower than those from the mouse.
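The accuracy and precision measures above reduce to two pixel-area ratios. A minimal sketch follows; the study measured the areas manually in GIMP, and the class and method names here are illustrative.

```java
/** Sketch of the selection metrics: pixel counts in, percentages out.
 *  Names are illustrative; the study computed the areas by hand in GIMP. */
public final class SelectionMetrics {

    private SelectionMetrics() {}

    /** Accuracy: percentage of the object's area the user failed to select. */
    public static double missedPercent(long selectedInsideObject, long objectArea) {
        return 100.0 * (objectArea - selectedInsideObject) / objectArea;
    }

    /** Precision: excess selection as a percentage of the total selection. */
    public static double excessPercent(long excessSelection, long totalSelection) {
        return 100.0 * excessSelection / totalSelection;
    }
}
```

For example, selecting 1,500 of an object's 2,000 pixels gives 25% missed, and 500 excess pixels in a 2,000-pixel total selection gives 25% excess.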
Table 1. The table below shows the average values of excess selection, percentage of the object missed, and time taken for all four prototype methods.

           Anchor   Touch   Mouse   Gaze
Excess     48.4%    17.7%   17.1%   49.4%
Missed      9.0%     4.7%    9.8%    7.6%
Time (s)   17.6     13.9    16.3    20.8

3.3 User Preference
Table 2. The table below shows the average values of user preference (scale of one to five) and the average undo usage for all four prototypes.

                   Anchor   Touch   Mouse   Gaze
Ease-of-Use         2.9      4.5     4.7     3.3
Ease-of-Learning    3.5      4.8     4.4     3.8
Natural             2.6      4.7     4.0     2.4
Undo Usage          8        1       1       1

The table above clearly shows that the mouse and touch screen received higher ratings than the two methods using eye tracking. In general, the users were in agreement about the different prototypes, with the standard deviation on average being below one (SD ≈ .86). Undo usage was fairly low, with the average user pressing undo just once per ten images when using touch, mouse, or gaze. However, the anchor method had significantly higher undo usage, and the variance in undo usage for the anchor method is relatively high (SD ≈ 10.2). This variance is likely caused in part by the method's high learning curve: it requires the user to coordinate use of three input methods. Furthermore, the inaccuracy of the eye tracker, plus or minus two visual degrees, plays a more significant role here, because unlike the gaze method, where the user can see where they are painting and adjust their eyes, with this method the user only sees that the tracker is off after the anchor is placed. Then the user must click undo.

4. Conclusions
4.1.1 Eye Tracking Interaction Methods
This study shows clearly that using eye tracking for explicit user interaction in a task that requires the user to be precise and accurate is not effective. This is not surprising, since people have difficulty with the smooth pursuit that might be required for drawing or tracing activities when objects are stationary [8]. This, in combination with some inaccuracy of the eye tracker, does not allow enough accuracy using the interaction styles implemented for this study. It is more likely that implicit interaction, i.e. selection based on more natural gaze behavior as a user is browsing or examining an image, such as in [5, 9], will be effective for QBE.

4.1.2 Touch Screen and Mouse Interaction Methods
For the user group studied here, touch screen and mouse show similar results for a task such as tracing/selecting. In general, the touch screen is slightly more efficient than the mouse, and for the images in the complex category the difference is clearer: there, the touch screen is distinctly more efficient than the mouse. This is likely because the touch screen is more natural than the mouse, even for technically-savvy, college-age participants, because it is closer to humans' natural interaction process. In contrast, the mouse somewhat mimics a natural interaction, but requires the user to coordinate between their hand and eye without the hand being in their field of vision. Furthermore, the average user prefers to use a mouse or touch screen for this type of task.

4.1.3 Individual Differences
Finally, our study metrics show that interaction with the mouse and touch screen is generally consistent across participants, whereas there is greater variability with eye tracking. This probably occurs because using one's eyes to select or trace something is not natural, and so while some people may learn the method very quickly, others will not.

4.1.4 Future Studies
Studies are ongoing to prototype and test additional interaction styles which may be useful for image retrieval. For example, a study of the efficiency of the different modes in a search-related task, such as scrolling, selecting an entire image from a set, or using gestures (see [10]), would be useful. This would be interesting because it might be the case that, in these types of tasks, mouse and touch screen are not the most efficient. We are also engaged in using gaze for implicit interaction, such as in [5, 9], towards our long-term goals of creating adaptive, multimodal systems for image retrieval.

5. ACKNOWLEDGMENTS
This work is supported by NSF grant IIS-0941452. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

6. REFERENCES
[1] Deserno, T.M., Antani, S., and Long, R. Ontology of gaps in content-based image retrieval. J Digit Imaging, 22(2):202-215, 2009.
[2] Lew, M.S., Sebe, N., Djeraba, C., and Jain, R. Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications and Applications, 2(1):1-19, 2006.
[3] Müller, H., Michoux, N., Bandon, D., and Geissbuhler, A. A review of content-based image retrieval systems in medical applications: clinical benefits and future directions. Int J Med Inform, 73(1):1-23, 2004.
[4] Qvarfordt, P. and Zhai, S. Conversing with the user based on eye-gaze patterns. In Proc. CHI 2005, ACM, 221-230.
[5] Sadeghi, M., Tien, G., Hamarneh, G., and Atkins, M.S. Hands-free interactive image segmentation using eyegaze. In SPIE Medical Imaging, 2009.
[6] Ren, J., Zhao, R., Feng, D.D., and Siu, W. Multimodal interface techniques in content-based multimedia retrieval. In Proceedings of ICMI, 2000, 634-641.
[7] Kumar, M. and Winograd, T. Gaze-enhanced scrolling techniques. In UIST: Symposium on User Interface Software and Technology, Newport, RI, 2007.
[8] Krauzlis, R.J. The control of voluntary eye movements: new perspectives. The Neuroscientist, 11(2):124-137, 2005. PMID 15746381.
[9] Santella, A., Agrawala, M., DeCarlo, D., Salesin, D., and Cohen, M. Gaze-based interaction for semi-automatic photo cropping. In Proc. CHI 2006 (Collecting and Editing Photos).
[10] Heikkilä, H. and Räihä, K-J. Speed and accuracy of gaze gestures. Journal of Eye Movement Research, 2009.