This document is the thesis submitted by Trystan Upstill for the degree of Bachelor of Information Technology (Honours) at the Australian National University in November 2000. It discusses the development of a new approach to image retrieval on the World Wide Web that addresses consistency, clarity, and control. The thesis surveys existing image retrieval techniques, identifies problems in current WWW image search systems, and then presents a novel WWW image retrieval approach called VISR that incorporates client-side image analysis, visualization of results, and an interactive interface. Experiments show that VISR offers improvements over other systems by providing more consistent analysis and visualization, clearer explanations for returned image matches, and greater user control through more expressive queries and enhanced interaction capabilities.
This document provides an overview of IBM Watson Content Analytics and describes how it can be used to gain insights from unstructured content. It discusses the product's history and key features in version 3.0. Some main capabilities include performing automated content analysis, discovering patterns and correlations in data, and gaining insights to improve products and services. The document also provides examples of how Content Analytics has been applied in various use cases, such as customer service, healthcare, and investigations.
This document provides an overview of IBM Watson Content Analytics and how it can be used to gain insights from unstructured content. It discusses the architecture of Content Analytics, which includes ingesting and processing unstructured data using natural language processing techniques. It then provides several use case examples where Content Analytics has been applied, such as for customer insights, healthcare, and investigations. The document also covers best practices for designing Content Analytics solutions and understanding the types of analysis that can be performed.
A Bilevel Optimization Approach to Machine Learningbutest
This document is a doctoral thesis submitted by Gautam Kunapuli to Rensselaer Polytechnic Institute in partial fulfillment of the requirements for a Doctor of Philosophy in Mathematics. The thesis proposes a new bilevel optimization approach for solving machine learning problems involving model selection. It formulates machine learning problems as bilevel programs, which are transformed into mathematical programs with equilibrium constraints. It then develops two algorithms for optimizing these problems: one uses nonlinear programming to relax the constraints, while the other uses successive linearization. The thesis applies these approaches to problems in support vector classification and regression, and tests them on synthetic and real-world datasets.
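The bilevel structure of model selection can be sketched in a generic form (notation mine, not necessarily the thesis's): the outer problem chooses a hyperparameter to minimize validation loss, while the inner problem trains the model for that hyperparameter.

```latex
\min_{\lambda \ge 0} \; \mathcal{L}_{\mathrm{val}}\bigl(w^{*}(\lambda)\bigr)
\quad \text{s.t.} \quad
w^{*}(\lambda) \in \arg\min_{w} \; \mathcal{L}_{\mathrm{train}}(w) + \lambda\,\Omega(w)
```

Replacing the inner minimization with its optimality (KKT) conditions is what turns such a bilevel program into a mathematical program with equilibrium constraints, which the two algorithms described above then solve.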
This doctoral thesis examines methods for estimating the authenticity of videos by analyzing their visual quality and structure. It aims to determine the proportion of information an edited video retains from its original parent video. The thesis first evaluates existing no-reference algorithms for visual quality assessment. It then explores techniques for shot segmentation and comparison. It also develops models for calculating a video's authenticity degree based on factors like visual quality, shot importance, and evidence of global modifications. The goal is to objectively estimate a video's authenticity when only the video itself is available for analysis, without relying on external metadata.
The document is a thesis submitted by Maliththa S. S. Bulathwela for the degree of Master of Science in Computational Statistics and Machine Learning at University College London. The thesis explores building a self-adaptive topic engine to extract insights from customer feedback data. Initial work uses supervised support vector machines for topic classification and adapts trust modeling techniques to enhance the reliability of crowd-sourced labeled data. Latent Dirichlet allocation is then used to detect emerging topics from unlabeled data. The results were promising, suggesting further work could build self-adapting topic engines using techniques from the thesis.
This thesis investigates using interactive genetic algorithms to derive intelligent behavior for agents in a smart grid system. The thesis tests different variations of genetic algorithms in repeated matrix games against other learning algorithms like GIGA-WoLF and Q-learning. The results show the potential of using genetic algorithms, particularly when incorporating effective human input, to develop adaptive agent strategies in dynamic multi-agent environments like smart grids.
This document is a dissertation submitted by Aniket Pingley to The George Washington University in partial fulfillment of the requirements for the degree of Doctor of Science in August 2011. The dissertation addresses privacy issues related to location-based services and introduces four client-centric privacy protection systems - CAP, BACK-TRACK, DUMMY-Q, and Digital Marauder's Map. For each system, the dissertation presents theoretical analysis and experimental evaluation to demonstrate the effectiveness of the proposed techniques on privacy protection and efficiency.
This document provides an overview and summary of a thesis on visualizing uncertainty in fiber tracking based on diffusion tensor imaging (DTI). The thesis addresses challenges with visualizing uncertainty throughout the DTI and fiber tracking pipeline, including image acquisition, diffusion modeling, fiber tracking, and visualization. It proposes and evaluates various techniques for visualizing different types of uncertainty, such as value uncertainty, location uncertainty, and parameter uncertainty. The visualization techniques are applied to fiber tracking results to aid in neurosurgical planning and other medical applications.
This document describes a neural network based content-based image retrieval system developed as a final year project. The system uses Haar wavelet transform and RGB and RgYb colour channels as features to train neural networks on the COREL 1k dataset. The trained neural networks are used to retrieve similar images from the dataset and external images based on calculating distance between feature vectors. Experiments were conducted to evaluate the performance of the proposed system and neural network architecture. The system provides a graphical user interface for users to submit queries and obtain returned images.
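The retrieval-by-distance step can be sketched as follows (a minimal illustration with random stand-in features; the project's actual features come from the wavelet transform and colour channels fed through the trained networks):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.random((100, 64))      # feature vectors of 100 indexed images
query = rng.random(64)          # feature vector of the query image

# Euclidean distance from the query to every indexed image
dist = np.linalg.norm(db - query, axis=1)
top5 = np.argsort(dist)[:5]     # indices of the 5 most similar images
```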
Master's Thesis: A reuse repository with automated synonym support and cluster... (Laust Rud Jacobsen)
Having a code reuse repository available can be a great asset for a programmer. But locating components can be difficult if only static documentation is available, due to vocabulary mismatch. Identifying informal synonyms used in documentation can help alleviate this mismatch. The cost of creating a reuse support system is usually fairly high, as much manual effort goes into its construction.
This project has resulted in a fully functional reuse support system with clustering of search results. By automating the construction of a reuse support system from an existing code reuse repository, and giving the end user a familiar interface, the reuse support system constructed in this project makes the desired functionality available. The constructed system has an easy-to-use interface, due to a familiar browser-based front-end. An automated method called LSI is used to handle synonyms, and to some degree polysemous words in indexed components.
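The LSI idea can be sketched with scikit-learn's truncated SVD over TF-IDF vectors (an illustrative toy, not the project's implementation; the component descriptions are invented). Documents using different words for the same thing end up close in the reduced space:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy component documentation; docs 0 and 1 are informal synonyms
docs = [
    "open a file and read its contents",
    "load a document from disk and read it",
    "draw a button in the user interface window",
]
X = TfidfVectorizer().fit_transform(docs)

# Project into a 2-dimensional latent semantic space
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
sim = cosine_similarity(Z)      # docs 0 and 1 score far higher than 0 and 2
```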
In the course of this project, the reuse support system was tested using components from two sources; its retrieval performance was measured and found acceptable. Clustering usability was evaluated, and the clusters were found to be generally helpful, although some fine-tuning remains to be done.
This thesis examines the potential for immersive virtual reality (VR) to influence consumer behavior in destination marketing. Through two studies - a lab experiment and a field experiment - the authors investigate the effect of immersive VR versus 2D pictures on consumer outcomes like destination attitude and behavioral intentions. The results show that immersive VR does not have a significant total effect on consumer outcomes. However, mediation analyses reveal indirect effects through factors like telepresence, enjoyment, mental imagery, and predicted emotions/experiences. Perceived picture quality is also found to moderate some relationships. The findings suggest immersive VR content can impact destination attitude and purchase intentions when developed to stimulate important factors and used with high-quality equipment.
This document proposes a system that allows a robot to automatically find a path to a predefined goal in uncontrolled environments. The system has three main modules: 1) an artificial vision module that obtains a quantified representation of the robot's vision using local feature detection and visual words; 2) a reinforcement learning module that combines the vision input (as a normalized vector) with sensor data to compute the state, with the reward based on distance to the goal; and 3) a behavior control module. The system is tested on a Sony Aibo robot, which seeks the goal and changes behavior based on experience but does not find the optimal route.
This document is the thesis of Arnaud Jean-Baptiste presented at the Université des Sciences et Technologies de Lille for the degree of Doctor of Philosophy in computer science. The thesis proposes a model of handles to control references in dynamically typed languages by enforcing behavioral properties like read-only at the reference level. It presents three experiments with handles - enforcing read-only, supporting various behavioral properties, and adding state to handles. The thesis also discusses implementation details and evaluates the performance overhead of the handle approach.
Big Data and the Web: Algorithms for Data Intensive Scalable Computing (Gabriela Agustini)
This document is the dissertation of Gianmarco De Francisci Morales, submitted for the PhD program in Computer Science and Engineering at IMT Institute for Advanced Studies in Lucca, Italy. It was approved by the program coordinator and supervisor and reviewed by two external reviewers. The dissertation addresses the challenges of managing and analyzing large datasets, or "big data". Across six chapters, including introductions to big data and related work, it presents three contributed algorithms, for document filtering, graph computation, and real-time news recommendation, that scale to large datasets through parallel and distributed techniques.
This document summarizes a student project on predicting malicious activity using real-time video surveillance. The project applies techniques like super-resolution, face and object recognition using HOG features, and neural networks to enhance video quality, identify objects and faces, and semantically describe scenes to detect unusual activity. Algorithms were implemented in MATLAB and results were stored in a MongoDB database. Key techniques included super-resolution, PCA-based face recognition, HOG-based object detection, and neural networks like CNNs and RNNs for image captioning. The project aims to help detect criminal activity and track convicted individuals in public spaces.
This document is a master's thesis submitted by R.Q. Vlasveld to Utrecht University in partial fulfillment of the requirements for a Master of Science degree. The thesis explores using one-class support vector machines (SVMs) for temporal segmentation of human activity time series data recorded by inertial sensors in smartphones. The author first reviews related work in temporal segmentation and change detection methods. An algorithm is then presented that uses an incremental SVDD model to detect changes between activities in a continuous data stream. The algorithm is tested on both artificial and real-world human activity data sets recorded by the author. Quantitative and qualitative results demonstrate the method can find changes between activities in an unknown environment.
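The change-detection idea can be sketched with scikit-learn's OneClassSVM (closely related to SVDD, and equivalent for an RBF kernel) on simulated sensor data. This is an illustrative batch sketch, not the thesis's incremental SVDD algorithm; the activity distributions and parameters are invented:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
walking = rng.normal(0.0, 1.0, size=(200, 3))   # simulated accelerometer stream
running = rng.normal(5.0, 1.0, size=(200, 3))   # a clearly different activity
stream = np.vstack([walking, running])           # change occurs at index 200

# Fit a one-class boundary on an initial segment of the first activity
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(stream[:100])

# Declare a change when most points in a window fall outside the boundary
window, change_at = 50, None
for start in range(100, len(stream) - window + 1, window):
    preds = model.predict(stream[start:start + window])  # +1 inlier, -1 outlier
    if (preds == -1).mean() > 0.5:
        change_at = start       # first window dominated by outliers
        break
```

The incremental variant in the thesis updates the boundary as new samples arrive instead of fitting it once, but the decision rule, flagging a change when the stream leaves the learned region, is the same in spirit.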
The document outlines the author's preparation for coding interviews at Google, including a review of important data structures, algorithms, and problem domains. The author plans to thoroughly review arrays, trees, graphs, dynamic programming, recursion, sorting, strings, caching, game theory, computability, bitwise operators, math, concurrency, and system design. They will also practice solving problems involving arrays, strings, trees, graphs, divide-and-conquer, dynamic programming, and more. The single document the author intends to bring summarizes their background, projects, most difficult bugs, and other experiences that may be relevant to interview questions.
Trinity Impulse - Event Aggregation to Increase Students' Awareness of Events... (Jason Cheung)
This dissertation describes the development of a mobile web application called Trinity Impulse that aims to increase student awareness of and engagement with college events. The author conducted research on topics like student engagement, retention, and usability for location-based information. Based on requirements gathered from stakeholders and example usage scenarios, the author designed and implemented Trinity Impulse using technologies like PHP, JavaScript, and a MySQL database. The application aggregates events from the college website and Facebook. It was evaluated through usability testing with students, which provided feedback on the interface and indicated the application could potentially increase event attendance. Overall, the dissertation explores how improving awareness of events may lead to higher student engagement at college.
Nguyễn Nho Vĩnh - Problem Solving with Algorithms and Data Structures
This document is a textbook about problem solving with algorithms and data structures. It is divided into multiple chapters that cover topics such as algorithm analysis, basic data structures, recursion, sorting and searching algorithms, trees and tree algorithms, and JSON. The introduction chapter discusses the objectives of the book, which are to review concepts in computer science, programming, and problem solving and to understand abstraction and abstract data types. It also provides an overview of the Python programming language.
This document provides an overview of predictive analytics and data mining techniques. It covers topics such as supervised learning, data validation and cleaning, missing data, overfitting, linear regression, support vector machines, cross-validation, classification with rare classes, logistic regression, decision making based on costs, non-standard labeling scenarios, recommender systems, text mining, matrix factorization, social network analysis, reinforcement learning, and more. The document serves as a reference for various predictive analytics and machine learning concepts and methods.
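As one hedged example of the validation techniques listed above (the dataset and model choice are mine, not the document's), k-fold cross-validation estimates out-of-sample performance by repeatedly holding out one fold:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4 folds, score accuracy on the held-out fold
scores = cross_val_score(clf, X, y, cv=5)
```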
This document is the introduction to a free high school physics textbook. It explains that the textbook is published under a GNU Free Documentation License, which allows readers broad freedoms such as copying and modifying the text. The textbook was written by volunteers to support education and make factual information freely available. It includes contributors, editors, and a core team who developed the textbook.
Trade-off between recognition and reconstruction: Application of Robotics Visi... (stainvai)
Autonomous and efficient action of robots requires a robust robot vision system that can cope with variable light and view conditions. These include partial occlusion, blur, and above all a large difference in object scale due to variable distance to the objects. This change in scale leads to reduced resolution for objects seen from a distance. One of the most important tasks for the robot's visual system is object recognition, a task also affected by orientation and background changes. These real-world conditions require the development of specific object recognition methods.

This work is devoted to robotic object recognition. We develop recognition methods based on training that incorporates prior knowledge about the problem. The prior knowledge is incorporated via learning constraints during training (parameter estimation). A significant part of the work is devoted to the study of reconstruction constraints. In general, there is a trade-off between the prior-knowledge constraints and the constraints emerging from the classification or regression task at hand. To avoid additionally estimating the optimal trade-off between these two constraints, we treat this trade-off as a hyperparameter (in a Bayesian framework) and integrate over a certain (discrete) distribution. We also study various constraints resulting from information-theoretic considerations.

Experimental results on two face data sets are presented. Significant improvement in face recognition is achieved for various image degradations, such as various forms of image blur, partial occlusion, and noise. Additional improvement in recognition performance is achieved when preprocessing the degraded images with state-of-the-art image restoration techniques.
Solutions Manual for Linear Algebra: A Modern Introduction, 4th Edition by Davi... (TanekGoodwinss)
This document is a solutions manual for Linear Algebra: A Modern Introduction 4th Edition by David Poole. It contains full solutions to all chapters and explorations in the textbook. The solutions manual was prepared by Roger Lipsett and includes copyright information for Cengage Learning, the publisher of the textbook. It provides instructors and students the ability to check their work on problems from the textbook.
This document provides an introduction to security on mainframe systems. It discusses fundamental security concepts like confidentiality, integrity and availability. It also covers security elements such as identification, authentication, authorization, encryption and auditing. Additionally, it examines the System z architecture and how the hardware and operating system provide security features. The document uses a case study about securing an online bookstore to illustrate how these concepts apply in a business context. It is intended to help readers understand mainframe security.
Integrating IoT Sensory Inputs For Cloud Manufacturing Based Paradigm (Kavita Pillai)
The first step in thermoplastic recycling is identifying the plastic waste by category. This manual task is often inefficient and costly. This study therefore analyzes the problem and presents an automatic classifier based on a wireless sensor network (WSN) infrastructure. The classifier fuses data from two different sources using a Kalman filter and a neural network. The algorithm is run on a MATLAB simulator to test the results.
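The Kalman-filter fusion step can be sketched in one dimension (an illustrative toy, not the study's actual two-source WSN fusion; the process-noise and measurement-noise values q and r are assumed):

```python
# Minimal 1-D Kalman filter smoothing a stream of noisy sensor readings
def kalman_1d(measurements, q=1e-3, r=0.1):
    x, p = measurements[0], 1.0        # initial state estimate and variance
    estimates = []
    for z in measurements:
        p += q                         # predict: variance grows by process noise
        k = p / (p + r)                # Kalman gain weights estimate vs. reading
        x += k * (z - x)               # update with the measurement residual
        p *= (1 - k)                   # reduce variance after the update
        estimates.append(x)
    return estimates

est = kalman_1d([1.0, 1.2, 0.9, 1.1, 1.0])
```

A multi-sensor version would run the update step once per source, each with its own measurement noise r, which is the sense in which the filter "fuses" two data streams.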
This document describes a project that aims to estimate full-body demographics from images using computer vision and machine learning techniques. The project proposes a novel method to automatically annotate images with categorical labels for a wide range of body features, like height, leg length, and shoulder width. The method explores using common computer vision algorithms to extract features from images and video frames and compare them to a database of subjects with labeled body features. The document outlines the requirements, approaches considered, design and implementation of the project, and evaluates the results in estimating demographics and identifying individuals.
From sound to grammar: theory, representations and a computational model (Marco Piccolino)
This thesis contributes to the investigation of the sound-to-grammar mapping by developing a computational model in which complex acoustic patterns can be represented conveniently, and exploited for simulating the prediction of English prefixes by human listeners.
The model is rooted in the principles of rational analysis and Firthian prosodic analysis, and formulated in Bayesian terms. It is based on three core theoretical assumptions: first, that the goals to be achieved and the computations to be performed in speech recognition, as well as the representation and processing mechanisms recruited, crucially depend on the task a listener is facing, and on the environment in which the task occurs. Second, that whatever the task and the environment, the human speech recognition system behaves optimally with respect to them. Third, that internal representations of acoustic patterns are distinct from the linguistic categories associated with them.
The representational level exploits several tools and findings from the fields of machine learning and signal processing, and interprets them in the context of human speech recognition. Because of their suitability for the modelling task at hand, two tools are dealt with in particular: the relevance vector machine (Tipping, 2001), which is capable of simulating the formation of linguistic categories from complex acoustic spaces, and the auditory primal sketch (Todd, 1994), which is capable of extracting the multi-dimensional features of the acoustic signal that are connected to prominence and rhythm, and represent them in an integrated fashion. Model components based on these tools are designed, implemented and evaluated.
The implemented model, which accepts recordings of real speech as input, is compared in a simulation with the qualitative results of an eye-tracking experiment. The comparison provides useful insights about model behaviour, which are discussed.
Throughout the thesis, a clear distinction is drawn between the computational, representational and implementation devices adopted for model specification.
Generando aulas inteligentes aumentadas para la formación de estudiantes univ... (Noelia Margarita Moreno)
Generating augmented smart classrooms for training university students in the primary education degree.
III Congreso Internacional sobre Innovación Pedagógica y Praxis Educativa
Trinity Impulse - Event Aggregation to Increase Stundents Awareness of Events...Jason Cheung
This dissertation describes the development of a mobile web application called Trinity Impulse that aims to increase student awareness of and engagement with college events. The author conducted research on topics like student engagement, retention, and usability for location-based information. Based on requirements gathered from stakeholders and example usage scenarios, the author designed and implemented Trinity Impulse using technologies like PHP, JavaScript, and a MySQL database. The application aggregates events from the college website and Facebook. It was evaluated through usability testing with students, which provided feedback on the interface and indicated the application could potentially increase event attendance. Overall, the dissertation explores how improving awareness of events may lead to higher student engagement at college.
Nguyễn Nho Vĩnh - Problem solvingwithalgorithmsanddatastructuresNguyễn Nho Vĩnh
This document is a textbook about problem solving with algorithms and data structures. It is divided into multiple chapters that cover topics such as algorithm analysis, basic data structures, recursion, sorting and searching algorithms, trees and tree algorithms, and JSON. The introduction chapter discusses the objectives of the book, which are to review concepts in computer science, programming, and problem solving and to understand abstraction and abstract data types. It also provides an overview of the Python programming language.
This document provides an overview of predictive analytics and data mining techniques. It covers topics such as supervised learning, data validation and cleaning, missing data, overfitting, linear regression, support vector machines, cross-validation, classification with rare classes, logistic regression, decision making based on costs, non-standard labeling scenarios, recommender systems, text mining, matrix factorization, social network analysis, reinforcement learning, and more. The document serves as a reference for various predictive analytics and machine learning concepts and methods.
This document is the introduction to a free high school physics textbook. It explains that the textbook is published under a GNU Free Documentation License, which allows readers broad freedoms such as copying and modifying the text. The textbook was written by volunteers to support education and make factual information freely available. It includes contributors, editors, and a core team who developed the textbook.
Trade-off between recognition an reconstruction: Application of Robotics Visi...stainvai
Autonomous and ecient action of robots requires a robust robot vision system that can
cope with variable light and view conditions. These include partial occlusion, blur, and
mainly a large scale dierence of object size due to variable distance to the objects. This
change in scale leads to reduced resolution for objects seen from a distance. One of the
most important tasks for the robot's visual system is object recognition. This task is also
aected by orientation and background changes. These real-world conditions require a
development of specic object recognition methods.
This work is devoted to robotic object recognition. We develop recognition methods
based on training that includes incorporation of prior knowledge about the problem.
The prior knowledge is incorporated via learning constraints during training (parameter
estimation). A signicant part of the work is devoted to the study of reconstruction
constraints. In general, there is a tradeo between the prior-knowledge constraints and
the constraints emerging from the classication or regression task at hand. In order to
avoid the additional estimation of the optimal tradeo between these two constraints, we
consider this tradeo as a hyper parameter (under Bayesian framework) and integrate
over a certain (discrete) distribution. We also study various constraints resulting from
information theory considerations.
Experimental results on two face data-sets are presented. Signicant improvement in
face recognition is achieved for various image degradations such as, various forms of image
blur, partial occlusion, and noise. Additional improvement in recognition performance is
achieved when preprocessing the degraded images via state of the art image restoration
techniques.
Solutions Manual for Linear Algebra A Modern Introduction 4th Edition by Davi...TanekGoodwinss
This document is a solutions manual for Linear Algebra: A Modern Introduction 4th Edition by David Poole. It contains full solutions to all chapters and explorations in the textbook. The solutions manual was prepared by Roger Lipsett and includes copyright information for Cengage Learning, the publisher of the textbook. It provides instructors and students the ability to check their work on problems from the textbook.
This document provides an introduction to security on mainframe systems. It discusses fundamental security concepts like confidentiality, integrity and availability. It also covers security elements such as identification, authentication, authorization, encryption and auditing. Additionally, it examines the System z architecture and how the hardware and operating system provide security features. The document uses a case study about securing an online bookstore to illustrate how these concepts apply in a business context. It is intended to help readers understand mainframe security.
Integrating IoT Sensory Inputs For Cloud Manufacturing Based ParadigmKavita Pillai
The first step in thermoplastic recycling is identifying the plastic waste categorically. This manual task is often inefficiency and costly. This study therefore analyzes the problem and presents a automatic classifier based on a WSN infrastructure. The classifier fuses data from two different sources using Kalman filter and neural network. The algorithm is run on a matlab simulator to test the results
This document describes a project that aims to estimate full-body demographics from images using computer vision and machine learning techniques. The project proposes a novel method to automatically annotate images with categorical labels for a wide range of body features, like height, leg length, and shoulder width. The method explores using common computer vision algorithms to extract features from images and video frames and compare them to a database of subjects with labeled body features. The document outlines the requirements, approaches considered, design and implementation of the project, and evaluates the results in estimating demographics and identifying individuals.
From sound to grammar: theory, representations and a computational modelMarco Piccolino
This thesis contributes to the investigation of the sound-to-grammar mapping by developing a computational model in which complex acoustic patterns can be represented conveniently, and exploited for simulating the prediction of English prefixes by human listeners.
The model is rooted in the principles of rational analysis and Firthian prosodic analysis, and formulated in Bayesian terms. It is based on three core theoretical assumptions: first, that the goals to be achieved and the computations to be performed in speech recognition, as well as the representation and processing mechanisms recruited, crucially depend on the task a listener is facing, and on the environment in which the task occurs. Second, that whatever the task and the environment, the human speech recognition system behaves optimally with respect to them. Third, that internal representations of acoustic patterns are distinct from the linguistic categories associated with them.
The representational level exploits several tools and findings from the fields of machine learning and signal processing, and interprets them in the context of human speech recognition. Because of their suitability for the modelling task at hand, two tools are dealt with in particular: the relevance vector machine (Tipping, 2001), which is capable of simulating the formation of linguistic categories from complex acoustic spaces, and the auditory primal sketch (Todd, 1994), which is capable of extracting the multi-dimensional features of the acoustic signal that are connected to prominence and rhythm, and represent them in an integrated fashion. Model components based on these tools are designed, implemented and evaluated.
The implemented model, which accepts recordings of real speech as input, is compared in a simulation with the qualitative results of an eye-tracking experiment. The comparison provides useful insights about model behaviour, which are discussed.
Throughout the thesis, a clear distinction is drawn between the computational, representational and implementation devices adopted for model specification.
Generando aulas inteligentes aumentadas para la formación de estudiantes univ...Noelia Margarita Moreno
Generando aulas inteligentes aumentadas para la formación de estudiantes universitarios en el grado de educación primaria.
III Congreso Internacional sobre Innovación Pedagógica y Praxis Educativa
El documento resume el sistema de salud en los Estados Unidos, notando que es un modelo basado en el capitalismo donde la medicina es considerada un negocio controlado por grandes empresas aseguradoras en lugar del gobierno. Esto restringe el acceso a la atención médica para muchos ciudadanos y los fuerza a recurrir a métodos rudimentarios de curación.
The document discusses AZMET's containerized/modular gold desorption and recovery plant. AZMET has developed containerized unit processes to optimize gold elution circuits and save on project time and costs. Their modular plant design allows clients to select needed process modules for expansion or new projects. The modular plants offer benefits like lower costs, ease of transportation and assembly, and optimum recovery.
This document appears to be a scanned receipt from a restaurant called "The Breakfast Club" dated January 15th, 2023. It lists several food items purchased including pancakes, eggs, bacon, and coffee. The total amount due is $23.45.
Here you can get the information about all nearby important locations of each project like schools and colleges, markets, hotels, hospitals,etc. which makes your life more easier with Vishhram Developers.
Meet Crayon Data : Asia's Hottest Big Data StartupCrayon Data
Crayon Data is an Asian big data startup that aims to simplify decision making for consumers and enterprises using recommendation algorithms. Their Choice Engine platform, SimplerChoices, analyzes taste and interest graphs to provide personalized recommendations that have driven significant value for clients. Some early clients include a hotel chain that increased repeat customers 3x and a payments company that improved lead qualification rates by 50-100% using the platform. Crayon Data was selected as a finalist for the CODE_n award at CeBIT 2014 for their work in driving the data revolution.
Parámetros de operación de máquinas y la seguridad industrial en VenezuelaAlfonso Castellanos
Los parámetros de operación de una máquina permiten garantizar la seguridad de los operadores y de los alrededores, y permiten un aprovechamiento óptimo de las capacidades de la máquina. La seguridad industrial en Venezuela está regulada por organismos como el Ministerio del Trabajo y el IVSS, y busca evitar accidentes laborales y enfermedades profesionales a través de normas y legislación. Los servicios médicos industriales dependientes del Ministerio del Trabajo realizan exámenes para determinar incapacidades laborales y emitir cert
This document discusses the ELENA Technical Assistance program managed by the European Investment Bank. ELENA provides up to 90% funding for technical assistance to support public and private entities in preparing investment programs for sustainable urban transport systems. It can fund additional staffing, technical studies, and assistance with calls for tender to overcome barriers to implementing investment plans. The example provided is a city applying for ELENA support to analyze operational risks and prepare tender documents to replace public buses with more energy efficient hybrid buses. Contact is made through the ELENA website or by email to request preliminary assessments and apply for assistance.
Reflexión Final Seguridad Social (Lesly Carrasquel)Seguridad Social
Este documento resume la reflexión final de una estudiante sobre un curso de Seguridad Social. La estudiante describe su trabajo actual en una empresa donde se aplican descuentos de seguridad social. Aunque la empresa no emplea a personas con discapacidad. La estudiante opina que el curso fue útil para aprender sobre los derechos laborales y que la seguridad social debería ser sostenible a través de impuestos. No tiene sugerencias para mejorar el curso.
The document discusses content-based image retrieval (CBIR). It notes the increasing amounts of digital images being produced and stored without metadata. CBIR aims to analyze image content to discover semantic knowledge and improve image retrieval when no metadata is available. Recent deep learning methods have greatly outperformed traditional CBIR techniques. The document provides an overview of CBIR components, traditional and deep learning-based feature extraction methods, and evaluation of CBIR systems.
This document is a doctoral thesis that examines bringing more intelligence to the web and beyond through semantic web technologies. It discusses the motivation for more intelligent web applications, provides an overview of semantic web technologies and languages. It then presents the H-DOSE semantic platform and its logical architecture for semantic resource retrieval. Several case studies that implemented the H-DOSE platform are also described. The thesis concludes with discussions on related works and potential future directions.
This document is a master's thesis submitted by Sascha Nawrot to Berlin University of Applied Sciences in partial fulfillment of the requirements for a Master of Science degree in Applied Computer Science. The thesis introduces novel, lightweight open source annotation tools for whole slide images that enable deep learning experts and pathology experts to cooperate in creating training samples by annotating regions of interest in whole slide images, regardless of platform or format, in a fast and easy manner. The tools consist of a conversion service to convert whole slide images to an open format, an annotation service for annotating regions of interest, and a tessellation service to extract the annotated regions from the images.
This thesis presents an approach for non-rigid multi-modal object tracking using Gaussian mixture models (GMM). The target is represented by a GMM with each ellipsoid corresponding to a different fragment of the target. A region growing algorithm is used to automatically adapt the fragment set and extract accurate boundaries. Tracking performance is improved by incorporating joint Lucas-Kanade feature tracking to handle large motions. Experimental results demonstrate the effectiveness of the approach on challenging sequences.
This document describes a project to develop an expert search system that mines academic expertise from funded research in Scottish universities. The system aims to integrate data on funded projects from external sources with an existing academic search engine to improve its search results. It will extract expertise information from publications and funded projects to generate expert profiles. Learning to rank algorithms will then be used to rank experts based on their profiles for specific queries. The goal is to enhance the current search engine that identifies experts based on publications by incorporating additional evidence of expertise from funded research projects.
This document is a thesis that examines automated detection of short-lived websites. It presents the design and evaluation of discovery, identification, and classification engines to analyze websites and determine if they are short-lived or replicated across multiple domains. The tools crawl websites to gather content and metadata, calculate similarity metrics, and visualize relationships. Evaluation of the tools found they could successfully identify similar websites and classify pages as likely, unlikely, or partially replicated. The thesis also discusses non-functional requirements like architecture, anonymization techniques, and improving performance. Overall, the document outlines an approach for automatically detecting short-lived or replicated pharmaceutical websites.
This document is the thesis submitted by Kieran Flesk for the degree of Masters of Science in Software Design and Development. It proposes a novel reinforcement learning approach for selecting virtual machines for migration in cloud computing environments. This approach aims to optimize resource usage and reduce energy consumption by dynamically consolidating virtual machines using live migration and switching idle nodes to sleep mode. The reinforcement learning algorithm provides decision support to efficiently deploy applications across different cloud providers while lowering energy usage without negatively impacting service level agreements.
This document is the Software Guide for version 3.20 of the ORFEO Toolbox (OTB). OTB is a set of algorithms encapsulated in a software library developed by CNES to efficiently exploit results from methodological remote sensing research and development studies. It is implemented in C++ and based on the Insight Toolkit (ITK). The guide provides an introduction to OTB, instructions for downloading and installing it, and overviews of the system organization and essential concepts like the data processing pipeline and spatial objects.
This document is a front cover and table of contents for a book published by IBM about using IBM's Operational Decision Manager Advanced and predictive analytics to create systems of insight for digital transformation. It introduces key concepts around decision making, decision automation, and systems of insight. It also provides an overview of the types of solutions discussed in the book, including real-time, retroactive, and proactive solutions using event-driven processing, predictive analytics, and other techniques.
An Optical Character Recognition Engine For Graphical Processing UnitsKelly Lipiec
This dissertation investigates building an optical character recognition (OCR) engine for graphical processing units (GPUs). It describes Jeremy Reed's doctoral dissertation from the University of Kentucky in 2016. The dissertation introduces basic OCR and GPU concepts. It then describes in detail the SegRec algorithm developed by the author for segmenting and recognizing characters on a GPU. Evaluation results comparing SegRec to other OCR engines are provided. The dissertation concludes by discussing limitations and opportunities for improving SegRec and developing it into a full-featured OCR system.
This document is a research paper written by Craig Ferguson at the University of Cape Town that presents a high performance traffic sign detection technique for use in low power systems or high speed vehicles. The paper introduces the problem of traffic sign detection in vehicles and outlines the objectives and structure of the research. It then reviews existing literature on topics like preprocessing, detection, classification, training and testing. The paper goes on to describe the proposed method, which uses RGB thresholding for segmentation and tracks signs across frames to allow for a voting scheme. It presents results showing the method performs detection at 13ms per frame and achieves 83% detection efficiency, significantly outperforming a cascade classifier detector. The technique is constrained to midday lighting but provides a proof
Automatic Detection of Performance Design and Deployment Antipatterns in Comp...Trevor Parsons
Enterprise applications are becoming increasingly complex. In recent times they have moved away from monolithic architectures to more distributed systems made up of a collection of heterogonous servers. Such servers generally host numerous soft- ware components that interact to service client requests. Component based enterprise frameworks (e.g. JEE or CCM) have been extensively adopted for building such ap- plications. Enterprise technologies provide a range of reusable services that can assist developers building these systems. Consequently developers no longer need to spend time developing the underlying infrastructure of such applications, and can instead concentrate their efforts on functional requirements.
Poor performance design choices, however, are common in enterprise applications and have been well documented in the form of software antipatterns. Design mistakes generally result from the fact that these multi-tier, distributed systems are extremely complex and often developers do not have a complete understanding of the entire ap- plication. As a result developers can be oblivious to the performance implications of their design decisions. Current performance testing tools fail to address this lack of system understanding. Most merely profile the running system and present large vol- umes of data to the tool user. Consequently developers can find it extremely difficult to identify design issues in their applications. Fixing serious design level performance problems late in development is expensive and can not be achieved through ”code op- timizations”. In fact, often performance requirements can only be met by modifying the design of the application which can lead to major project delays and increased costs.
This thesis presents an approach for the automatic detection of performance design and deployment antipatterns in enterprise applications built using component based frameworks. Our main aim is to take the onus away from developers having to sift through large volumes of data, in search of performance bottlenecks in their applica- tions. Instead we automate this process. Our approach works by automatically recon- structing the run-time design of the system using advanced monitoring and analysis techniques. Well known (predefined) performance design and deployment antipat- terns that exist in the reconstructed design are automatically detected. Results of ap- plying our technique to two enterprise applications are presented.
The main contributions of this thesis are (a) an approach for automatic detection of performance design and deployment antipatterns in component based enterprise frameworks, (b) a non-intrusive, portable, end-to-end run-time path tracing approach for JEE and (c) the advanced analysis of run-time paths using frequent sequence mining to automatically identify interesting communication patterns between com- ponents.
This document is a feasibility study report submitted by Benjamin Kremer for the MSc Computer Science degree at University College London. The report examines the feasibility of constructing a system to verify and quantify collaborative work using blockchain architecture. The project aimed to address the problem of student disengagement by developing an API and mobile application to interact with a blockchain that records collaborative task and team data. While the project did not fully establish a way to verify and quantify collaboration, it demonstrated the concept is feasible with more time and blockchain expertise. The report describes the background, requirements, design, implementation, and testing of the prototype system developed as a proof of concept.
This document specifies the Linked Media Layer architecture and describes its key components. The architecture includes a repository layer for media storage and metadata, an integration layer, and a service layer. It also describes modules for unstructured search using Apache Nutch/Solr, media collection from social networks, searching media resources with latent semantic indexing, and participation in the MediaEval 2013 benchmarking initiative for video search and hyperlinking tasks.
This document provides an overview and introduction to dimensional modeling for business intelligence. It discusses how dimensional modeling differs from traditional SQL and E/R modeling by focusing on query performance and ease of analysis rather than data storage and transactions. The document also outlines some key concepts in dimensional modeling like fact tables, dimension tables, and grains. It emphasizes that dimensional modeling helps optimize data access and analysis for business intelligence activities.
This document is a thesis submitted by Livinus Obiora Nweke for the degree of Master of Science in Computer Science. The thesis proposes a framework for validating network artifacts in digital forensics investigations based on stochastic and probabilistic modeling of the internal consistency of artifacts. The framework consists of three phases - data collection, feature selection using Monte Carlo Feature Selection, and a validation process using logistic regression analysis. The framework is demonstrated on network artifacts from intrusion detection systems. The experiment results show the validity of the network artifacts and can support assertions from the artifacts in investigations.
This thesis proposes a novel way to introduce self-configuration and self-optimization autonomic characteristics to algorithmic skeletons using event-driven programming techniques. By leveraging event-driven programming, the approach is not tied to a specific application architecture and allows for structural changes at runtime. It also enables estimates of future work to be calculated on-the-fly rather than relying on pre-calculated estimates. The thesis focuses on guaranteeing a given execution time for a skeleton by optimizing the number of threads. It contributes a novel event-based separation of concerns for skeletons and evaluates strategies for estimating execution times and parallelism levels.
This thesis proposes a novel way to introduce self-configuration and self-optimization autonomic characteristics to algorithmic skeletons using event-driven programming techniques. By leveraging events, the approach is not tied to a specific application architecture and allows for structural changes at runtime. It focuses on guaranteeing execution time for skeletons by optimizing thread allocation. Other contributions include a novel separation of concerns for skeletons using events, and evaluating estimation strategies for predicting future work.
This document is a project report submitted by Ramashish Baranwal and Ripinder Singh for the degree of Bachelor of Technology. It outlines their work on developing a content-based image retrieval system called Imagefinder. The system segments images into homogeneous regions, extracts visual features like color and shape, and indexes the features using a C-tree for efficient retrieval of similar images based on user queries. Experimental results demonstrate the image segmentation and retrieval capabilities of the system. The report also discusses potential improvements like incorporating relevance feedback to further refine search results.
Consistency, Clarity & Control:
Development of a new approach to
WWW image retrieval
Trystan Upstill
A subthesis submitted in partial fulfillment of the degree of
Bachelor of Information Technology (Honours) at
The Department of Computer Science
Australian National University
November 2000
Except where otherwise indicated, this thesis is my own original work.
Trystan Upstill
24 November 2000
Acknowledgements
I would like to thank the ANU for providing financial support for my honours year
through the Paul Thistlewaite memorial scholarship. Paul was an inspiring lecturer
and I am privileged to have received a scholarship in his honour.
Thanks to my supervisors, Raj Nagappan, Nick Craswell and Chris Johnson, for the
continual flow of great ideas and support throughout the year.
Thank you, AltaVista, for not banning my IP address following my constant and
unrelenting barrage on your image search engine.
Thanks to the honours gang, Vij, Nige, Matt, Derek, Mick, Tom, Mel, Pete & Jason,1
for a fun and eventful time during a long and taxing year. I wish you all the best for
the future and hope to keep in touch.
Thanks to all those from 5263, Bodhi, Nick, Andy, Andy, Ben, Jake, Josh, Josh & Jonno,
for making my life marginally less 5263.
Thanks to my other fellow compatriots, Carla, Jenny, Fiona, Tam & Nils for constantly
reminding me what a geek I am, and reminding me that some members of the human
race are female.
Thanks to my family, Mum, Dad and Detts, who somehow managed to put up with
me all year. Your support during my education has been immeasurable and my
achievements owe a lot to you.
And finally, last but not least, thank you Beth. Your tremendous support and
understanding have allowed me to maintain a degree of sanity throughout the
year — now let's go to the beach.
1 Honourary Member
Abstract
The number of digital images is expanding rapidly and the World-Wide Web (WWW)
has become the predominant medium for their transferral. Consequently, there
exists a requirement for effective WWW image retrieval. While several systems
exist, they lack the facility for expressive queries and provide an
uninformative and non-interactive grid interface.
This thesis surveys image retrieval techniques and identifies three problem areas in
current systems: consistency, clarity and control. A novel WWW image retrieval ap-
proach is presented which addresses these problems. This approach incorporates
client-side image analysis, visualisation of results and an interactive interface. The
implementation of this approach, the VISR (Visualisation of Image Search Results) tool,
is then discussed and evaluated using new effectiveness measures.
VISR offers several improvements over current systems. Consistency is aided through
consistent image analysis and result visualisation. Clarity is improved through a
visualisation that makes it clear why images were returned and how they matched
the query. Control is improved by allowing users to specify expressive queries and
by enhancing system interaction.
The new effectiveness measures include a measure of visualisation precision and vi-
sualisation entropy. The visualisation precision measure illustrates how VISR clusters
images more effectively than a thumbnail grid. The visualisation entropy measure
demonstrates the stability of VISR over changing data sets. In addition to these mea-
sures, a small user study is performed. It shows that the spring-based visualisation
metaphor, upon which VISR’s display is based, can generally be easily understood.
Chapter 1
Introduction
“What information consumes is rather obvious: it consumes the attention of its
recipients. Hence a wealth of information creates a poverty of attention, and a
need to allocate that attention efficiently among the overabundance of information
sources that might consume it.”
– H. A. Simon
1.1 Motivation
Recently, there has been a huge increase in the number of images available on-line.
This can be attributed, in part, to the popularity of digital imaging technologies and
the growing importance of the World-Wide Web in today’s society. The WWW pro-
vides a platform for users to share millions of files with a global audience. Further-
more, digital imaging is becoming widespread through burgeoning consumer usage
of digital cameras, scanners and clip-art libraries [16]. As a consequence of these de-
velopments, there has been a surge of interest in new methods for the archiving and
retrieval of digital images.
While retrieving text documents presents its own problems, finding and retrieving
images adds a layer of complexity. The image retrieval process is hindered by dif-
ficulties involved with image description. When outlining image needs, users may
provide subjective, associative1 or incomplete descriptions. For example, figure 1.1
may be described objectively as “a cat”, or “a cat with a bird on its head”. It could be
described bibliographically, as “Paul Klee”, the painter. Alternatively, it could be de-
scribed subjectively as “a happy colourful picture” or “a naughty cat”. It could also be
described associatively as “find the bird” or “the new cat-food commercial”. Each of
these queries arguably provides an equally valid image description. However, Web
page authors, when describing images, generally provide only a few of the possible
descriptions of image content.
1
describing an action portrayed by the image, rather than image content
Figure 1.1: Example Image: “cat and bird” by Paul Klee.
Current commercial WWW image search engines provide a limited facility for image
retrieval. These engines are based on existing document retrieval infrastructure, with
minor modifications to the underlying architecture. An example of a current approach
to WWW image retrieval is the AltaVista [3] image search engine. AltaVista incorpo-
rates a text-based image search, allowing users to enter textual criteria for an image.
The retrieved results are then displayed in a thumbnail grid as shown in figure 1.2.
However, there is scope for improvement. Current WWW image retrieval systems
are limited to using textual descriptions of image content to retrieve images, with no
capabilities for retrieving images using visual features. Further, the image search re-
sults are presented in an uninformative and non-interactive thumbnail grid.
Figure 1.2: AltaVista example grid, for the query “Trystan Upstill”.
1.2 Approach
This dissertation presents a new approach to resolve weaknesses observed in current
WWW image retrieval systems. This new approach is implemented in the VISR (Vi-
sualisation of Image Search Results) tool.
A survey of current image retrieval systems reveals three key problem areas: consis-
tency, clarity and control. This thesis aims to find solutions to these problems through
a new architecture:
• consistency: through client-side image analysis and result visualisation.
• clarity: through a visualisation that makes it clear why images were returned
and how they matched the query.
• control: by allowing users to specify expressive queries and enhancing system
interaction.
Using new effectiveness measures, the resulting architecture is compared against tra-
ditional approaches to WWW image retrieval.
1.3 Contribution
This thesis contributes knowledge to several domains: WWW information retrieval,
image retrieval, information visualisation and information foraging.
Contributions are made through:
1. The identification of the problem areas of consistency, clarity and control, from
current literature.
2. The creation of a new approach to WWW image retrieval and an effectiveness
comparison with the existing approach.
3. The implementation of a tool based on the new approach, VISR.
4. The proposal of two new evaluation measures: visualisation precision and visu-
alisation entropy.
5. The analysis of the VISR tool with respect to consistency, clarity and control and
the effectiveness measures.
1.4 Organisation
Chapter 2 introduces the domain of information retrieval. A framework that describes
traditional information retrieval is presented. A glossary of terms is provided.
Chapter 3 presents a survey of current image retrieval systems. It contains an overview
of WWW image retrieval problems organised into logical phases.
Chapter 4 outlines novel modifications to the information retrieval process model.
This chapter introduces new system modules, their purposes and how they address
limitations outlined in chapter 3.
Chapter 5 describes the VISR tool. Example use cases are explored.
Chapter 6 presents evaluation criteria to measure the effectiveness of the VISR tool.
New evaluation techniques are presented, and an evaluation of system effectiveness
is performed.
Chapter 7 discusses the implications of the experimental results in Chapter 6 with
respect to WWW image retrieval problems.
Chapter 8 contains the conclusion. Contributions are described and future work is
proposed.
Appendix A contains a discussion of surveyed information visualisation systems.
Appendix B provides tables containing the full numerical results from the experi-
ments performed.
Appendix C contains a sample user study, used during the evaluation of the VISR
tool.
Chapter 2
Domain
“To look backward for a while is to refresh the eye, to restore it, and to render it
more fit for its prime function of looking forward. ”
– Margaret Fairless Barber
2.1 Overview
This dissertation is based in the domain of information retrieval. The process of com-
puter based information retrieval is complex and has been the focus of much research
over the last 50 years. This chapter contains a summary of this research as it relates to
this thesis, and a conceptual framework for the analysis of the information retrieval
process.
2.2 Glossary of Terms
document: any form of stored encapsulated data.
user: a person wishing to retrieve documents.
expert user: a professional information retriever wishing to retrieve documents (e.g.
a librarian).
visualisation: the process of representing data graphically.
Information Visualisation: the visualisation of document information.
cognitive process: thinking or conscious mental processing in a user. It relates
specifically to our ability to think, learn and comprehend.
information need: the requirement to find information in response to a current prob-
lem [35].
query: an articulation of an information need [35].
Information Retrieval: the process of finding and presenting documents deduced
from a query.
5
18. 6 Domain
relevance: user’s judgement of satisfaction of an information need.
match: system concept of document-query similarity.
professional description: a well described document, with thorough, complete and
correct textual meta-data.
layperson description: a non-professionally described document, potentially subjective,
incomplete or incorrect; this can be attributed to a lack of knowledge of
the retrieval process.
Information Foraging: a theory developed to understand the usage of strategies
and technologies for information seeking, gathering, and consumption in a fluid
information environment [51]. See section 2.9.1 for a concrete description.
recall: the proportion of all relevant documents that are retrieved.
precision: the proportion of all documents retrieved that are relevant.
clustering: partitioning data into a number of groups in which each group collects
together elements with similar properties [18].
image: a document containing visual information.
image data: the actual image.
image meta-data: text associated with an image.
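The recall and precision definitions above can be made concrete with a short calculation. The following sketch is purely illustrative; the document identifiers and set sizes are invented for the example.

```python
def recall(retrieved, relevant):
    # Proportion of all relevant documents that are retrieved
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    # Proportion of all retrieved documents that are relevant
    return len(retrieved & relevant) / len(retrieved)

# Hypothetical collection: documents 0-9 are relevant; the system
# retrieves documents 4-11, six of which are relevant.
relevant = set(range(10))
retrieved = set(range(4, 12))

print(recall(retrieved, relevant))     # 0.6  (6 of 10 relevant found)
print(precision(retrieved, relevant))  # 0.75 (6 of 8 retrieved relevant)
```

Note the trade-off visible even in this toy example: retrieving more documents can raise recall while lowering precision.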
2.3 Information Retrieval
This thesis’ depiction of the traditional information retrieval model is given in figure 2.1.
In the initial stage of the retrieval process, the user has some information need. The user
then formalises this information need, through query creation. The query is submitted
to the system for query processing, where it is parsed by the system to deduce the doc-
ument requirements. Document index analysis and retrieval then begins, with the goal
of retrieving documents of relevance to the query. The documents are subsequently
presented to the user in a result visualisation, aiming to facilitate user identification of
relevant documents. The user then performs a relevance judgement as to whether the
retrieved document collection contains relevant documents. If the user’s information
need is satisfied, the retrieval process is finished. Conversely, if the user is not satis-
fied with the retrieved document collection, they may refine their original information
need, and the entire process is re-executed.
Figure 2.1: The traditional information retrieval process. The information flow, depicted
by directed lines, describes communication between system and user processes. System pro-
cesses are operations performed by the information retrieval system. User processes are the
user’s cognitive operations during information retrieval.
2.4 Information Need
Figure 2.2: Information Need Analysis.
An information need occurs when a user desires information. To characterise poten-
tial information needs, we must appreciate why users are searching for documents,
what use they are making of these documents and how they make decisions on which
documents are relevant [16].
This thesis identifies several example information needs:
Specific need (answer or document): where one result will do.
Spread of documents: a collection of documents related to a specific purpose.
All documents in an area: a collection of all documents that match the criteria.
Clip need: a less specific need, where users desire a document that somehow relates
to a passage of text.
Specific needs
Example: ‘I want a map of Sydney’
In this situation a single comprehensive map of Sydney will do. If the retrieval en-
gine is accurate, the first document will fulfill the information need. Therefore, the
emphasis is on having the correct answer as the first retrieved result — high precision
at position 1.
Spread of Documents
Example: ‘I want some Sydney attractions’
In this situation the user desires a collection of Sydney attractions, potentially in clus-
tered groups for quick browsing. The emphasis is on both high recall, to try and
present the user with all Sydney attractions, and clustering, to relate similar images.
All documents in an area
Example: ‘Give me all your documents concerning the Sydney Opera House’
In this situation the user wants the entire collection of documents containing the Syd-
ney Opera House. The emphasis in this case is on high recall, potentially sacrificing
precision.
Clip need
Example: ‘I want a picture for my story about Sydney Opera House being a model anti-racism
employer’
In this situation the user desires something to do with the Sydney Opera House and
race issues as an insert for their story. In this case, users are not necessarily interested
in relevance, but rather in fringe documents that may catch a reader’s eye.
2.5 Query Creation
Figure 2.3: Query Creation.
Following the formation of an information need, the user must express this need as a
query. A query may contain several query terms, where each term represents criteria
for the target documents. Web search engine users generally do not provide detailed
queries, with average queries containing 2.4 terms [30].
If a user is looking for documents regarding petroleum refining on the Falkland Is-
lands, they may express their information need as:
Falkland Islands petrol
While an expert user may have a better understanding of how the retrieval system
works and thus express their query as:
+“Falkland Islands” petroleum oil refining
The query processing must take these factors into account and cater to both groups of
users.
2.6 Query Processing
Figure 2.4: Query Processing.
System query processing is the parsing and encoding of a user’s query into a system-
compatible form. At this stage, common words may be stripped out and the query
expanded, adding term synonyms.
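The two operations named above, stripping common words and adding term synonyms, can be sketched as follows. The stopword list and thesaurus here are illustrative placeholders, not the actual resources used by any system discussed in this thesis.

```python
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and"}  # illustrative list
SYNONYMS = {"petrol": ["petroleum", "gasoline"]}         # illustrative thesaurus

def process_query(query):
    # Parse the query into lower-case terms
    terms = [t.lower() for t in query.split()]
    # Strip out common words
    terms = [t for t in terms if t not in STOPWORDS]
    # Expand the query by adding term synonyms
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(process_query("petrol in the Falkland Islands"))
# ['petrol', 'petroleum', 'gasoline', 'falkland', 'islands']
```

Expansion of this kind lets the layperson query from section 2.5 match documents that only use the expert's vocabulary.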
Figure 2.5: Document Analysis and Retrieval.
2.7 Document Analysis and Retrieval
Document Analysis and Retrieval is the stage at which the user’s query is compared
against the document collection index. It is typically the most computationally expen-
sive stage in the information retrieval process.
Common words, termed stopwords, may be removed prior to document indexing or
matching. Since stopwords occur in a large percentage of documents they are poor
discriminators, with little ability to differentiate documents in the collection. Fol-
lowing stopword elimination, document terms may be collapsed using stemming or
thesauri. These techniques are used to minimise the size of the document collection
index, and allow for the querying of all conjugates and synonyms of a term.
The terms are then indexed according to their frequencies both in each document and
in the entire document collection. The two statistics most commonly stored in the document
collection index are Term Frequency and Document Frequency. Term Frequency
is a measure of the number of times a term appears in a document, while Document
Frequency measures the number of indexed documents containing a term.
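A minimal sketch of an index storing these two statistics is given below. Stemming and stopword elimination are omitted for brevity (so “dog” and “dogs” remain distinct terms), and the toy documents are invented for illustration.

```python
from collections import Counter, defaultdict

def build_index(docs):
    tf = {}                # document id -> {term: occurrences in that document}
    df = defaultdict(int)  # term -> number of indexed documents containing it
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        tf[doc_id] = dict(counts)
        for term in counts:
            df[term] += 1
    return tf, dict(df)

docs = {
    1: "robot dogs",
    2: "robot dog ankle-biting",
    3: "subdued robot dogs",
}
tf, df = build_index(docs)
print(df["robot"])    # 3: "robot" occurs in all three documents
print(tf[3]["dogs"])  # 1: "dogs" occurs once in document 3
```

Because “robot” occurs in every document it has the highest document frequency, which is exactly why such a term is a poor discriminator.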
2.7.1 Ranking
The vector space model is the ranking model of concern in this thesis. The vector
space is defined by basis vectors which represent all possible terms. Documents and
queries are then represented by vectors in this space.
For example, if we have three very short documents:
Document 1: ‘Robot dogs’
Document 2: ‘Robot dog ankle-biting’
Document 3: ‘Subdued robot dogs’
Using the basis vectors:
‘Robot dog’ [1, 0, 0]
‘ankle-biting’ [0, 1, 0]
‘Subdued’ [0, 0, 1]
We can create three document vectors weighted by term frequency:
Document 1 = [1, 0, 0]
Document 2 = [1, 1, 0]
Document 3 = [1, 0, 1]
The vector space for these documents is depicted in figure 2.6.
Figure 2.6: Unweighted Vector Space. Since document 1 only contains “robot dog”, its vector
lies on the “robot dog” axis. Document 2 contains both “robot dog” and “ankle-biting”, so
its vector lies between those axes. Document 3 contains “subdued” and “robot dog”, so its
vector lies between those axes.
The alternative TF/DF weighting of the vector space is:
Document 1 = [1/3, 0 , 0]
Document 2 = [1/3, 1/1, 0]
Document 3 = [1/3, 0 , 1/1]
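The weights listed above divide each term's frequency in the document (TF) by the number of documents containing it (DF). A short sketch, assuming this simple TF/DF scheme, reproduces the three vectors:

```python
basis = ["robot dog", "ankle-biting", "subdued"]

# Term frequencies for the basis terms in each document
tf = {
    1: {"robot dog": 1},
    2: {"robot dog": 1, "ankle-biting": 1},
    3: {"robot dog": 1, "subdued": 1},
}
# Document frequency: how many documents contain each basis term
df = {t: sum(1 for doc in tf.values() if t in doc) for t in basis}

def tf_df_vector(doc_id):
    # Each vector component is TF/DF for the corresponding basis term
    return [tf[doc_id].get(t, 0) / df[t] for t in basis]

print(tf_df_vector(1))  # [1/3, 0, 0]
print(tf_df_vector(2))  # [1/3, 1, 0]
print(tf_df_vector(3))  # [1/3, 0, 1]
```

Dividing by document frequency down-weights “robot dog”, which appears everywhere, relative to the rarer, more discriminating terms.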
Figure 2.7: TF/DF weighted Vector Space. This differs from figure 2.6 by using document
term frequencies to weight vector attraction. Since document 1 only contains “robot dog”, its
vector lies on the “robot dog” axis. Document 2 contains both “robot dog” and “ankle-biting”;
“ankle-biting” only appears in one document while “robot dog” appears in all three. This
results in the document vector having a higher attraction to the “ankle-biting” axis. Likewise,
document 3 contains “subdued” and “robot dog”, where “subdued” is less common than
“robot dog”, so its vector has a higher attraction to “subdued”.
The TF/DF weighted vector space for these documents is depicted in figure 2.7.
In the vector space model, document similarity is measured by calculating the degree
of separation between documents. The degree of separation is measured by calculating
the angle between vectors, usually via the cosine measure. In these calculations a smaller
angle implies a higher degree of relevance. As such, similar documents are co-located
in the space, as shown in figure 2.8. Conceptually this leads to a clustering of inter-
related documents in the vector space [55].
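The angle-based ranking described above is commonly computed as the cosine of the angle between the query vector and each document vector. The sketch below applies this to the unweighted robot-dog vectors from figure 2.6; it is a minimal illustration, not a production ranking function.

```python
import math

def cosine(u, v):
    # Cosine of the angle between two vectors: 1 means identical direction
    # (smallest angle), 0 means orthogonal (no shared terms)
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

query = [1, 1, 0]  # "robot dog" and "ankle-biting"
documents = {1: [1, 0, 0], 2: [1, 1, 0], 3: [1, 0, 1]}

# Rank documents by decreasing similarity (smallest angle first)
ranking = sorted(documents, key=lambda d: cosine(query, documents[d]),
                 reverse=True)
print(ranking)  # [2, 1, 3]
```

Document 2 points in exactly the query's direction (cosine 1.0), document 1 shares one of two query terms, and document 3 is diluted by the unrelated term “subdued”.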
Figure 2.8: Vector Space Document Similarity Ranking. The vector space model implies that
document 1 is the most similar to the source document, while document 2 is the next most
similar, and document 3 the least. When querying a vector space model, the query becomes
the source document vector and documents with similar vectors are retrieved.
Basis vectors need not be generated directly from all unique document terms;
documents can instead be indexed against a small number of basis vectors. This is an
application of synonym matching in which partial synonyms are admitted. An
example of this is to index document 2 on the basis vectors ‘Irritating’ and ‘Friendly’,
as is depicted in figure 2.9.
One of the difficulties involved in vector space ranking is that it can be unclear which
terms matched the document and the extent of the matching. In image retrieval this
drawback, combined with the fact that images are associated with potentially arbitrary
text, can lead to user confusion regarding why images were retrieved (see section 3.2.1).
Figure 2.9: Vector Space with basis vectors ‘Friendly’ and ‘Irritating’. In this example, prior
to the ranking we know that “robot dog”s are moderately friendly and ankle-biting is
extremely irritating. Query terms are ranked in the vector space against partial synonyms.
Other Models
Other models, which are not within the scope of this thesis are thoroughly described
in general information retrieval literature [55, 5, 20, 35]. These include Boolean, Ex-
tended Boolean and Probabilistic models.
2.8 Result Visualisation
Result visualisation in information retrieval is often overlooked in favour of improv-
ing document analysis and retrieval techniques. It is, however, an integral part of the
information retrieval process [7]. Information retrieval systems typically use linear list
result visualisations.
2.8.1 Linear Lists and Thumbnail Grids
Linear lists present a sorted list of retrieved documents ranked from most to least
matching. Thumbnail grids are often used for viewing retrieved image collections.
Thumbnail grids are linear lists split horizontally between rows, a process analogous
to words wrapping on a page of text. This representation is used to maximise
screen real-estate. Images positioned horizontally next to each other are adjacent
in the ranking, while vertically adjacent images are separated by N ranks (where N
is the width of the grid). Thus, although the grid is a two dimensional construct,
thumbnail grids only represent a single dimension — the system’s ranking of images.
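The rank-to-position relationship described above can be written down directly. The function below is a small sketch for a grid N thumbnails wide, using 0-based ranks.

```python
def grid_position(rank, width):
    # Map a 0-based rank in the linear list to (row, column) in the grid
    return rank // width, rank % width

# In a grid 5 thumbnails wide, ranks 3 and 4 are horizontally adjacent,
# while vertically adjacent images are separated by N = 5 ranks.
print(grid_position(3, 5))  # (0, 3)
print(grid_position(4, 5))  # (0, 4)
print(grid_position(8, 5))  # (1, 3) -- directly below rank 3
```

The vertical axis therefore carries no meaning of its own: two vertically adjacent thumbnails are related only by being five ranks apart.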
Figure 2.10: Result Visualisation.
Later it is shown that having no relationship between sequential images, and no query
transparency, causes problems in current image retrieval systems (section 3.2.1).
To further maximise screen real-estate, zooming image browsers can be used. Combs
and Bederson’s [12] zooming image browser incorporates a thumbnail grid with a
large number of images at a low resolution. Users select interesting areas of the grid
and zoom in to find relevant images. The zooming image browser did not outperform
other image browsers in evaluation. Frequently users selected incorrect images at the
highest level of zoom. Users were not prepared to zoom in to verify selections and
incur a zooming time penalty.
When using a vector space model with a thumbnail grid visualisation, vector evidence
is discarded. Figure 2.11 depicts a hypothetical thumbnail grid retrieved by an image
retrieval engine for the query “clown, circus, tent”. In this grid, black images are pic-
tures of “circus clown”s, dark grey images are pictures of “circus tent”s and light grey
images with borders are pictures of “clown tent”s. Figure 2.12 depicts the vector space
from which the images were taken. There are three clusters, each containing multiple
images, located at angles of equal distance from the query vector. When compressing
this evidence the ranking algorithm selects images in order of their proximity until
the linear list is full. This discards image vector details, and leads to a thumbnail grid
where similar images are not adjacent.
Figure 2.11: Example image grid. This example image grid is generated for the query “clown;
circus; tent”. Black images contain pictures of “circus clown”s, dark grey images contain
pictures of “circus tent”s and light grey bordered images contain pictures of “clown tent”s.
Similar images are not adjacent in the thumbnail grid.
Figure 2.12: Vector space for example images. This vector space corresponds to the image
grid in figure 2.11. The image collection 1 contains the black images, image collection 2 con-
tains the dark grey images and image collection 3 contains the light grey bordered images.
This vector evidence is lost when compressing the ranking into a grid.
2.8.1.1 Image Representation
Humans process objects and shapes at a much greater speed than text. Exploitation
of this capability can facilitate the identification of relevant images. Further, when
presenting images for inspection there is no substitute for the images themselves. As
such, it is important, when using an information visualisation for image search results,
to summarise images using their thumbnails.
2.8.2 Information Visualisations
Information visualisations are intended to strengthen the relationship between the
user and the system during the information retrieval process. They attempt to over-
come the limitations of linear rankings by providing further attributes to facilitate user
determination of relevant documents.
As cited by Stuart Card in 1996, “If information access is a ‘killer app’ for the 1990s [and
2000s], Information Visualisation will play an important role in its success”.
The traditional information retrieval process model (figure 2.1) is revised for information
visualisation; the adapted model is shown in figure 2.13. This model creates a new loop between the result
visualisation, relevance judgement and query creation. This enables users to swiftly
refine their query and receive immediate feedback from the result visualisation. This
new interaction loop can provide improved clarity and system-user interaction during
searching.
Displaying Multi-dimensional data
When representing multi-dimensional data, such as search results, it is desirable to
maximise the data dimensions displayed without confusing the user. Typically, vi-
sualisations are required to handle over three dimensions of data. This requires the
flattening of the data to a two or three dimensional graphical display.
The LyberWorld system [25] suggests that information visualisations created prior to
its inception, in 1994, were ‘limited’ to 2D graphics, as computer graphics systems
could not cope with 3D graphics. Hemmje argued that 3D graphics allow for “the
highest degree of freedom to visually communicate information” and that such vi-
sualisations are “highly demanded”. Indeed, recent research into visualisation has
adopted the development of 3D interfaces. However, problems have arisen from this
practice. This is due, in part, to the spatial abilities users require to interpret a 3D
system. Another drawback is the user's inability to view the entire visualisation at
once — the graphics at the front of the visualisation often obscure the data at the back.
NIST [58] recently conducted a study into the time it takes users to retrieve documents
from equivalent text, 2D and 3D systems. Results from this experiment illustrate that
there is a significant learning curve for users starting with a 3D interface. During the
experiment the 3D interface proved the slowest method for users accessing the data.
Swan et al. [63] also had problems with their 3D interface, citing that “[they] found
no evidence of usefulness for the[ir] 3-D visualisation”. The arguments for and against
the use of three dimensions in information visualisations are not within the scope of
this thesis.
Interactive Interfaces
A dynamic visualisation interface can be used to aid in the comprehension of the in-
formation presented in a visualisation. Dynamic Queries and Filters are two ways of
achieving such an interface.
Dynamic Queries [1, 69] allow users to change parameters in a visualisation, with
immediate updates to reflect the changes. This direct-manipulation interface to queries
can be seen as an adoption of the WYSIWYG (what you see is what you get) model,
where a tight coupling between user action and displayed documents exists.
Filters are similar to Dynamic Queries; they allow users to provide extra document
criteria to the information visualisation. Documents that fulfill the criteria are then
highlighted.
2.8.2.1 Example Information Visualisation Systems
While there are many differing information visualisations for information retrieval
results, there are three prominent models: spring-based, Venn-based and terrain map
based. These models are described below.
Spring-based models separate documents using document discriminators [14]. Each
discriminator is attached to documents by springs which attract matching documents
— the degree of attraction is proportional to the degree of match. This clusters the
documents according to common discriminators. In this model the dimensions are
compressed using springs, with each spring representing a dimension. An in-depth
description of spring-based models is given is section 5.3.1. An example is shown
in figure 2.14. Systems that use this model include the VIBE system [49, 15, 36, 23],
WebVIBE [45, 43, 44], LyberWorld [25, 24], Bead [9] and Mitre [33]. A survey of these
visualisations is provided in appendix A.1.
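With ideal zero-rest-length springs, the forces balance where each document sits at the match-weighted centroid of the discriminator anchor points. The sketch below assumes that simplification; it is an illustrative model of the spring metaphor, not the layout algorithm of any particular system surveyed here.

```python
def spring_position(anchors, weights):
    # Equilibrium of a document attached by ideal springs to fixed anchors:
    # the weighted centroid of the anchor positions, with spring strength
    # proportional to the document's degree of match to each discriminator.
    total = sum(weights)
    x = sum(w * ax for w, (ax, ay) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (ax, ay) in zip(weights, anchors)) / total
    return x, y

# Three discriminators placed at the corners of a triangle (illustrative)
anchors = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]

# A document matching only the first discriminator sits on its anchor
print(spring_position(anchors, [1.0, 0.0, 0.0]))  # (0.0, 0.0)
# A document matching all three equally sits at the triangle's centroid
print(spring_position(anchors, [1.0, 1.0, 1.0]))
```

Documents with similar match profiles thus settle near each other, which is what produces the clustering by common discriminators described above.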
Venn-based models are a class of information visualisations that allow users to in-
terpret or provide Boolean queries and results. In this model, the dimensions are
compressed using Venn diagram set relationships. Systems that use this model in-
clude InfoCrystal [61] and VQuery [31]. A survey of these visualisations is provided
in appendix A.2.
Terrain map models are information visualisations that illustrate the structure of the
document collection by showing different types of geography on a map. These visu-
alisations are based on Kohonen’s feature map algorithm [54]. Dimensions are com-
pressed into map features such as mountain ranges and valleys. An example visual-
isation is shown in figure 2.15. Two systems that use this model are: SOM [38] and
ThemeScapes [42]. A survey of these visualisations is provided in appendix A.3.
Other information visualisation models also exist:
• Clustering Models: depict relationships between clusters of documents [58, 13].
• Histographic Models: seek to visualise a large number of document attributes at
once [22, 68, 67].
• Graphical Plot Models: allow for a comparison of two document attributes [47,
62].
Systems that illustrate these visualisation properties can be found in appendix A.4.
Figure 2.14: Spring-based Example: The VIBE System. In this example VIBE is being used
to visualise the “president; europe; student; children; economy” query. Documents are rep-
resented by different sized rectangles, with high concentration clusters in the visualisation
represented by large rectangles.
2.9 Relevance Judgements
Only a user can judge the relevance of images in the retrieved document collection.
Document Analysis and Retrieval systems do not understand relevance, only match-
ing documents to a request. Therefore, the final stage of information retrieval is the
cognitive user process of discovering relevant documents in the retrieved document
collection. The cognitive knowledge derived from searching through the retrieved
document collection for relevant documents can lead to a refinement of the visual-
isation, or to a refinement of the original information need. This demonstrates the
Figure 2.15: Terrain Map Example: The ThemeScapes system. In this example ThemeScapes
is being used to generate the geography of a document collection. The peaks represent topics
contained in many documents. Conversely, valleys represent topics contained in only a few
documents.
iterative nature of information retrieval — the process is repeated until the user is sat-
isfied with the retrieved document collection.
Information foraging theory, developed by Pirolli et al. [50, 51], is a new approach
to examining the synergy between a user and a visualisation during relevance judge-
ment.
2.9.1 Information Foraging
Humans display foraging behaviour when looking for information. Information foraging behaviour is used to study how users invest time to retrieve information.
Information foraging theory suggests that information foraging is analogous to food
foraging. The optimal information forager is the forager that achieves the best ratio of
benefits to cost [51]. Thus, it is important to allow the user to allocate their time to the
most relevant documents [50].
Foraging activity is broken up into two types of interaction: within-patch and between-patch. Patches are sources of correlated information. Conceptually, patches could be piles of papers on a desk or clustered collections of documents. Between-patch analysis examines how users navigate from one source of information to another, while within-patch analysis examines how users maximise the use of relevant information within a pile.
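The optimal-forager criterion above can be made concrete as a rate of gain: information gained divided by the total between-patch and within-patch time spent. A toy sketch of comparing patches by this ratio follows; the gain and time figures are invented for illustration:

```python
def gain_rate(gain, time_between, time_within):
    """Rate of information gain: benefit divided by total foraging cost,
    where cost is between-patch plus within-patch time."""
    return gain / (time_between + time_within)

# The optimal forager prefers the patch with the best benefit-to-cost ratio,
# not necessarily the patch with the largest absolute gain.
patches = {"pile A": (10.0, 2.0, 3.0), "pile B": (6.0, 1.0, 1.0)}
best = max(patches, key=lambda p: gain_rate(*patches[p]))
print(best)  # pile B: 6/2 = 3.0 beats 10/5 = 2.0
```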
Chapter 3
Survey of Image Retrieval Techniques
“Those who do not remember the past are condemned to repeat it.”
– George Santayana
3.1 Overview
Image retrieval is a specialisation of the information retrieval process, outlined in
chapter 2. This chapter presents a survey of current approaches to image retrieval.
This analysis enables an identification of core problems in current WWW image re-
trieval systems.
3.2 WWW Image Retrieval
Three large commercial WWW search engines, AltaVista, Yahoo! and Lycos, have
recently introduced text-based image search engines. The following observations are
based on direct experience with these engines.
• AltaVista [3] has developed the AltaVista Photo and Media Finder. This image retrieval engine provides a simple text-based interface (section 3.3.1) to an image collection indexed from the general WWW community and AltaVista's image database partners. Their retrieval engine is based on the technology incorporated into their text document search engine. Modifications to this architecture have been made to associate sections of Web page text with images, in order to obtain image descriptions.
• Yahoo! [70] has developed the Image Surfer. This image retrieval engine contains images categorised into a topic hierarchy. To retrieve images, users can navigate this topic hierarchy, or perform find similar content-based (section 3.3.2) searches. As with Yahoo!'s text document topic hierarchy, all images in the system are categorised manually. This reliance on image classification makes extensive WWW image indexing intractable.
• Lycos [40] has incorporated image retrieval through a simple extension to their text document retrieval engine. Following a user query, Lycos checks to see whether retrieved pages contain image references. If so, the images are retrieved and displayed to the user.
3.2.1 WWW Image Retrieval Problems
The WWW image retrieval problems have been grouped into three key areas: consis-
tency, clarity and control.
The citations in this section are to papers in the fields of image retrieval, information
visualisation and information foraging. The problems this thesis identifies in WWW
image retrieval are similar to problems in these fields.
• Consistency:
– System Heterogeneity
When executing a query over multiple search engines, or repeatedly over the same search engine, users typically retrieve differing search results. This is due to continual changes in the image collections and ranking algorithms used. All WWW search engines use differing, confidential algorithms to rank images. Further, these algorithms sometimes vary according to image collection properties or system load. These continual changes can lead to confusing inconsistencies in image search results.
– Unstructured and Uncoordinated Data
The image meta-data used by WWW image retrieval engines to perform text-based image retrieval is unreliable. Most WWW meta-data is not professionally described, and as such, may be incomplete, subjective or incorrect.
• Clarity:
– No Transparency
The linear result visualisations used by WWW image retrieval engines do not transparently reveal why images are being retrieved [34, 28]. This limits the user's ability to refine their query expression. This situation is amplified if the meta-data upon which the ranking takes place is misleading.
– No Relationships
– Reliance on Ranking Algorithms
WWW image retrieval systems incorporate confidential algorithms to compress multi-dimensional query-document relationship information (section 2.8.1) into a linear list. These algorithms are not well understood by users, particularly algorithms that incorporate different types of evidence, e.g. a combination of text and content analysis [2, 34, 28].
• Control:
– Inexpressive Query Language
∗ Lack of Data Scalability
The large number of images indexed by WWW image retrieval engines makes content-based image analysis techniques (section 3.3.2) difficult to apply. Advanced image analysis techniques are computationally expensive to run. Further, the effectiveness of these algorithms declines when used over a collection with a large breadth of content [56].
∗ Lack of Expression
Existing infrastructure used by WWW search engines to perform image retrieval provides a limited capacity for users to specify their precise image needs. Current systems allow only for text-based image queries [2, 28].
– Coarse Grained Interaction:
∗ Coarse Grained Interaction
In providing a search service over a high latency network, current WWW image retrieval systems are limited to providing coarse grained interaction. In current systems, users must submit a query, retrieve results and then choose either to restate the query or perform a find similar search. Searching is an iterative process, requiring continual refinement and feedback [28, 16]. These interfaces do not facilitate the high degrees of user interaction required during the image retrieval process.
∗ Lack of Foraging Interaction
To enable effective information foraging, a result visualisation must allow users to locate patches of relevant information and then perform detailed analysis of the information contained within a patch [51]. In current WWW image retrieval engines there is no grouping of like images, which prohibits any between-patch foraging. Further, there is no way for users to view a subset of the retrieved information. Thus information foraging (see section 2.9.1) is not encouraged through the visualisation.
3.2.2 Differences between WWW Image Retrieval and Traditional Image
Retrieval
There are several differences between image retrieval on the WWW and traditional
image retrieval systems. As opposed to WWW systems, in traditional systems:
• Consistency is a lesser concern
All systems incorporate an internally consistent matching algorithm, and retrieve images from a controlled image collection. Since a user interacting with the system is always dealing with the same image matching tools, consistency is a lesser concern.
• Quality descriptions are assured
As the retrieval system retrieves images from a controlled database, meta-data quality is assured.
• No Communication Latencies
As the retrieval systems are generally co-located with the images and the user, there is no penalty associated with search iterations.
3.3 Lessons to Learn: Previous Approaches to Image Retrieval
It is convenient for the analysis to group the progress of image retrieval into logical
phases. The phases of image retrieval development are shown in figure 3.1. Although
the progression is not entirely linear, the phases do represent distinct stages in the
evolution of image retrieval.
3.3.1 Phase 1: Early Image Retrieval
The earliest form of image retrieval is Text-Based Image Retrieval. These engines rely
solely on image meta-data to retrieve images, e.g. current WWW image search en-
gines [3, 40]. Traditional document retrieval techniques, such as vector space ranking,
are used to determine matching meta-data, and hence find images. For more informa-
tion on database text-based image retrieval systems refer to [10].
Examples of text-based queries are:
‘Sydney Olympic Games’
‘Sir William Deane opening the Sydney Olympic Games’
‘Torch relay running in front of the ANU’
‘Happy Olympic Punters’
‘Pictures of Trystan Upstill, by the Honours Gang, taken during the Olympic Games’
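Queries such as these are matched against image meta-data with standard document-retrieval machinery, for example vector space cosine ranking. The following is a minimal sketch of that idea over raw term-frequency vectors; the meta-data strings and image identifiers are invented for illustration:

```python
import math
from collections import Counter

def cosine(query, text):
    """Vector space match of a query against one image's meta-data,
    using raw term-frequency vectors and cosine similarity."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in set(q) & set(t))
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in t.values())))
    return dot / norm if norm else 0.0

metadata = {
    "img1": "sydney olympic games opening ceremony",
    "img2": "torch relay canberra",
}
query = "sydney olympic games"
ranked = sorted(metadata, key=lambda k: cosine(query, metadata[k]), reverse=True)
print(ranked[0])  # img1: it shares three terms with the query
```

Real engines add refinements such as inverse document frequency weighting, but the ranking principle is the same.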
Figure 3.1: The development of image retrieval. This diagram shows the logical phases in the image retrieval process. The section is structured according to these phases.
Although text-based image retrieval is the most primitive of all retrieval techniques,
it does possess useful traits. If professionally described image meta-data is available
during retrieval and analysis, it can provide a comprehensive abstraction of a scene.
Additionally, since text-based image retrieval uses existing document retrieval tech-
niques, many different ranking and indexing models are already available. Further,
existing infrastructure can be used to perform image indexing and retrieval — an at-
tractive proposition for current WWW search engines.
Improvements
• Ability to Retrieve Images: provides a simple mechanism for image access and retrieval.
Further Problems
• Consistency:
– Unstructured and Uncoordinated data: image retrieval effectiveness relies on the quality of image descriptions [48]. Further, as it can be unclear which sections of a WWW page relate to an image's contents, problems arise when trying to associate meta-data with images on WWW pages.
• Control:
– Inexpressive Query Language:
∗ Lack of Expression: text-based querying may not allow the user to specify a precise image need. There is no way to convey visual image features to the image search engine.
3.3.2 Phase 2: Expressive Query Languages
Content-Based Image Retrieval enables users to specify graphical queries. The theory
behind its inception is that users have a precise mental picture of a desired image,
and as such, they should be able to accurately express this need [52]. Further, it is
hypothesised that this removal of reliance on image meta-data minimises retrieval
based on potentially incorrect, incomplete or subjective data.
Examples of content-based queries are:
Image properties: ‘Red Pictures’, ‘Pictures with this texture’
Image shapes: ‘Arched doorway’, ‘Shaped like an elephant’
Objects in image: ‘Pictures of elephants’, ‘Generic elephants’
Image sections: ‘Red section in top corner’, ‘Elephant shape in centre’
The six most frequently used query types in content-based image retrieval are:
Colour allows users to query an image’s global colour features. An example of
colour-based content querying is shown in figure 3.2. According to Rui et al.
[28], colour histograms are the most commonly used feature representation.
Other methods include Colour Sets which facilitate fast searching with an ap-
proximation to Histograms, and Colour Moments, to overcome the quantization
effects in Colour Histograms. To improve Colour Histograms, Ioka and Niblack
et al. provide methods for evaluating similar but not exact colours and Stricker
and Orengo propose cumulative colour histograms to reduce noise [28].
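The histogram approach can be sketched briefly: an image is reduced to a distribution over coarse colour bins, and two images are compared with a measure such as histogram intersection. In the sketch below, images are simply lists of RGB tuples and the pixel values are invented for illustration:

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Quantise each RGB pixel into a coarse bin and count occurrences,
    normalised so images of different sizes are comparable."""
    step = 256 // bins_per_channel
    hist = {}
    for (r, g, b) in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    n = len(pixels)
    return {k: v / n for k, v in hist.items()}

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical colour distributions."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

red = [(250, 10, 10)] * 4
mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
print(intersection(colour_histogram(red), colour_histogram(mostly_red)))  # 0.75
```

The coarse quantisation is what produces the quantization effects that Colour Moments were proposed to overcome.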
Texture is a visual pattern that approximates the appearance of a tactile surface. This allows the user to specify whether an image appears rough and how much segmentation it exhibits. An example of texture-based content querying is shown in figure 3.3. According to Rui et al. [28], texture recognition can be achieved using Haralick et al.'s co-occurrence matrix representations, Tamura et al.'s computational approximations to visual texture properties or Simon and Chang's Wavelet transforms.
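The co-occurrence representation underlying Haralick-style texture measures counts how often one grey level appears next to another at a fixed offset. A minimal sketch for a horizontal offset of one pixel follows; the tiny two-level image is invented for illustration:

```python
def cooccurrence(image, levels):
    """Grey-level co-occurrence matrix for offset (0, 1):
    glcm[i][j] counts pixels of level i whose right neighbour has level j."""
    glcm = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            glcm[a][b] += 1
    return glcm

# A striped texture: level 0 is always followed horizontally by level 1.
image = [[0, 1, 0, 1],
         [0, 1, 0, 1]]
glcm = cooccurrence(image, levels=2)
print(glcm)  # [[0, 4], [2, 0]]
```

Statistics derived from this matrix (contrast, homogeneity, and so on) then serve as the texture features used for matching.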
Colour Layout is advanced colour measurement, whereby users are given the ability
to show how colours are related to each other in a scene [48]. For example, a
query containing a gradient from orange to yellow could be used to retrieve a
sunset.
Figure 3.2: Example of a colour query match. This diagram demonstrates colour-based
content querying. In this case the user query is the text criteria “fifa; fair; play; logo” and the
colour “yellow”.
Figure 3.3: Example of a texture query match. This diagram demonstrates texture-based
content querying. In this case the user desires more pictures on the same playing field. The
grass texture is used to retrieve images from the same soccer match.
Shape allows users to query image shapes. An example of shape-based content
querying is shown in figure 3.4.
Figure 3.4: Example of a shape query match. This diagram demonstrates shape-based content
querying. In this case the user sketches a drawing containing a mountain.
Region-Based allows users to outline what types of properties they want in each area
of an image, thereby making the image analysis process recursive. An example
of simple region-based content querying is shown in figure 3.5.
Figure 3.5: Example of a region-based query match. This diagram demonstrates region based
content querying. In this case the user submits a query for an image containing trees on either
side of a mountain and a stream.
Object is a model where an object is deduced from a user supplied shape and an-
gle. This enables the retrieval of images that contain the specified shape in any
orientation.
3.3.2.1 Content-Based Image Retrieval Systems
QBIC (Query by Image Content)1 uses colour, shape and texture to match images
to user queries. The user can provide simple or advanced analytic criteria. Simple
criteria are requirements such as colour or texture, while advanced criteria can incor-
porate query-by-example, with “find more images like this”, or “find images like my
sketch”. To avoid difficulties involved in user descriptions of colours and textures, QBIC contains a texture and colour library. This enables users to select colours, colour distributions or choose desired textures as queries [19, 29].
1 Demo online at http://wwwqbic.almaden.ibm.com/cgi-bin/stamps-demo
NETRA allows users to navigate through categories of images. The query is refined
through a user selection of relevant image content properties [16, 28, 41].
Excalibur is a query-by-example system. Users provide candidate images which are
matched using pattern recognition technology. Excalibur is a commercial application
development tool rather than a complete retrieval application. The Yahoo! web search
engine uses this technology to find similar images (section 3.2) [16, 28, 17].
Blobworld breaks images into blobs (see figure 3.6). By browsing a thumbnail grid
and specifying which blobs of images to keep, the user identifies blobs of interest and
areas of disinterest. This is used to refine the query [8, 66].
Figure 3.6: The Blobworld System. This screenshot from the Blobworld system illustrates the
process of picking relevant image blobs.
EPIC allows users to draw rectangles and label what they would like in each section
of the image, as shown in figure 3.7 [32].
Figure 3.7: The EPIC System. This screenshot illustrates the EPIC system’s query process.
Users describe their image need through labelled rectangles in the query window on the left.
ImageSearch allows users to place icons representing objects in regions of an im-
age. Users can also sketch pictures if they want a higher degree of control [37]. See
figure 3.8.
3.3.2.2 Phase 2 Summary
Improvements
• Consistency:
– Discard unstructured and uncoordinated data: since image meta-data is never used to index or retrieve the images, problems relating to incomplete, incorrect or subjective descriptions are avoided. Further enrichment is obtained through the ability to use content-based image analysis to query many differing artifacts in an image.
• Control:
– Inexpressive Query Language:
∗ New Expression through Content-based Image Retrieval: through the expressive nature of content-based image retrieval, more thorough image criteria can be gained from the user. This provides the system with more information with which to judge image relevance.
Further Problems
Figure 3.8: The ImageSearch system. This screenshot illustrates the ImageSearch system’s query process. The user positions icons symbolising what they would like in that region of an image.
• Clarity:
– Complex Interfaces: there is a comparatively large user cost incurred with
the creation of content-based queries. If users are required to produce a
sketch or an outline of the desired images, the time or skill required can
prove prohibitive.
• Control:
– Inexpressive Query Language:
∗ Content-based Image Retrieval algorithms do not scale well: content-based image retrieval is less effective on large-breadth collections. Since there are many definitions of similarity and discrimination, their power degrades when using large breadth image collections, as shown in figure 3.9 [2, 28, 16].
3.3.3 Phase 3: Scalability through the Combination of Techniques
Bearing in mind the limitations of content-based image retrieval on large breadth im-
age collections, several systems have combined both text and content-based image
retrieval. It is hypothesised that content-based analysis can be used on larger image
collections when combined with text-based analysis. The rationale for this is that text-
based techniques can be used to specify a general abstraction of image contents, while
content-based criteria can be used to identify relevant images in the domain.
Figure 3.9: Misleading shape and texture. The first image in this example is the query-by-
example image used as a content-based query. The other images in the grid were retrieved
through matching of shape, texture and colour (image from [56]).
3.3.3.1 Text and Content-Based Image Retrieval Systems
The combination of analysis techniques can either occur during initial query creation,
allowing users to initially specify both text and content-based image criteria, or after
retrieving a collection of images, allowing users to refine the image collection.
Text with Content Relevance Feedback: in these systems, the user initially provides
a text query. Using content-based image retrieval, they then tag relevant images
to retrieve more images like them.
Text and Content Searching: in these systems, both text and content retrieval occurs
at the same time. The user may express both text and content criteria in their
initial query.
Text with Content Relevance Feedback
Chabot,2 developed by Ogle and Stonebraker, uses simplistic content and text analysis to retrieve images. Text criteria are used to retrieve an initial collection of images, followed by content criteria to refine the image collection [48].
MARS is a system that learns from user interactions. The user begins by issuing a
text-based query, and then marks images in the retrieved thumbnail grid as either
relevant or irrelevant. The system uses these image judgements to find more relevant
images. The benefit of this approach is that it relieves the user from having to describe
desirable image features. Users only have to pick interesting image features [27].
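Learning from marked-relevant and marked-irrelevant images can be sketched with a Rocchio-style update, in which the query's feature vector moves towards the mean of relevant examples and away from the mean of irrelevant ones. This is an illustrative reconstruction of relevance feedback in general, not the actual MARS algorithm, and the weights are conventional defaults rather than values from [27]:

```python
def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Shift a feature vector towards relevant examples and away from
    irrelevant ones (Rocchio-style update on per-dimension means)."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(query))]
    r, s = mean(relevant), mean(irrelevant)
    return [alpha * q + beta * ri - gamma * si for q, ri, si in zip(query, r, s)]

# One feature dimension, one relevant and one irrelevant example:
# the query value moves up towards the relevant example.
print(rocchio([0.5], relevant=[[1.0]], irrelevant=[[0.0]]))  # [1.25]
```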
Text and Content Searching
Virage incorporates plugin primitives that allow the system to be adapted to specific image searching requirements. The Virage plugin creation engine is open-source, so plugins can be created by end-users to suit their domain. The Virage engine includes several “universal primitives” that perform colour, texture and shape matching [16, 28].
Lu and Williams have incorporated both basic colour and text analysis into their im-
age retrieval system with encouraging results using a small database. One of their
major problems was in finding methods to combine evidence from colour and text
matching [39].
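The simplest combination-of-evidence scheme is a weighted linear sum of normalised per-engine scores; the difficulty Lu and Williams report lies precisely in choosing the weights and normalisation. A hedged sketch, with arbitrary weights not taken from [39]:

```python
def combined_score(text_score, colour_score, w_text=0.6, w_colour=0.4):
    """Weighted linear combination of two normalised (0..1) evidence scores.
    The weights here are arbitrary; tuning them is exactly the hard part."""
    return w_text * text_score + w_colour * colour_score

# A strong text match with a weak colour match still ranks fairly high.
print(combined_score(0.9, 0.2))  # ≈ 0.62
```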
3.3.3.2 Phase 3 Summary
Improvements
2 This system has recently been renamed Cypress.
• Consistency:
– Reduce effects of Unstructured and Uncoordinated data: the image meta-data is only partially used to retrieve the images, with content-based image retrieval used as a second criterion for the image analysis.
• Control:
– Inexpressive Query Language:
∗ Improved Expression: users can enter criteria for images through textual descriptions and visual appearance. Incorporating both text and content-based image analysis allows for the consideration of all image data during retrieval.
∗ Improving the scalability of Content-based Image Retrieval: when combining text-based analysis with content-based analysis, difficulties involved in performing content-based image retrieval on large breadth image collections are partially alleviated.
Further Problems
• Clarity:
– Reliance on Ranking Algorithms: combining rankings from several different types of analysis engines into a thumbnail grid can be difficult [2, 16, 4, 27].
– No Transparency: when using several analysis techniques it can be hard for users to understand why images were matched. Without this evidence, it may be difficult for users to ascertain faults in their query.
3.3.4 Phase 4: Clarity through User Understanding and Interaction
In response to the problems associated with the user understanding of retrieved im-
age collections, several systems have attempted to improve the clarity of the image re-
trieval process. These systems have incorporated information visualisations, outlined
in section 2.8.2, to convey image matching. In this light, phase 4 attempts to improve
system transparency and relationship maintenance, and to reduce the reliance on
ranking algorithms.
3.3.4.1 Image Retrieval Information Visualisation Systems
The two projects examined in this section provide spring-based visualisations, similar
to the VIBE system in section A.1.
MageVIBE: uses a simplistic approach to image retrieval, implementing only text-based querying of a medical database. Images in this visualisation are represented by dots. The full image can be displayed by selecting a dot [36].
Figure 3.10: The ImageVIBE system. This screenshot illustrates the ImageVIBE visualisation
for a user query for an aeroplane in flight. Several modification query terms, such as vertical
and horizontal, are used to describe the orientation of the plane.
ImageVIBE: uses text-based and shape-based querying, but otherwise does not differ
from the original VIBE. ImageVIBE allows users to refine their text queries using con-
tent criteria, such as shapes, orientation and colour [11]. An ImageVIBE screenshot
depicting a search for an aircraft image is shown in figure 3.10.
There is yet to be any evaluation of the effectiveness of these systems.
3.3.4.2 Phase 4 Summary
Improvements
• Improved Transparency: providing a dimension for each aspect of the ranking enables users to deduce how the image matching occurred.
• Relationship Maintenance: the query term relationships between images are maintained. Images that are related to the same query terms, by the same magnitude, are co-located.
• User Relevance Judgements: users select relevant images from the retrieved image collection, rather than relying on a combination-of-evidence algorithm to determine the best match.
Further Problems
• Complex Interfaces: interfaces must be simple. It has been shown that the traditional VIBE interface is too complex for general users [45, 43, 44].
3.3.5 Other Approaches to WWW Image Retrieval
The WWW has recently become the focus of phase 2 research in image retrieval. Two
such research systems are ImageRover and WebSEEK.
ImageRover is a system that spiders and indexes WWW images. A vector space
model of image features is created from the retrieved images [64, 57]. In this system
users browse topic hierarchies and can perform content-based find similar searches.
The system has encountered index size and retrieval speed difficulties.
WebSEEK searches the Web for images and videos by extracting keywords from the URL and associated image text, and generating a colour histogram. Category trees are created using all rare keywords indexed in the system. Users can query the system by specifying colour requirements, providing keywords or navigating a category tree [59, 60].
3.4 Summary
Figure 3.11: Development of WWW Image Retrieval Problems. This diagram illustrates the
development of the WWW Image Retrieval problems as covered in this chapter. The problems
from each phase, and extra WWW retrieval issues, must be addressed to create an effective
WWW image retrieval system.
This chapter traced the development of the WWW image retrieval problems, as
shown in figure 3.11. The full list of problems requiring consideration during the
creation of a new approach to WWW image retrieval is:
• Consistency:
– System Heterogeneity
– Unstructured and Uncoordinated Data
• Clarity:
– No Transparency
– No Relationships
– Reliance on Ranking Algorithms
• Control:
– Inexpressive Query Language:
∗ Lack of Expression
∗ Lack of Data Scalability
– Coarse Grained Interaction:
∗ Coarse Grained Interaction
∗ Lack of Foraging Interaction
This chapter has provided a list of current WWW image retrieval problems and previously proposed solutions. These issues were decomposed into three key problem areas: consistency, clarity and control. Following the identification of these problems, a survey of previous image retrieval systems, sorted into logical phases of development, was presented. Each phase was viewed in the context of WWW image retrieval, and of how the phase dealt with the WWW image retrieval problems.
A new approach to WWW image retrieval is now presented. This approach attempts to alleviate these problems and so improve WWW image retrieval. In the chapter following this discussion, this thesis presents the VISR tool, an implementation of the new approach to WWW image retrieval.
Chapter 4
Improving the WWW Image Searching Process
“Although men flatter themselves with their great actions, they are not so often
the result of great design as of chance.”
– Francis, Duc de La Rochefoucauld: Maxim 57
4.1 Overview
Having outlined the conceptual framework for an information retrieval study in chap-
ter 2, and then presented a survey of image retrieval techniques in chapter 3, this thesis
now addresses the problem at hand — the creation of a new approach to WWW image
retrieval.
The traditional model of the information retrieval process, figure 2.1, must be revised
for the retrieval of images from the WWW. The new approach to WWW image re-
trieval is shown in figure 4.1.
Section a of figure 4.1 is the Flexible Image Retrieval and Analysis Module (section 4.2).
This module incorporates retrieval and analysis plugins used during image retrieval.
Section b of figure 4.1 is the Transparent Cluster Visualisation Module (section 4.3). A
visualisation is incorporated to facilitate user comprehension of the retrieved image
collection’s characteristics.
Section c of figure 4.1 is the Dynamic Query Modification Module (section 4.4). Through
this module the user is able to tweak their query and get immediate feedback from the
visualisation.
Figure 4.1: Decomposition of Research Model of Information Retrieval. The new informa-
tion flows are depicted by dashed lines. This diagram can be compared with figure 2.1, the
traditional information retrieval process model. Section a of this diagram depicts the Flexible
Image Retrieval and Analysis Module. Section b depicts the Transparent Cluster Visualisation
Module. Section c depicts the Dynamic Query Modification Module.
Figure 4.2: Research Model with Process Locations. The flexible image retrieval and analysis
module resides on the client-side. To retrieve images, this module connects to several WWW
image search servers, via retrieval plugins, and downloads retrieved image collections. The
images are then pooled prior to analysis. This pool of images forms the image domain. The
transparent cluster visualisation and dynamic query modification modules also reside on the
client-side. This improves interaction available with current non-distributed visualisations,
where the whole information retrieval process has to be re-executed before the image collec-
tion is updated with user modifications.
4.2 Flexible Image Retrieval and Analysis Module
This module separates the retrieval and analysis responsibilities, thereby allowing for
more flexible and consistent image analysis.
This module resides on the client-side (see figure 4.2). A retrieval plugin is used to retrieve an initial collection of images from a WWW image search engine. These images are downloaded to the client machine and form the image domain. The image domain is then analysed by user-specified analysis plugins. This pluggable interface allows for any number of specified retrieval or analysis engines to be used during the image retrieval and analysis phase. For example, a collection of image meta-data and image content analysis techniques may be provided.
The design of this module in the VISR tool implementation is provided in section 5.2.
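The pluggable design described above can be sketched as two small interfaces: retrieval plugins that fetch candidate images for a query, and analysis plugins that score a downloaded image against a query term. The class and method names below are invented for illustration; they are not taken from the VISR design in chapter 5:

```python
from abc import ABC, abstractmethod

class RetrievalPlugin(ABC):
    """Fetches an initial image collection from one WWW search engine."""
    @abstractmethod
    def retrieve(self, query: str) -> list:
        ...

class AnalysisPlugin(ABC):
    """Scores one downloaded image against one query term."""
    @abstractmethod
    def score(self, image, term: str) -> float:
        ...

class KeywordAnalysis(AnalysisPlugin):
    """Toy meta-data analysis: fraction of description words matching the term."""
    def score(self, image, term):
        words = image["description"].lower().split()
        return words.count(term.lower()) / len(words) if words else 0.0

image = {"description": "Sydney Olympic torch relay"}
print(KeywordAnalysis().score(image, "olympic"))  # 0.25
```

Because every plugin satisfies the same interface, any number of retrieval and analysis engines can be mixed without changing the surrounding module.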
4.3 Transparent Cluster Visualisation Module
This module visualises the relationships between retrieved images and their corre-
sponding search terms. This removes the requirement for the combination of evidence
by providing a transparent visualisation. Furthermore, to allow for easy identification
of images, thumbnails are used to provide image overviews. Users click on the thumb-
nails to view the full image. To alleviate visualisation latencies, this module resides
on the client-side (see figure 4.2).
The design of this module in the VISR tool implementation is provided in section 5.3.
Screenshots of the VISR transparent cluster visualisation are provided in section 5.5.
4.4 Dynamic Query Modification Module
The dynamic query module allows users to modify queries and immediately view the
resulting changes in the visualisation. This provides a facility for the re-weighting of
query terms, the tweaking of analysis parameters, the zooming of the visualisation
and the application of filters to the image collection.
Experiments have shown that users will only continue to forage for data if the search
continues to be profitable [51]. Thus it is important to have low latencies for query
modifications and system interaction. WWW image retrieval system interaction suf-
fers from high latencies. Distributing the system as shown in figure 4.2 provides lower
interaction latencies.
The design of this module in the VISR tool implementation is provided in section 5.4.
4.5 Proposed Solutions to Consistency, Clarity and Control
4.5.1 Consistency
Current WWW search engines use varied ranking techniques on meta-data which is
often incomplete or incorrect. This can confuse users.
System Heterogeneity
The flexible image retrieval and analysis module provides a consistent, well-understood
set of tools for image analysis. When results from these tools are incorporated into the
transparent cluster visualisation, images are always displayed in the same manner.
This implies that if two search engines returned the same image, the images would be
co-located in the display.
Unstructured and Uncoordinated Data
The flexible image retrieval and analysis module does not accommodate noisy meta-
data. It does, however, deal with it in a consistent fashion. The use of consistent
plugins and the transparent cluster visualisation may allow for swift identification of
noise in the image collection.
4.5.2 Clarity
Current WWW search engines provide thumbnail grid result visualisations. Thumb-
nail grids do not express why images were retrieved or how retrieved images are
related and thereby make it harder to find relevant images [34, 15].
No Transparency
The transparent cluster visualisation facilitates user understanding of why images are
retrieved and which query terms matched which documents. This assists the user in
deciphering the rationale for the retrieved image collection and avoids user frustra-
tion by facilitating the “what to do next” decision. A key issue in image retrieval is how
images are perceived by users [28]. Educating users about the retrieval process assists
them to understand how the system is matching their queries, and thereby how they
should form and refine their queries.
No Relationships
The maintenance of image relationships enables the clustering of related images. This
allows users to find similar images quickly.
Reliance on Ranking Algorithms
The maintenance of per-term ranking information reduces the reliance on ranking
algorithms. When using the transparent cluster visualisation there is no combination
of evidence except in the search engine, which is only required to derive an initial
quality rating: matching or not.
4.5.3 Control: Inexpressive Query Language
Current WWW search engines limit the user’s ability to specify their exact image need.
For example, because image analysis is costly, most systems do not allow users to
specify image content criteria. Further, these techniques lose effectiveness when scaled
across large, broad collections [56].
Lack of Expression
The client-side distribution of the analysis task in the flexible retrieval and analysis
module reduces WWW search engine analysis costs. Through the use of the image
domain, expensive content-based image retrieval techniques and other analyses are
performed over a smaller image collection. Further, the use of these techniques does not
require modifications to the underlying WWW search engine infrastructure.
Lack of Data Scalability
In the proposed flexible analysis module, the user is able to nominate several analysis
techniques that operate concurrently during image matching. Through third-party
analysis plugins, users can perform any type of analysis.
4.5.4 Control: Coarse Grained Interaction
Current WWW search engines provide non-interactive interfaces to the retrieval pro-
cess. This provides users with minimal insight into how the retrieval process occurs
and renders them unable to focus a search on an interesting area of the result visuali-
sation.
Coarse Grained Interaction
New modes of interaction and lower latencies are achieved through the use of client-
side analysis, visualisation and interface. When interacting with the dynamic query
modification module the user’s changes are reflected immediately in the visualisation.
All tasks that do not require new documents to be retrieved are completed with low
latencies. Thus, features such as dynamic filters, query re-weighting and zooming can
be implemented effectively.
Lack of Foraging Interaction
Foraging interaction is encouraged through the transparent cluster visualisation’s
ability to cluster and zoom. Between-patch foraging is aided through the grouping of
similar images. Within-patch foraging is facilitated through the ability to examine a
single cluster in greater detail. Through zooming, users are able to perform a more
thorough investigation of the images contained within a cluster. An example of this
practice is shown in figure 4.3.
Figure 4.3: Foraging Concentration. (Panel labels: “between-patch scanning identifies
relevant patch”; “within-patch scanning identifies relevant image”.) The user scans all
clusters of images to locate the relevant image cluster. In this case the black, light grey
and dark grey squares are all checked for relevance. This process is termed between-patch
foraging. Following the selection of a potentially relevant patch, the user begins within-patch
foraging. This is shown in the zoomed window. Through within-patch foraging the user is
able to locate the relevant image.
4.6 Summary
This chapter proposed a new approach to WWW image retrieval. Using the frame-
work outlined in chapter 2, solutions were proposed to the image retrieval problems
identified in chapter 3. These solutions shape the new approach to WWW image
retrieval. The new approach contained three theoretical modules: flexible image re-
trieval and analysis, transparent cluster visualisation and the dynamic query modifi-
cation. The flexible image retrieval and analysis module provided a new mechanism
for comprehensive, extensible image retrieval on the WWW. The transparent cluster
visualisation provided a new approach to visualising retrieved document collections.
The dynamic query modification module provides new mechanisms for user inter-
action during the retrieval process. Following the description of these modules this
section presented theoretical evidence to support the use of these modules to alleviate
the WWW image retrieval problems.
The next chapters cover the implementation of these modules in the VISR tool and
effectiveness evaluation experiments.
Chapter 5
VISR
“Always design a thing by considering it in its next larger context — a chair in
a room, a room in a house, a house in an environment, an environment in a city
plan.”
– Eliel Saarinen
5.1 Overview
This chapter introduces the architecture of the VISR tool. The three conceptual modules
described in chapter 4 are now implemented. This chapter is broken down into
the design of each of these modules: the flexible image retrieval and analysis mod-
ule is section 5.2, the transparent cluster visualisation module is section 5.3 and the
dynamic query modification module is section 5.4. Following the description of the
module designs, a series of use cases demonstrate the functionality of the VISR tool.
The figures in this chapter follow the conventions outlined in the diagrams below.
Figure 5.1 is the legend for the information flow diagrams and figure 5.2 is the legend
for the state transition diagrams.
Figure 5.1: Information Flow Diagram Legend. (Symbols: implemented module, optional
module, data store, data flow, internal operation, multiple operations.)
Figure 5.2: State Transition Diagram Legend. (Symbols: internal state, external state,
state change.)
The information flow of the VISR tool is shown in figure 5.3, while the state transition
diagram, figure 5.4, describes the flow of system execution.
[Figure 5.3 shows the Query Processor, the Flexible Image Retrieval and Analysis Module
(section 5.2), the Transparent Cluster Visualisation Module (section 5.3) and the Dynamic
Query Module (section 5.4), linked to the user, WWW search engines and Web data on the
Internet by flows of queries, query terms, request ids, analysis data, document data, and
query, analysis and visualisation modifications.]
Figure 5.3: VISR Architecture Information Flow Diagram. This figure illustrates the data
flow between modules in the VISR tool. The section numbers marked in the figure repre-
sent sections in this chapter discussing those processes. Note: no link is required from the
dynamic query module to the query processor because all input into the dynamic query
module is in a machine-readable form.
[Figure 5.4 shows the states Query Processing, Image Retrieval and Analysis, Transparent
Cluster Visualisation Creation, Dynamic Query Mode and Termination, with transitions for
the search request, query processing complete, retrieval and analysis complete, visualisation
displayed, analysis modification request, visualisation modification request and user
satisfied.]
Figure 5.4: VISR Architecture State Transition Diagram. This figure illustrates the flow of
execution of top-level tasks in the VISR tool. VISR is initialised when a search request is
received. The query is processed and image retrieval and analysis occurs. This is the process
of retrieving and analysing an image collection using query criteria. Following the completion
of retrieval and analysis, the transparent cluster visualisation is created. After the visualisation
is displayed, the system enters dynamic query mode where the user may choose to modify the
visualisation or the retrieval and analysis criteria. When the user is satisfied with the results,
VISR terminates.
5.2 Flexible Image Retrieval and Analysis Module
The information flow diagram for the Flexible Image Retrieval and Analysis Module
is shown in figure 5.5, while the state transition diagram is shown in figure 5.6. The
structure of this section is illustrated by the information flow diagram, while the state
transition diagram illustrates the flow of execution.
5.2.1 Retrieval Plugin Manager
The Retrieval Plugin Manager manages all system retrieval plugins. Upon a search
request, the plugin manager determines which retrieval plugins are able to fulfill the
request, either in whole or in part, and sends the appropriate query terms to the re-
trieval engines. Following the completion of retrieval, the retrieved image collection
is pooled. This pool of images forms the image domain.
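A minimal sketch of this dispatch step, assuming query terms are (type, value) pairs and that each plugin advertises the query-term types it supports; all names here are illustrative, not those of the VISR implementation.

```python
def select_plugins(query_terms, plugins):
    """Match each retrieval plugin with the query terms it can service.

    `plugins` maps a plugin name to the set of query-term types it
    supports; a plugin is used whenever it can fulfil the request in
    whole or in part, as described above.
    """
    assignments = {}
    for name, supported in plugins.items():
        terms = [t for t in query_terms if t[0] in supported]
        if terms:  # this plugin can serve at least part of the query
            assignments[name] = terms
    return assignments
```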
5.2.1.1 Retrieval Plugin Stack
The plugins connect to their corresponding retrieval engine, translate queries into a
format acceptable to the engine and submit the query. The links retrieved from the
engines are pooled by the plugin, and sent to the Web document retriever for retrieval.
This uses existing Web search infrastructure to retrieve from a large collection of im-
ages.
Implemented Retrieval Plugins
VISR contains a WWW retrieval plugin for the AltaVista image search engine [3].
AltaVista supports only text-based image retrieval; as such, queries must contain at
least one text analysis criterion, which may, however, be accompanied by multiple
content criteria.
5.2.2 Analysis Plugin Manager
The Analysis Plugin Manager manages all the analysis plugins in the system. The
query terms are analysed by their corresponding analysis plugins.
If there is no plugin for a given query type, the system can be set to default to text, or
to ignore the query term. If one plugin services multiple query terms, they are queued
at the desired analysis plugin.
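The fall-back behaviour described above might be sketched as follows; the function name and the (type, term) representation of query terms are assumptions made for illustration.

```python
def dispatch_analysis(query_terms, plugins, default_to_text=True):
    """Queue each (type, term) query term at its analysis plugin.

    Unknown query-term types fall back to the text plugin, or are
    ignored when no fallback is wanted, as described above.
    """
    queues, ignored = {}, []
    for term_type, term in query_terms:
        plugin = plugins.get(term_type)
        if plugin is None and default_to_text:
            plugin = plugins.get("text")
        if plugin is None:
            ignored.append(term)
            continue
        # Terms serviced by the same plugin queue up at that plugin.
        queues.setdefault(plugin, []).append(term)
    return queues, ignored
```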
5.2.2.1 Analysis Plugin Stack
The plugins access the search document repository and retrieve the document collec-
tion stored by Web document retriever. The documents are analysed on a per query-
term basis, with each query term ranked individually and stored in the analysis data
repository.

    Source              Quality
    Image URL           34%
    Image Name          50%
    Title               62%
    Alt text            86%
    Anchor text         87%
    Heading             54%
    Surrounding text    34%
    Entire text         33%

Table 5.1: Keyword source qualities from [46].
One of the key problems in performing text-based image analysis on the WWW is
how to associate Web page text to images. The association of HTML meta-data to im-
ages retrieved from Web pages is a complex problem. This task becomes even more
arduous because HTML meta-data can be incomplete or incorrect. When using multi-
ple tags in HTML documents to rank images it is important to take the quality of each
source into account when indexing an image.
Lu and Williams [39] use bibliographic data from HTML documents to derive im-
age text relevance. They use a simple product based on unfounded quality measures
to calculate the relevance of document sections to an image. They provide no experi-
mental evidence to support their rankings.
Mukherjea and Cho [46] use a combination of bibliographic and structural informa-
tion embedded in the HTML document to find image relevant text. They then ex-
perimentally determine the quality of each image source. The ratings they found are
presented in table 5.1.
The text-based analysis plugin in the VISR tool uses all sections of the HTML docu-
ment to associate meta-data. Mukherjea and Cho’s text quality measures are used to
scale document section meta-data relevance.
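As a sketch of how such quality scaling could work, the ratings of table 5.1 can be applied as weights. The weighted-maximum combination rule and the function name below are assumptions made for illustration; the thesis does not spell out the exact formula.

```python
# Quality ratings for each HTML source, from Mukherjea and Cho
# (table 5.1), expressed as weights in [0, 1].
SOURCE_QUALITY = {
    "image_url": 0.34, "image_name": 0.50, "title": 0.62,
    "alt_text": 0.86, "anchor_text": 0.87, "heading": 0.54,
    "surrounding_text": 0.34, "entire_text": 0.33,
}

def text_relevance(term, sources):
    """Score one image against one query term.

    `sources` maps an HTML section name to its extracted text; a
    match in a section contributes that section's quality weight,
    and the best single section determines the score (an assumed
    combination rule).
    """
    matches = [SOURCE_QUALITY[name]
               for name, text in sources.items()
               if term.lower() in text.lower()]
    return max(matches, default=0.0)
```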
Content-based Analysis Plugin
VISR contains a colour content-based image analysis plugin. This plugin performs a
simple colour analysis of images, given a user specified colour. This plugin provides
proof-of-concept content-based analysis. Other content-based analysis plugins to per-
form more advanced analysis can be incorporated into the system.
Colour analysis is performed using basic histogram analysis, where image colour
components are separated into a specified number of buckets. The higher the number
of buckets, the more accurate the colour comparison. The ranking algorithm matches
red, green and blue levels between images. The retrieved image with the highest
number of pixels of the specified colour is used to normalise the ranking for all other
images.
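A minimal sketch of this bucketed colour comparison, assuming images are available as lists of (r, g, b) pixel values in 0–255; the function names are illustrative, not those of the VISR plugin.

```python
def colour_match_counts(images, target_rgb, buckets=8):
    """Count, per image, pixels whose bucketed RGB equals the target's.

    `images` maps an image id to a list of (r, g, b) pixels. More
    buckets per channel means a more accurate colour comparison.
    """
    width = 256 // buckets                      # bucket width per channel
    target = tuple(c // width for c in target_rgb)
    return {name: sum(1 for p in pixels
                      if tuple(c // width for c in p) == target)
            for name, pixels in images.items()}

def rank_by_colour(images, target_rgb, buckets=8):
    # Normalise every count by the best-matching image, as described above.
    counts = colour_match_counts(images, target_rgb, buckets)
    best = max(counts.values(), default=0)
    return {name: (count / best if best else 0.0)
            for name, count in counts.items()}
```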
5.2.3 Web Document Retriever
Given a URL, the Web document retriever downloads Web pages using a utility called
GNU wget. Prior to downloading, the locally cached Web page and image library is
checked to see whether the pages have been previously retrieved; if not, downloading
begins. After the Web pages are downloaded, they are parsed to find image URLs. If
the image or the Web page no longer exists, the Web document retriever discards
page information. If the image link exists in the page, the Web document retriever
downloads the image for further analysis.
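The cache-then-download behaviour could be sketched as follows; the cache layout, file naming and the img-tag regular expression are assumptions made for illustration, not details of the VISR implementation.

```python
import os
import re
import subprocess

def fetch_page(url, cache_dir="cache"):
    """Download a page with GNU wget unless it is already cached."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, re.sub(r"\W+", "_", url) + ".html")
    if not os.path.exists(path):                 # cache miss: download
        subprocess.run(["wget", "-q", "-O", path, url], check=True)
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()

def image_urls(html):
    """Extract the src attributes of <img> tags from a downloaded page."""
    return re.findall(r'<img[^>]+src=["\']([^"\']+)', html, re.I)
```

Pages whose images no longer exist would then be discarded before analysis, as described above.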
5.2.4 Adjustment Translator
The Adjustment Translator takes incoming adjustment requests and determines whether
the adjustment requires a re-retrieval of documents or the re-analysis of the image col-
lection.
5.3 Transparent Cluster Visualisation Module
The information flow diagram for the Transparent Cluster Visualisation module is
shown in figure 5.7, while the state transition diagram is shown in figure 5.8. The
structure of this section is illustrated by the information flow diagram, while the state
transition diagram illustrates the flow of execution.
5.3.1 Spring-based Image Position Calculator
Given query term matching analysis data, the spring-based image position calculator
positions images in the visualisation. The visualisation is based on a spring model
developed by Olsen and Korfhage [49] for the original VIBE. This was formalised by
Hoffman to produce the Radial Visualization (RadViz) [26]. In RadViz, reference
points are equally spaced around the perimeter of a circle. The data set is then dis-
tributed in the circle according to its attraction to the reference points.
In VISR, the distribution occurs through query terms applying forces to the images in
the collection. Springs are attached such that each image is connected to every query
term, and images are independent of each other. The query terms remain static while
the images are pulled towards the query terms according to how relevant the query
terms are to the image. When these forces reach an equilibrium, the images are in their
final positions. The conceptual model of this visualisation can be seen in figure 5.9.
Secondly, the spring metaphor, where images have no attraction to the centre of the vi-
sualisation, and are pulled freely towards whatever query terms they contain. The
query terms can be represented as vectors leaving the centre of the circle.
Vector Sum Metaphor:

    p_vs = ( sum_{i=1}^{n} a_i q_i ) / total(a)                        (5.1)

Where
p_vs is the vector position of an image
n is the number of query terms
a_i is the scalar attraction to query term i
q_i is the vector position of query term i
total(a) is the total attraction the image has to the query terms

Spring Metaphor:

    find p_s such that  sum_{i=1}^{n} a_i (p_s - q_i) = 0              (5.2)

Where
p_s is the vector position of an image.
sum_{i=1}^{n} a_i (p_s - q_i) is the net force acting on the image. This force moves p_s
until it converges to 0. This gives the final value of p_s.
The system is able to be configured to use either the spring or vector sum metaphor.
The vector sum metaphor is less useful than the spring metaphor because there are
fewer unique positions for images and there tends to be a large cluster of images
located near the centre of the display.
out interesting query terms or outlying images in the image collection, rather than
clusters of images.
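As a minimal sketch of how the spring metaphor's final positions could be computed, the equilibrium of equation (5.2) can be solved in closed form as the attraction-weighted mean of the query-term positions. The sketch assumes anchors equally spaced on the unit circle and attractions already normalised to [0, 1]; the function names are illustrative, not those of the VISR implementation.

```python
import math

def anchor_positions(n):
    """Place n query-term anchors equally spaced on the unit circle."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]

def spring_position(attractions, anchors):
    """Solve eq. (5.2): sum a_i * (p - q_i) = 0, i.e. the
    attraction-weighted mean of the anchor positions."""
    total = sum(attractions)
    if total == 0:
        return (0.0, 0.0)   # no attraction: leave image at the centre
    x = sum(a * qx for a, (qx, _) in zip(attractions, anchors)) / total
    y = sum(a * qy for a, (_, qy) in zip(attractions, anchors)) / total
    return (x, y)
```

An image attracted to one term only sits on that term's anchor; equal attraction to all terms places it at the centre of the circle.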
5.3.2 Image Location Conflict Resolver
The image location conflict resolver incorporates techniques that allow the user to
view all images, even if they overlap. This process examines the visualisation context,
checking for overlapping images. Overlapping images are indicated by a blue border
as shown in figure 5.11. This thesis presents two techniques to deal with overlapping
images: Jittering, where images are separated from each other, and Animation, where
overlapping images are animated, with a specified delay, from one overlapping image