Master Thesis
The Design of a Rich Internet Application for
Exploratory Search by Real-Time Generation of
Similarity Maps
...
Abstract
Users who cannot formulate a precise query but know there must be a good answer somewhere,
often rely on explorat...
Acknowledgments
A lot of people helped me in different ways all along the research project and brought different
insights an...
Contents
1 Introduction 4
1.1 Exploratory Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 F...
Chapter 1
Introduction
Search and data visualization are becoming more and more important as we are entering the
Petabyte ...
1.1. EXPLORATORY SEARCH CHAPTER 1. INTRODUCTION
1.1 Exploratory Search
During the first phase of research we considered the...
1.1. EXPLORATORY SEARCH CHAPTER 1. INTRODUCTION
the domain. From this analysis phase we derived several things that needed...
1.2. FACETED CLASSIFICATION CHAPTER 1. INTRODUCTION
1.2 Faceted Classification
One of the approaches in the exploratory sea...
1.2. FACETED CLASSIFICATION CHAPTER 1. INTRODUCTION
movies, a special tool has been written to extract additional informat...
1.3. INTERACTIVITY & RESPONSIVENESS CHAPTER 1. INTRODUCTION
1.3 Interactivity & Responsiveness
Exploratory search is a pro...
1.3. INTERACTIVITY & RESPONSIVENESS CHAPTER 1. INTRODUCTION
Figure 1.3: This figure shows the communication principles for ...
1.3. INTERACTIVITY & RESPONSIVENESS CHAPTER 1. INTRODUCTION
provide a high level of scalability and maintainability, and m...
Chapter 2
The Concept
2.1 The Idea
In the chapter 1 we considered the implications of exploratory search problem and its b...
2.1. THE IDEA CHAPTER 2. THE CONCEPT
What if we could zoom on both New York and Tokyo and generate a new world map, having...
2.2. THE PROTOTYPE CHAPTER 2. THE CONCEPT
2.2 The Prototype
The MultiMap concept can be divided on two main parts:
• The s...
Chapter 3
The System
3.1 Architectural Overview
The system was designed to be a client-server application with several tie...
3.1. ARCHITECTURAL OVERVIEW CHAPTER 3. THE SYSTEM
interactivity with the data. The main idea behind such a system is to ha...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
3.2 Mathematical Concepts & Algorithms
3.2.1 Overview
Figure...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
diagram of the system, when the information need to be updat...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
3.2.2 Preprocessing & Correlations
Overview
The system handl...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
Finally, we define a distance function, which is a general co...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
• ratings can be used to create a complete ratings aspect sp...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
Figure 3.5: This figure shows the distances between directors...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
Figure 3.6: A subset of the precomputed facet network for Ge...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
3.2.3 Ranking
We would like to give users the ability to zoo...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
3.2.4 Facets Selection
At this point in the data-flow we have...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
3.2.5 Movies Selection
The next step in the data-flow is the ...
3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM
3.2.6 Creation of Aspect Maps
The final step in the data-flow ...
3.3. SERVER TECHNOLOGY CHAPTER 3. THE SYSTEM
3.3 Server Technology
From the beginning of the research, we wanted the syste...
3.3. SERVER TECHNOLOGY CHAPTER 3. THE SYSTEM
• Through introspection, the server generates a networking libraries, complia...
3.4. THE CLIENT FRONT-END CHAPTER 3. THE SYSTEM
3.4 The Client Front-End
3.4.1 Overview
Figure 3.8: The prototype of the c...
3.4. THE CLIENT FRONT-END CHAPTER 3. THE SYSTEM
3.4.2 GridMap
The reduction to the two dimensional space was already expla...
3.4. THE CLIENT FRONT-END CHAPTER 3. THE SYSTEM
Transitions
It often happens that a person viewing a scene fails to see la...
3.4. THE CLIENT FRONT-END CHAPTER 3. THE SYSTEM
Cell Representation
The cell representation allows the flipping feature, il...
3.4. THE CLIENT FRONT-END CHAPTER 3. THE SYSTEM
Figure 3.13: Details of the movie, second tab. It presents the facet links...
Chapter 4
Usability Aspects
The main purpose of the work was to build a responsive system for a particular Rich Internet
A...
CHAPTER 4. USABILITY ASPECTS
Average results, Usefulness questionnaire
It is useful 6.3/7
It gives me more control over th...
Chapter 5
Conclusions
This thesis described a form of exploratory search where responsiveness was of the essence.
The appl...
CHAPTER 5. CONCLUSIONS
enhance the selection algorithms and be able to evaluate the new algorithm performance based
on the...
Bibliography
[1] International movie database, http://www.imdb.com, December 2009.
[2] Rfc 2616: Hypertext transfer protoc...
BIBLIOGRAPHY BIBLIOGRAPHY
[15] X. Lin. Map displays for information retrieval. Journal of the Americal Society for
Informa...
Appendix A
Protocol Generation DSL
Since I had to do all the programming for the research project myself, the workload was...
APPENDIX A. PROTOCOL GENERATION DSL
Using the protocol definition it is also possible to define the compression direction (N...

Master Thesis: The Design of a Rich Internet Application for Exploratory Search by Real-Time Generation of Similarity Maps


Users who cannot formulate a precise query but know there must be a good answer somewhere, often rely on exploratory search. This requires an interactive and responsive system, or else the user will soon give up. As data bases are becoming larger, more specialized, and more distributed this calls for a Rich Internet Application, fast enough to keep pace with the users explorations. This thesis studies and implements a system, called MultiMap, which computes similarity maps in real-time. This entailed: (1) precomputing every data structure that does not change after the initial query, (2) optimizing algorithms for zooming and map generation (3) and providing a cognitively appropriate visualization of high dimensional space. Applied to a very large movie database, it resulted in a highly responsive, satisfying, usable system.

Published in: Education, Technology

Master Thesis: The Design of a Rich Internet Application for Exploratory Search by Real-Time Generation of Similarity Maps

  1. 1. Master Thesis
     The Design of a Rich Internet Application for Exploratory Search by Real-Time Generation of Similarity Maps
     Roman Atachiants
     Master of Science Thesis DKE 10-5
     Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Artificial Intelligence at the Department of Knowledge Engineering of Maastricht University
     Exam committee: Dr. Eduard Hoenkamp (supervisor), Dr. Ronald Westra
     Maastricht University, Faculty of Humanities and Sciences, Department of Knowledge Engineering, Master of Science in Artificial Intelligence
     June 28, 2010
  2. 2. Abstract
     Users who cannot formulate a precise query but know there must be a good answer somewhere often rely on exploratory search. This requires an interactive and responsive system, or else the user will soon give up. As databases are becoming larger, more specialized, and more distributed, this calls for a Rich Internet Application that is fast enough to keep pace with the user’s explorations. This thesis studies and implements a system, called MultiMap, which computes similarity maps in real time. This entailed (1) precomputing every data structure that does not change after the initial query, (2) optimizing algorithms for zooming and map generation, and (3) providing a cognitively appropriate visualization of high-dimensional space. Applied to a very large movie database, it resulted in a highly responsive, satisfying, usable system.
  3. 3. Acknowledgments
     Many people helped me in different ways throughout the research project and brought different insights and opinions. I want to thank my fellow students, professors, friends and family who helped, tested the prototype and supported (and endured) me during the research.
     In particular, I would like to thank Dr. Eduard Hoenkamp for his support and supervision of the project. Our regular meetings, discussions and brainstorming sessions helped me a lot, from the early, theoretical part of the research down to the implementation, engineering and design. Beyond the professional relationship, I enjoyed his company the most, and our discussions about various domains, including education, technology, politics and travel, are really memorable to me.
     Next, I would like to thank a fellow A.I. student, Tom Marechal. He was an invaluable asset and friend, as he provided me with inspiration and ideas throughout the research project.
     Additionally, I would like to thank Dr. Johannes C. Scholtes and Dr. Ronald Westra for their support, evaluation and critical thinking. Not only did they largely inspire me for this project during their classes, but they also gave various invaluable insights that contributed to making this thesis better.
     I would also like to thank everyone who participated in the testing and evaluation of the system; without their time and feedback the project would not be what it is today.
  4. 4. Contents
     1 Introduction
       1.1 Exploratory Search
       1.2 Faceted Classification
       1.3 Interactivity & Responsiveness
     2 The Concept
       2.1 The Idea
       2.2 The Prototype
     3 The System
       3.1 Architectural Overview
       3.2 Mathematical Concepts & Algorithms
         3.2.1 Overview
         3.2.2 Preprocessing & Correlations
         3.2.3 Ranking
         3.2.4 Facets Selection
         3.2.5 Movies Selection
         3.2.6 Creation of Aspect Maps
       3.3 Server Technology
       3.4 The Client Front-End
         3.4.1 Overview
         3.4.2 GridMap
     4 Usability Aspects
     5 Conclusions
     A Protocol Generation DSL
  5. 5. Chapter 1 Introduction
     Search and data visualization are becoming more and more important as we enter the Petabyte Age. Traditional approaches to searching large datasets are query-based, which implies that the user (researcher) knows what he is looking for. However, this way of searching is difficult when one is not familiar with the domain or lacks the knowledge or contextual awareness needed to formulate precise queries to navigate the information space. For example, how do we find something we would like to know more about without having the specific knowledge to formulate a precise question? How would we find a movie we might enjoy if we never saw Robert DeNiro or Charlie Chaplin? Or, knowing that we enjoy Quentin Tarantino’s movies, how would we discover other, relatively similar movies? In order to find those movies, we perform a search process called exploratory search.
     Exploratory search is a specialization of information retrieval which represents the activities carried out by searchers who are:
     • unfamiliar with the domain of their goal (i.e. need to learn about the topic in order to understand how to achieve their goal)
     • or unsure about the ways to achieve their goals (either the technology or the process)
     • or even unsure about their goals in the first place.
     In this research, we try to address this exploratory search problem [27] by introducing a novel interactive search system. This system is called MultiMap and relies on similarity measurements in order to present the latent information relations to the user in a geographic manner. The system has been developed and tested using the Netflix dataset [7], containing about 125,000 movies. A custom selection was performed on the dataset:
     • The genres were filtered to 28 IMDB genres.
     • The directors were filtered to those who made at least 5 movies (in total around 2500 directors).
     • The actors were filtered to those who participated in at least 10 movies (in total around 6000 actors).
     • The movies were filtered to those containing all needed information and made by the preselected directors and actors. The final database contained around 16000 movies.
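The selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`title`, `director`, `actors`) and the function name `filter_movies` are hypothetical, and the rule that a movie is kept when its director and at least one of its actors pass the thresholds is one possible reading of the text, not necessarily the rule used for the thesis database.

```python
from collections import Counter


def filter_movies(movies, min_director_movies=5, min_actor_movies=10):
    """Keep movies whose director and actors pass the count thresholds.

    `movies` is a list of dicts with the hypothetical keys 'title',
    'director' and 'actors'; the real schema of the database is not known.
    """
    # Count how many movies each director made and each actor appeared in.
    director_counts = Counter(m["director"] for m in movies)
    actor_counts = Counter(a for m in movies for a in m["actors"])

    directors = {d for d, n in director_counts.items() if n >= min_director_movies}
    actors = {a for a, n in actor_counts.items() if n >= min_actor_movies}

    # Assumption: a movie is kept when its director qualifies and at least
    # one of its actors qualifies.
    return [
        m for m in movies
        if m["director"] in directors and any(a in actors for a in m["actors"])
    ]
```

Called with the default thresholds, the function mirrors the "at least 5 movies" and "at least 10 movies" rules for directors and actors; the genre filter and the completeness check are left out of the sketch.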
  6. 6. 1.1 Exploratory Search
     During the first phase of the research we considered the exploratory search problem [11] [19], trying to answer the following questions:
     1. How can we help a user who is unfamiliar with the domain (i.e. a user who has seen only a few movies and/or doesn’t know many directors or actors)?
     2. How can we help a user who doesn’t know how to find a particular movie?
     3. How can we help a user who doesn’t know what kind of movies he likes?
     Figure 1.1: An abstracted view of the backwards reasoning that was applied in order to answer the exploratory search questions. In the figure, green represents the interesting directions, red represents an unwanted direction, and blue represents intermediate steps.
     Figure 1.1 shows the result of the backwards reasoning we performed in order to reason about those three questions. The goal of the research was to find a system that can answer those questions without much guessing, mostly because we want the user to explore and learn about
  7. 7. the domain. From this analysis phase we derived several things that the system needed to achieve:
     • The meaning of the data must be extracted: the system should know about the domain, in our particular case the cinematographic domain.
     • Relations must be preserved in order to help the user relate different items.
     • The user must be able to drill down to individual movies and examine them in order to navigate.
     • Relevance feedback is needed in order to show the user how interesting a particular item is and how relevant it is for his search. The idea behind relevance feedback is to take the results that are initially returned for a given query and to use information about whether or not those results are relevant to perform a new query.
     The exploration in exploratory search means that a user has to be able to explore different directions and, in a manner, swim in the data. The exploration factor is very implicit and therefore difficult to evaluate. In a standard search engine, the user composes a query and the engine returns the documents closest to that query. In our system, by contrast, we do not always want to select the closest points and restrict the user to the most relevant search results. By not doing so, we allow the user to explore different directions in this multi-dimensional space.
  8. 8. 1.2 Faceted Classification
     One of the approaches in the exploratory search research domain that has proven useful and is used in many different visualization systems is called faceted classification [26] [12]. This approach is widely used across the World Wide Web, especially on commercial web sites (Amazon, Ebay). Figure 1.2 illustrates the search box of the website Amazon.com, where the fields Author, Title, ISBN, Publisher, Subject, Condition, etc. are the facet categories. A faceted classification system allows several classifications to be assigned to a particular object, usually the object we want to search for, which in our case is a movie. Using multiple classifications makes it possible to reorder the data in many different ways and to define search criteria.
     Figure 1.2: The advanced search box on the Amazon.com website; the additional fields are different aspects of a book.
     A facet comprises “clearly defined, mutually exclusive, and collectively exhaustive aspects, properties or characteristics of a class or specific subject” [25]. In this thesis, we use the word “Aspect” to denote a facet category and the word “Facet” for a particular facet, for example: Aspect: Actors; Facets: Robert DeNiro, Johnny Depp, Bruce Willis...
     The Netflix contest dataset contained 17700 different movie titles and served as a basis for the data in this research. Considering the need to extract different facets for each of those
  9. 9. movies, a special tool was written to extract additional information from the Internet Movie DataBase (IMDB) [1] website and the Netflix database via their exposed APIs. This tool was able to extract about 95% of the information for those movies. In particular, we were interested in:
     • Genres of the movies (Fantasy, Science-Fiction, Crime, Drama...)
     • Year of release
     • IMDB ratings, which are precise ratings from 1 to 10, rounded to the first decimal
     • Directors of the movies (Steven Spielberg, Quentin Tarantino...)
     • Actors of the movies (Robert DeNiro, Johnny Depp, Bruce Willis...)
     Some other data about the movies was also available (writers, movie plots, ...), but it was not as abundant as the five aspects presented above. Therefore, we decided to base the system on the above aspects alone.
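To make the Aspect/Facet vocabulary concrete, the following sketch builds an inverted index from aspects to facets to movie titles. It is a minimal illustration, assuming hypothetical movie records in which every aspect is stored as a list of facet names; the function name `build_facet_index` and the record keys are not taken from the thesis.

```python
from collections import defaultdict


def build_facet_index(movies, aspects=("genres", "directors", "actors")):
    """Map each aspect to its facets, and each facet to the set of movie titles.

    `movies` is a list of dicts with a 'title' key and one list of facet
    names per aspect (a hypothetical layout chosen for this example).
    """
    index = {aspect: defaultdict(set) for aspect in aspects}
    for movie in movies:
        for aspect in aspects:
            # A movie may lack an aspect entirely; it is then skipped for it.
            for facet in movie.get(aspect, ()):
                index[aspect][facet].add(movie["title"])
    return index
```

With such an index, `index["actors"]["Harrison Ford"]` would return every title in which that facet occurs, which is the kind of lookup needed to move from a facet to its movies and back.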
  10. 10. 1.3 Interactivity & Responsiveness
     Exploratory search is a process performed by a human who uses a tool (a computer) to interact with large quantities of information in order to explore and find the relevant pieces of information. This human-computer part means that the process is by definition interactive, and therefore interactivity is a very important aspect of exploratory search. One way to approach interactivity is to start with the notion of “look and feel”. The term has become more or less synonymous with how the term style is used in other design disciplines. In a concrete sense, the “look” of a GUI is its visual appearance, while the “feel” denotes its interactive aspects [24]. One of the consequences is that the interface should be very responsive and fast.
     One must also consider the fact that search systems need to handle large amounts of data and need a lot of computing power. It follows that, in order to build a good exploratory search system, the data manipulation should be handled by powerful machines. During our research, we opted for a client-server approach to enhance the interactivity without losing the computing power we need to perform all operations in real time, keeping the system responsive and interactive.
     By performing all operations in real time, we run into the problem of massive network communication. The communication in this case is a two-way dialog between the client and the server. We need the communication to be duplex, so that both the server and the client can initiate the dialog, because the current World Wide Web is becoming real-time (large services such as Twitter and Facebook are good examples). Although the information flow is updated in real time, most of these services still use traditional HTTP-based technologies.
     The Hypertext Transfer Protocol (HTTP) is an Application Layer protocol for distributed, collaborative, hypermedia information systems (for the RFC specification, see [2]). HTTP is a request-response protocol standard for client-server computing. In HTTP, a web browser, for example, acts as a client, while an application running on a computer hosting the web site acts as a server. The client submits HTTP requests to the responding server by sending messages to it. The server, which stores content (or resources) such as HTML files and images, or generates such content on the fly, sends messages back to the client in response. These returned messages may contain the content requested by the client or may contain other kinds of response indications [3].
     The problem with using HTTP for the interactive, real-time web is a fundamental one. As the World Wide Web evolved, different architectures and new frameworks (SaaS, SOAP, AJAX ...) were built on top of the HTTP protocol, but real-time communication is still mainly done using the polling technique (see figure 1.3). Polling is a workaround: the client constantly asks the server for updates at very short intervals. There are several problems with this approach:
     1. The client’s and server’s CPU resources are used all the time for mostly useless update checking. On mobile devices, this potentially drains the battery.
     2. The network bandwidth is used constantly, and as the network throughput of the server is limited, this becomes a bottleneck very quickly.
     In order to find how to design a system responsive enough for such communication, consider the requirements:
  11. 11. Figure 1.3: The communication principles for real-time updates in the polling architecture and in a publisher/subscriber architecture.
     1. A client-server approach, since the amount of data is large and the computations can be very expensive.
     2. Reliable networking (as we are not considering a streaming application and need reliable two-way communication); therefore the choice for the transport layer is TCP [14].
     3. A message format that allows complex messages to be encoded and decoded with minimal impact on performance.
     Since those requirements are quite similar to the requirements for multi-player client/server on-line games, we considered that the best place to find the technological answer for an interactive search system would be the gaming literature [10] [18] [22]. Games are by definition interactive applications, and on-line games are usually heavily optimized for latency and throughput. Because interactivity requires a lot of duplex communication, the best option is a socket server [18] and a custom protocol for low-level message encoding. Following those considerations, an interactive exploratory search system can be designed as a multiuser on-line game engine. The architecture should fulfill six goals: minimize network traffic, provide opportunities for load balancing, provide a secure game playing environment,
  12. 12. provide a high level of scalability and maintainability, and maximize client side performance for real-time graphics [8]. The architecture for the system is layered and component-based:
     • The Network Component, which contains the Packet Serializer (Messenger), De/Encrypt, De/Compress and Network modules. The Messenger module is in charge of forming and sending messages in a given format.
     • The User Component, which contains both the Authenticator and the User Database modules.
     • The Search Component, which is designed specifically for exploratory search and uses a custom protocol. For the system designed for this thesis, the search component is described in more detail in section 3.2.
     As mentioned earlier, latency is a crucial point for highly interactive applications. Latency refers to the time it takes for a packet of data to be transported from its source to its destination. In many networking texts, you will also see the term Round Trip Time (RTT) in reference to the latency of a round trip from source to destination and then back to source again. In many cases the RTT is twice the latency, but some network paths exhibit asymmetric latencies, with higher latencies in one direction than the other [6]. There are different ways to deal with latency, but simply put: we need more control over the packets sent and received, we need to minimize their size, and we need to be able to prioritize and parallelize different actions [5].
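The requirement for a compact message format over TCP can be illustrated with length-prefixed binary framing. The sketch below is not the protocol used by the MultiMap server (that one is the subject of Appendix A); the header layout, a 2-byte message type followed by a 4-byte payload length, and the function names are assumptions made for the example. Because TCP delivers a byte stream rather than discrete messages, the decoder has to cope with partially received messages.

```python
import struct

# Hypothetical header: message type (uint16) and payload length (uint32),
# both in network byte order, 6 bytes in total.
HEADER = struct.Struct("!HI")


def encode_message(msg_type, payload):
    """Prefix the payload bytes with the fixed-size header."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode_messages(buffer):
    """Split a receive buffer into complete messages.

    Returns a list of (msg_type, payload) tuples and the leftover bytes
    that belong to a message which has not fully arrived yet.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # the payload is incomplete, wait for more bytes
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

A fixed binary header keeps the per-message overhead at six bytes and lets the receiver skip to the next message without parsing the payload, which is in line with the goal of minimizing packet size.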
  13. 13. Chapter 2 The Concept
     2.1 The Idea
     In chapter 1 we considered the implications of the exploratory search problem and its basic components, such as faceted classification and interactivity. This thesis introduces a novel exploratory search interface, called MultiMap, which relies on similarity measurements in order to present the information to the user. In the early 1990s it was demonstrated that spatial mapping techniques can be used to visualize the contents and semantic relationships of a document space [15], yet there are still not many systems that actually use mapping techniques. The idea behind the system comes from a simple map, where the information is presented in a geographic manner: two towns that are close on a map are also a short trip apart. Using a map, it is possible to navigate and explore a huge amount of information by zooming and unzooming, exploring the dataset both locally and globally.
     Figure 2.1: A world map with country divisions.
     If we can do this for our planet using mapping software (such as Google Maps or Bing Maps), why couldn’t we explore different datasets in the same way?
  14. 14. What if we could zoom in on both New York and Tokyo and generate a new world map, with Washington, New York, Tokyo, Kyoto and Paris in between (figure 2.1 may help to imagine this)? Viewing them this way can be rather messy, which is why we also need to introduce context: Washington and New York are in the United States of America, Tokyo and Kyoto are in Japan, and Paris is in France. The countries provide a clear separation between the cities and help us to understand the cities better. Now replace the towns by Movies and the countries by Genres/Actors/Directors, and this gives a basic understanding of how MultiMap works. MultiMap is based on this idea of zooming and on-the-fly generation of new maps. Formally, it involves choosing a new coordinate system. MultiMap also features the ability to unzoom to see the whole picture again and to switch maps if needed (again, think of Google Maps).
     In order to understand better how MultiMap works, let’s go back to the movie context and think of different aspects, facets and movies:
     • The aspect “Genres” contains the facets “Action”, “Adventure”, etc.
     • The facets “Action” and “Adventure” can relate to movies like “Indiana Jones”.
     • The movie “Indiana Jones” contains the actor “Harrison Ford” (which is also a facet of the aspect “Actors”).
     One can notice that this is a closed loop: it is possible to look at different genres, then look at a particular movie, then switch to actors, and go on exploring the information this way. If we imagine for a second that we can create a map of an aspect, where the points (“countries”) are the facets, we should probably also be able to place the movies (“towns”) on that map. In order to create such maps, we need several components:
     • A function to compare two facets of an aspect, i.e. a distance measurement. This way we would be able to compare, for example, the similarity between the Adventure genre and the Action genre, or between Tom Hanks and Harrison Ford.
     • A way to create a map very quickly, as a new map has to be generated when the user zooms in on some movie.
     • A way to measure the relevancy of the movies and facets. Considering our example above, which towns would we choose to present on a new map if we zoomed in on New York and Tokyo? Paris, London, Rome?
     Further in this document, chapter 3 explains how the whole system is built; in particular, section 3.2 explains all concepts and algorithms that were developed in order to produce a working prototype of MultiMap.
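As an illustration of what a distance between two facets could look like, the sketch below uses the Jaccard distance between the sets of movies in which the two facets occur. This is a stand-in chosen for its simplicity and is not the distance function of the thesis, which is defined in section 3.2.2; the function name and the sample movie sets are invented for the example.

```python
def jaccard_distance(movies_a, movies_b):
    """Distance in [0, 1] between two facets, given their sets of movie titles.

    0.0 means the facets occur in exactly the same movies, 1.0 means they
    share no movie at all.
    """
    union = movies_a | movies_b
    if not union:
        return 0.0  # two empty facets are treated as identical
    return 1.0 - len(movies_a & movies_b) / len(union)


# Invented sample data: the movie sets of two genre facets.
action = {"Indiana Jones", "Die Hard", "Star Wars"}
adventure = {"Indiana Jones", "Star Wars", "Jumanji"}
distance = jaccard_distance(action, adventure)  # 2 shared movies out of 4
```

Any function with the same shape (two facets in, one non-negative number out) could be substituted, as long as facets that share many movies end up close to each other.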
  15. 15. 2.2 The Prototype
     The MultiMap concept can be divided into two main parts:
     • The system that performs all mathematical computations and handles the data and the operations on the data.
     • The front-end that is presented to the user; after all, there are many different ways to present a map.
     Figure 2.2 shows the front-end that we designed as our first approach to a visualization for the MultiMap system.
     Figure 2.2: A screenshot of the prototype, presenting a grid map of the directors aspect.
     The front-end visualization we designed for MultiMap is called GridMap and is one of the possible approaches to visualizing those maps. This approach relies on a very ordered presentation of the maps: it tries to map a cloud of 2D points to a grid while trying to preserve the spatial relations. The interface allows users to switch between aspect maps, to zoom in on different facets and, by flipping a grid cell, to view the details of a particular movie and follow its links to construct new maps. Section 3.4 explains the interface and its different components in more detail.
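Mapping a cloud of 2D points to a grid can be approximated with a simple greedy rule: each point takes the nearest grid cell that is still free. The sketch below only illustrates the idea and is not the GridMap algorithm of the thesis (described in sections 3.2.6 and 3.4.2); the function name, the assumption that coordinates are normalized to the range [0, 1], and the processing order are choices made for this example.

```python
def snap_to_grid(points, rows, cols):
    """Assign each named 2D point to a distinct grid cell (row, col).

    `points` maps a name to (x, y) coordinates, assumed to be normalized
    to [0, 1]. Points are processed in name order and each one takes the
    nearest free cell, so nearby points tend to land in nearby cells.
    """
    if len(points) > rows * cols:
        raise ValueError("the grid has fewer cells than there are points")

    free = {(r, c) for r in range(rows) for c in range(cols)}
    placement = {}
    for name, (x, y) in sorted(points.items()):
        # Target position expressed in grid coordinates.
        tx, ty = x * (cols - 1), y * (rows - 1)
        # Squared distance to each free cell; the cell itself breaks ties.
        best = min(free, key=lambda rc: ((rc[1] - tx) ** 2 + (rc[0] - ty) ** 2, rc))
        placement[name] = best
        free.remove(best)
    return placement
```

A greedy assignment is fast but depends on the processing order; an optimal assignment would need a matching algorithm, which is outside the scope of this sketch.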
  16. 16. Chapter 3 The System
     3.1 Architectural Overview
     The system was designed as a client-server application with several tiers; this section describes its design. The main idea is based on the interactivity between the user and the data, and on ease of use. First of all, the system should meet several prerequisites:
     • it should be interactive, so it has a real-time constraint;
     • it should be able to handle large datasets;
     • it should be easy to use and available to remote users.
     Figure 3.1: The layered architecture of the MultiMap system.
     Following those prerequisites, the logical conclusion is to build a real-time Rich Internet Application (RIA) [9]. Such applications are mainly standard n-tier applications. The MultiMap architecture is a 3-tier real-time architecture, allowing the front-end client to have full
  17. 17. interactivity with the data. The main idea behind such a system is to have a clear separation between the client, the logic and the data itself, as illustrated in Fig. 3.2. The actual architecture, as described in Fig. 3.1, consists of:
     • a front-end client in Flash, allowing interactive data visualization;
     • a custom C# real-time server, written by the author in order to handle large amounts of data interactively;
     • a logic layer running the Matlab engine for all data-intensive search, correlation and other operations.
     Figure 3.2: Visual overview of a three-tiered application. Illustration from Wikipedia.
  18. 18. 3.2 Mathematical Concepts & Algorithms
     3.2.1 Overview
     Figure 3.3: The data flow, showing how the data is processed on the fly (in interactive mode).
     The main purpose of the research is the interactivity of the system. This imposes a real-time constraint and makes things very difficult to engineer, especially when the computations can take a long time. Based on this, we needed a system that can handle this data flow rapidly and respond quickly to user queries. Figure 3.3 shows the simplified sequence
  19. 19. 3.2. MATHEMATICAL CONCEPTS & ALGORITHMS CHAPTER 3. THE SYSTEM diagram of the system, when the information need to be updated and presented. Next few section explain the details of this schema, block by block. The system uses a content-based recommendation method. In content-based recommendation methods, the utility u(c, s) of item s for user c is estimated based on the utilities u(c, si) assigned by user c to items si ∈ S that are similar to item s. For example, in a movie recommendation application, in order to recommend movies to user c, the content-based recommender system tries to understand the commonalities among the movies user c has rated highly in the past (specific actors, directors, genres, subject matter, etc.). Then, only the movies that have a high degree of similarity to whatever users preferences are would get recommended [4]. Overall, the flow consists of several main points: • The preprocessing step performs the transformation and precomputes the maximum of information that can be precomputed. It considers all aspects and for each facet in each aspect computes a closest network (explained in the section 3.2.2). • The session initialization step initializes the user session and copies some of the prepro- cessed data in a so-called Ranking Matrix. • The update step performs the update of the Ranking Matrix (see 3.2.3 for more infor- mation). By doing so, a new ranking matrix is created, basically updating the ranks/rel- evancy ratings based on the selection. • The facets selection step chooses several facets, based on the Ranking Matrix. To do so, it combines 2 techniques: takes a subset of most relevant facets from the matrix, then performs a k-means clustering to be able to pick most ”global” facets. This step is explained more in detail in section 3.2.4. • The movies selection step selects the most relevant movies for each facet that have been chosen. This step is explained more in detail in section 3.2.5. 
• The creation of aspect maps performs the multidimensional scaling [23] and a custom grid-map algorithms, in order to create 2-dimensional grid, where the latent relations between different facets are retained. This approach is explained in section 3.2.6. This step can be potentially replaced by any other representation, including 3-dimensional ones. 18
3.2.2 Preprocessing & Correlations

Overview

The system handles a lot of data and reorders it continually on each user request. To allow the system to perform in real time, as much as possible should be precomputed. Several things need to be done:
• For each aspect, the facets should be correlated so that any 2 points can be compared. This is done differently for each aspect, depending on the data. It allows us, for example, to correlate the Adventure genre and the Science-Fiction genre.
• For each aspect, the facet network is computed. This network allows us to propagate a ranking and reorder the facets in real time. See section 3.2.2 (Facet Network) for more details.
• For each facet of each aspect, a list of the most relevant movies is constructed and ordered. This makes it possible to pick the movies in real time. This step is explained in more detail in section 3.2.2 (Movie Ordering).

In the precomputation phase, one of the most important results is the construction of the so-called "Aspect Spaces". Aspect Spaces are N-dimensional dissimilarity matrices. They are computed based on a particular distance metric δ(i, j) := distance between the i-th and j-th features of an aspect. In order to simplify the implementation, we define:
• An input matrix I, the initial data we need in order to compute similarities between aspect samples. The samples are represented in an N-dimensional space, where N is the number of movies, about 16000.
• Per aspect, a function δ, which can be different for every aspect and computes the membership of a facet of the aspect for a particular movie.

The next few sections give the definitions and the steps performed in order to create each aspect space.

Genres Space

In order to create the genres space, the genres are correlated using simply the complete distribution of movies. The input matrix I for the genres space is defined as follows:

\[
I_{i,j} =
\begin{pmatrix}
\delta(Genre_1, Movie_1) & \cdots & \delta(Genre_1, Movie_j) \\
\vdots & \ddots & \vdots \\
\delta(Genre_i, Movie_1) & \cdots & \delta(Genre_i, Movie_j)
\end{pmatrix}
\]

The membership function δ:

\[
\delta(Genre_i, Movie_j) =
\begin{cases}
1 & \text{if the movie contains the genre} \\
0 & \text{otherwise}
\end{cases}
\]
Finally, we define a distance function, which is a general cosine distance:

\[
\Delta(Genre_i, Genre_j) = \frac{I_i \cdot I_j}{\lVert I_i \rVert \, \lVert I_j \rVert}
\]

where I_i denotes the i-th row of I. Strictly speaking, this expression is the cosine similarity of the two rows; 1 − ∆ gives the corresponding dissimilarity.

In order to test how good the correlation is, one can use the aspect space as the input for the multidimensional scaling function. This helps to visualize the correlations and to see whether the desired meaning is preserved. Figure 3.4 shows the 2-dimensional genres space; we will call such maps "Aspect Maps". One can see that the correlation makes sense: for example, the Adventure genre is close to Fantasy and Science-Fiction.

Figure 3.4: The distances between genres in 2-dimensional space after performing multidimensional scaling on the genres space.

Ratings Space

The ratings space can be used in different ways, and the correlation can be adapted to the chosen usage:
• ratings can be used as an additional dimension, shown using a color or a font size while showing a movie;
• ratings can be shown in order of euclidean distance;
• ratings can be used to create a complete ratings aspect space, but this requires a more complex correlation function.

In the research, we decided to use the second approach, simply calculating the pairwise euclidean distance between ratings.

Years, Directors and Actors Spaces

There are several ways to correlate the years, directors and actors. In our research, we wanted to explore the possibility of correlating those facets based on their genre distribution. This approach would allow the user, for example, to see what kind of movies were made in a particular year and which years are similar in terms of genre distribution. To do so, we proceed as follows:

\[
A_{i,j} =
\begin{pmatrix}
\delta_1(Year_1, Movie_1) & \cdots & \delta_1(Year_1, Movie_j) \\
\vdots & \ddots & \vdots \\
\delta_1(Year_i, Movie_1) & \cdots & \delta_1(Year_i, Movie_j)
\end{pmatrix}
\]

The membership function δ1:

\[
\delta_1(Year_i, Movie_j) =
\begin{cases}
1 & \text{if the movie was released that year} \\
0 & \text{otherwise}
\end{cases}
\]

Next, we reuse the input matrix from the genres space, here called B. It is defined as follows:

\[
B_{i,j} =
\begin{pmatrix}
\delta_2(Genre_1, Movie_1) & \cdots & \delta_2(Genre_1, Movie_j) \\
\vdots & \ddots & \vdots \\
\delta_2(Genre_i, Movie_1) & \cdots & \delta_2(Genre_i, Movie_j)
\end{pmatrix}
\]

The membership function δ2:

\[
\delta_2(Genre_i, Movie_j) =
\begin{cases}
1 & \text{if the movie contains the genre} \\
0 & \text{otherwise}
\end{cases}
\]

Next, we need to compute the matrix I, which tells us how many movies of each genre were released in each year (or, for the actors space, in how many movies of each genre an actor has participated). This is computed by a matrix multiplication of A and the transpose of B:

\[
I_{i,j} =
\begin{pmatrix}
\delta(Year_1, Genre_1) & \cdots & \delta(Year_1, Genre_j) \\
\vdots & \ddots & \vdots \\
\delta(Year_i, Genre_1) & \cdots & \delta(Year_i, Genre_j)
\end{pmatrix}
= A \times B^{T}
\]

Finally, by computing the pairwise cosine distance between the rows of the matrix I, we are able to correlate the years based on their genre distribution. The same procedure is applied in order to correlate the directors and the actors. Figure 3.5 shows the aspect map created for the directors. As with the genres, the results seem to make sense: Quentin Tarantino is quite close to Martin Scorsese (they make very similar kinds of crime movies) and at the same time quite far away from George Lucas, the creator of the Star Wars saga.
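The construction of an aspect space can be illustrated with a small sketch. It is not the Matlab code of the prototype: the toy movies, the helper names and the choice to use 1 minus the cosine expression as the dissimilarity are all assumptions made for the example.

```python
import math

def membership_matrix(facets, movies, has_facet):
    """Binary matrix: one row per facet, one column per movie."""
    return [[1 if has_facet(f, m) else 0 for m in movies] for f in facets]

def matmul_transposed(a, b):
    """Compute A x B^T for two matrices that share the movie axis."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

def cosine_distance(u, v):
    """1 - cosine similarity (assumption); 1.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(u, v)) / norm

def aspect_space(rows):
    """Pairwise dissimilarity matrix between the rows of an input matrix."""
    return [[cosine_distance(u, v) for v in rows] for u in rows]

# Hypothetical toy data: three movies with a release year and a set of genres.
movies = [
    {"year": 1977, "genres": {"Adventure", "Science-Fiction"}},
    {"year": 1977, "genres": {"Adventure"}},
    {"year": 1994, "genres": {"Crime"}},
]
years = [1977, 1994]
genres = ["Adventure", "Crime", "Science-Fiction"]

A = membership_matrix(years, movies, lambda y, m: m["year"] == y)
B = membership_matrix(genres, movies, lambda g, m: g in m["genres"])
I = matmul_transposed(A, B)    # years x genres: number of movies per genre and year
years_space = aspect_space(I)  # years x years dissimilarities
```

In this toy data set the two years share no genre, so their dissimilarity is maximal; with the real data (about 16000 movies) the same steps would yield a graded dissimilarity matrix that can be passed to multidimensional scaling.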
Figure 3.5: The distances between directors in 2-dimensional space after performing multidimensional scaling on the directors space, similar to figure 3.4.

Facet Network

In order to perform the zooming and allow the system to be interactive, one needs a way to select and sort the facets rapidly. In MultiMap, this is done by precomputing a facet network (Fig. 3.6) and assigning a rank value to each node in this network. Generally speaking, we need to compute the matrix R, with one row per facet and two (or more) "pointers" to its closest points. The desired matrix R:

\[
R_{i,3} =
\begin{pmatrix}
Facet_1 & \text{1st closest facet} & \text{2nd closest facet} \\
\vdots & \vdots & \vdots \\
Facet_i & \text{1st closest facet} & \text{2nd closest facet}
\end{pmatrix}
\]

The closest points are computed using the inter-facet correlations described above. This step can be very time-consuming, as it has a complexity of O(n^2). Performing it during the interaction would interrupt the smooth interaction with the user and would therefore be prohibitive. Fortunately, this matrix can be precomputed before the interaction starts. In general, anything that can be precomputed should be precomputed to make the system responsive.
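As an illustration of the matrix R, the following sketch derives the two closest facets of every facet from a dissimilarity matrix. The function name and the dissimilarity values are hypothetical; sorting each row is used here for brevity and costs slightly more than the O(n^2) bound mentioned above.

```python
def facet_network(names, dissimilarity, k=2):
    """For each facet, list its k closest other facets (the rows of matrix R)."""
    network = {}
    for i, name in enumerate(names):
        others = [j for j in range(len(names)) if j != i]
        others.sort(key=lambda j: dissimilarity[i][j])
        network[name] = [names[j] for j in others[:k]]
    return network

# Hypothetical dissimilarities between four genres (symmetric, zero diagonal).
genres = ["Action", "Adventure", "Science-Fiction", "Romance"]
D = [
    [0.00, 0.20, 0.30, 0.90],
    [0.20, 0.00, 0.25, 0.80],
    [0.30, 0.25, 0.00, 0.95],
    [0.90, 0.80, 0.95, 0.00],
]
R = facet_network(genres, D)
```

Because the result depends only on the aspect space, it can be computed once and stored, which is what makes the later ranking step cheap.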
Figure 3.6: A subset of the precomputed facet network for the Genres aspect. In MultiMap, everything that can be precomputed is precomputed, which is conducive to a smooth and responsive interaction.

Movie Ordering

The last step is movie ordering. This step is very straightforward, as it rearranges the movie-facet relations into the following form:

\[
F_{i,2} =
\begin{pmatrix}
Facet_1 & \text{Movie vector, ordered by relevancy} \\
\vdots & \vdots \\
Facet_i & \text{Movie vector, ordered by relevancy}
\end{pmatrix}
\]

For the sake of simplicity, we use the IMDb rating as the relevancy measure. This rating is a number from 0 to 10 with one decimal, based on the extensive voting statistics of IMDb website visitors. The following example of the movie ordering for the genres space illustrates this:

\[
F_{i,2} =
\begin{pmatrix}
\text{Adventure} &
\begin{pmatrix}
\text{The Judy Garland Show} & \text{The Secret of Monkey Island} & \cdots \\
9.8 & 9.6 & \cdots
\end{pmatrix} \\
\vdots & \vdots \\
Facet_i & \text{Movie vector, ordered by relevancy}
\end{pmatrix}
\]
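A minimal sketch of this ordering step is given below. The two rated titles come from the example above; the third title and its rating are placeholders added only so that the sort has something to reorder.

```python
def order_movies(facet_movies, rating):
    """Map each facet to its movies, sorted by rating from highest to lowest."""
    return {
        facet: sorted(titles, key=lambda t: rating[t], reverse=True)
        for facet, titles in facet_movies.items()
    }

rating = {
    "The Judy Garland Show": 9.8,
    "The Secret of Monkey Island": 9.6,
    "Placeholder Movie": 7.1,  # hypothetical entry, not from the data set
}
facet_movies = {
    "Adventure": ["Placeholder Movie", "The Secret of Monkey Island", "The Judy Garland Show"],
}
F = order_movies(facet_movies, rating)
top_two = F["Adventure"][:2]  # the first few movies are what gets shown for a facet
```

Since the lists are sorted once during preprocessing, picking movies at interaction time is reduced to taking a prefix of a list.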
3.2.3 Ranking

We would like to give users the ability to zoom in on individual facets or movies based on their selection. This can be accomplished by ranking each point and re-ranking the points with every zoom. For this we need a facet network (graph), ideally tightly interconnected and covering 100% of the facets. Such a network is constructed in the preprocessing step (see section 3.2.2) in the form of a graph in which each node (a facet) is connected to its 2 closest neighbors. For example, the Science-Fiction genre would be connected to the Adventure genre and the Action genre, as illustrated in figure 3.6. Based on such a network, zooming can be done effectively as a recursive algorithm with several parameters:
• A vector B, the weight vector for the closest points. For example, a vector where the first closest point gets the full weight and the second closest gets half of the weight would be: B = (1, 0.5)
• A depth-decay function for each node at depth d:

\[
\lambda(d + 1, \rho, b) = \rho + \frac{\gamma}{d} \cdot b
\]

Where:
– d is the actual depth
– ρ is the actual rank of the node
– γ is the decay factor (a constant)
– b is the weight of the point from the weight vector

The depth-decay function presented here is a linear function, but it can be adapted or changed depending on the context and needs. The depth-decay function calculates the current ranking ρ, which updates the network. The ranking is computed recursively for each neighbor, then the network is sorted by rank and the first x nodes are shown to the user.

Additionally, zooming out can be done in several different ways. The simplest (and computationally most efficient) one is to keep track of all changes to the ranking value ρ at each step. This approach uses some memory, but nothing needs to be recalculated. Another approach would be to recursively recalculate the ρ values backwards, which saves memory at the cost of CPU time. The depth-decay function would also have to be updated in order to support this feature.
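The recursive re-ranking can be sketched as follows. This is one possible reading of the depth-decay function, not the prototype's code: the depth counter starting at 1, the recursion limit max_depth and the extra boost given to the selected facet itself are assumptions, and the small network is hypothetical.

```python
def zoom(network, ranks, selected, weights=(1.0, 0.5), gamma=1.0, max_depth=3):
    """Return a new rank table after zooming on `selected`; `ranks` is left untouched."""
    new_ranks = dict(ranks)

    def propagate(node, depth):
        if depth > max_depth:  # assumption: stop after a fixed number of hops
            return
        for neighbor, b in zip(network[node], weights):
            # depth-decay update: rho + (gamma / d) * b
            new_ranks[neighbor] = new_ranks[neighbor] + (gamma / depth) * b
            propagate(neighbor, depth + 1)

    new_ranks[selected] += gamma  # assumption: the selected facet itself is boosted
    propagate(selected, 1)
    return new_ranks

def top_facets(ranks, x):
    """The x highest-ranked facets, which would be shown to the user."""
    return sorted(ranks, key=lambda f: ranks[f], reverse=True)[:x]

# Hypothetical facet network: each facet points to its two closest facets.
network = {
    "Action": ["Adventure", "Science-Fiction"],
    "Adventure": ["Action", "Science-Fiction"],
    "Science-Fiction": ["Adventure", "Action"],
    "Romance": ["Adventure", "Action"],
}
ranks = {facet: 0.0 for facet in network}
history = [ranks]                                    # kept for zooming out
history.append(zoom(network, history[-1], "Science-Fiction"))
visible = top_facets(history[-1], 3)
```

Keeping every rank table in a list corresponds to the first zoom-out strategy described above: zooming out simply drops the last entry, at the cost of some memory.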
3.2.4 Facets Selection

At this point in the data-flow we have a Ranking Matrix, and a simple solution would be to select the first few ranked facets. Such an approach is fine for standard search engines, for example Google, Lemur... In MultiMap, the selection is performed by a dedicated selection algorithm, but why do we actually need one? To answer this question, consider the following:
• standard search engines use a query in order to search the data, therefore the most relevant documents are the ones that are the closest to the query in the multi-dimensional document space;
• in exploratory search we need an exploration factor that allows the users to explore different possibilities. We therefore do not want to restrict the results to only the closely related and most relevant points, but also want to include other points that are related to the topic to some extent.

The selection algorithm allows us to pick a number of rows from an Input Matrix I. Recall that the Input Matrix I is the step just before the pairwise distance comparison, so it is a ready-to-compare matrix in which the distance between 2 points is meaningful. The idea behind the algorithm is quite simple: it selects a subset of relevant facets that is larger than the number of facets to be shown to the user, finds k clusters within this subset and then takes the point closest to each cluster centroid. In detail, the selection algorithm works as follows:
• first, a selection of top-ranked facets is performed. In the prototype we take twice the number of facets that we actually want to present to the user (e.g. if we need to show a grid of 2 by 2 points, we take the 8 most relevant facets from the ranking matrix);
• next, the algorithm computes a k-means clustering with k clusters, where k is the number of points to show to the user; for example, showing 4 actors would mean k = 4;
• once the k clusters are found, each point has an assigned cluster index and each cluster has a centroid. The selection continues by taking the 1 point closest to each centroid, i.e. the most average point in that cluster;
• finally, it returns the selected facets.
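The selection steps above can be sketched as follows. The prototype relies on Matlab for clustering; this stand-in uses a plain Lloyd's k-means with a fixed random seed, and the rank table and 2-dimensional facet vectors are hypothetical.

```python
import random

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def select_facets(ranks, vectors, k):
    """Take the 2k top-ranked facets, cluster them, keep the facet nearest each centroid."""
    candidates = sorted(ranks, key=lambda f: ranks[f], reverse=True)[: 2 * k]
    centroids = kmeans([vectors[f] for f in candidates], k)
    selected = []
    for centroid in centroids:
        remaining = [f for f in candidates if f not in selected]
        selected.append(min(remaining, key=lambda f: squared_distance(vectors[f], centroid)))
    return selected

# Hypothetical ranks and rows of the input matrix for six facets.
ranks = {"a": 10, "b": 9, "c": 8, "d": 7, "e": 1, "f": 0}
vectors = {"a": [0, 0], "b": [0, 1], "c": [10, 10], "d": [10, 11], "e": [5, 5], "f": [6, 6]}
chosen = select_facets(ranks, vectors, k=2)
```

With k = 2 the four best-ranked facets form two tight groups, and one representative of each group is returned; a pure top-2 selection would instead have returned two facets from the same group.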
3.2.5 Movies Selection

The next step in the data-flow is the actual selection of the movies. By now the system has selected the facets it is going to present (the facets most relevant to the current zoom sequence). The movies presented on the map can be selected simply by taking the first few movies according to some rating function. We use the average IMDb rating as the value by which the movies within each facet are sorted. This sorting was already done in the preprocessing phase (see section 3.2.2), so the selection amounts to taking the first few movies of the facet. For example, in the following matrix one can see that if Adventure is a selected facet, the movies "The Judy Garland Show" and "The Secret of Monkey Island" will be selected, as they have the highest IMDb rating within the facet.

\[
F_{i,2} =
\begin{pmatrix}
\text{Adventure} &
\begin{pmatrix}
\text{The Judy Garland Show} & \text{The Secret of Monkey Island} & \cdots \\
9.8 & 9.6 & \cdots
\end{pmatrix} \\
\vdots & \vdots \\
Facet_i & \text{Movie vector, ordered by relevancy}
\end{pmatrix}
\]

Now that the selection of facets and movies is done, we can proceed to the creation of the Aspect Maps.
3.2.6 Creation of Aspect Maps

The final step in the data-flow is the creation of the so-called Aspect Maps, a spatial representation of the selected facets. The maps allow the user to compare different facets and, subsequently, the related movies. We use maps to help the user envisage the locations of movies and facets in high-dimensional space. Since this high-dimensional space would be too difficult to visualize directly, it is reduced to two or three dimensions. For this we need a dimension reduction that is faithful to the distances in the original space. From the many techniques that are available (dimensionality reduction, ordination...) we selected multidimensional scaling (MDS).

Figure 3.7: The transition from the facet selections to the aspect map.

Multidimensional scaling is a special case of ordination. An MDS algorithm starts with a matrix of item-item similarities, then assigns a location to each item in N-dimensional space, where N is specified a priori. In our case, we want to reduce the matrix to 2 or 3 dimensions in order to be able to visualize the result on a screen.

Figure 3.7 shows the process of creating the aspect map in this step. It is quite straightforward, and by now all the data structures are ready to be consumed directly by an MDS algorithm. Figure 3.5 is actually the result of MDS on a subset of the directors aspect and illustrates the output of this step.

It has sometimes been suggested to use Self-Organizing Maps (SOM, [16]) to generate a lower-dimensional representation. We found that, for this particular case, SOM is prohibitively inefficient.

By the end of this step, we have a collection of points in low-dimensional space. Those points can be presented to the user in a number of different ways. Our approach is called the GridMap visualization and is explained in section 3.4.2.
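To make "faithful to the distances in the original space" more concrete, one common formulation of metric MDS is given below. It is shown as an illustration only; the exact MDS variant used in the prototype is not specified here, and the symbols d_{ij}, x_i and n are introduced just for this formula. Given the dissimilarity d_{ij} between facets i and j from the aspect space, MDS looks for coordinates x_1, ..., x_n in the target space that minimize the raw stress:

```latex
\[
\sigma(x_1, \dots, x_n) = \sum_{i < j} \bigl( d_{ij} - \lVert x_i - x_j \rVert \bigr)^2 ,
\qquad x_i \in \mathbb{R}^{N}, \quad N \in \{2, 3\}
\]
```

A stress of zero would mean that all pairwise distances are reproduced exactly on the map; in practice some distortion remains, which is one reason why the GridMap keeps only the spatial order of the points.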
3.3 Server Technology

From the beginning of the research, we wanted the system to be highly interactive and responsive. In order to achieve this we need a scalable system with high performance. We determined the following requirements:
• the data has to be sent very efficiently, potentially about 5-10 kilobytes of text data on each user request;
• the server must be able to notify the user of events happening on the server;
• the communication for the interaction must be real-time: for instance, when the user clicks on something, the system has to process the request in less than a second (or else people simply won't use it).

Given the above requirements, the system should be based on an event-driven architecture (EDA) with compression and security. For completeness, here is the list of the most distinctive features of the server (some readers may find it a bit technical):
• Monolithic server, running on one machine, but potentially scalable to a cluster of machines.
• Manages the thread pool and distributes the work to each thread. It tries to match the number of threads to the number of cores (e.g. 4 threads on a quad-core machine) and distributes smaller tasks to those threads.
• Big tasks are represented in the form of software timers, which are sliced in order to achieve scalability.
• The server manages a socket pool, listening to several endpoints. It works with both IPv4 and IPv6.
• Written in C#, the server is compatible with 32-bit and 64-bit platforms. It is also CLI-compliant and works on cross-platform frameworks like Mono (which runs on Unix, Linux...).
• Handles the client-socket lifetime in order to achieve stability and error tolerance.
• Integrates a Matlab interoperability layer, allowing the C# code to communicate with Matlab and then send the results to the Flash client over the network.
• Handles the data via an object-relational mapping (ORM) layer.
• A publish-subscribe model is used for the real-time notifications. It allows clients to subscribe to a server event and to be notified by the server when the event happens. This notification happens via a push operation.
• Custom message serialization/deserialization.
• Per-packet compression/decompression.
• Through introspection, the server generates networking libraries that comply with a protocol interface. Appendix A provides more information on this feature and illustrates some of the security and compression mechanisms.
• Accounting and session mechanisms in order to keep track of users, their accounts and their connections.
• Access-level security mechanism.

We implemented all of these features ourselves, since at the time of our research not all of this technology was available to us.
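The publish-subscribe model from the feature list can be illustrated with a minimal in-process sketch. The actual server is written in C# and pushes notifications over sockets; the class name, the event name and the payload used here are hypothetical.

```python
from collections import defaultdict

class EventHub:
    """Minimal publish-subscribe hub: clients register callbacks, the hub pushes events."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event, callback):
        self._subscribers[event].append(callback)

    def publish(self, event, payload):
        """Push the payload to every subscriber; returns how many were notified."""
        callbacks = list(self._subscribers[event])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)

hub = EventHub()
received = []                                    # stands in for a connected client
hub.subscribe("map_updated", received.append)
notified = hub.publish("map_updated", {"aspect": "genres"})
```

The important property is the direction of the call: the client does not poll for changes, it is called back when the event happens, which is what keeps the front-end in sync without repeated requests.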
3.4 The Client Front-End

3.4.1 Overview

Figure 3.8: The prototype of the client front-end.

The system we have described so far manipulates points in high-dimensional space. What we add in this section is a way to present these points in a low-dimensional space, so that the user can interact with the system through direct manipulation in real time. For our prototype, we developed a visualization system called GridMap. Section 3.4.2 explains how this system works and why.
3.4.2 GridMap

The reduction to two-dimensional space, accomplished by multidimensional scaling, was already explained in the section on Aspect Maps (3.2.6). For the user, it is more important to know that a point is near another point than to know the exact distance between them. For example, as shown in figure 3.8, it is more important to know that Action is close to Adventure than to know how far apart they are. GridMap therefore maps the points from the 2D space calculated by MDS onto a grid, where the exact distances disappear but the spatial order is retained. Figure 3.8 presents 9 cells; the number of cells can be changed depending on the size of the screen (for example, during our experiments on a 24-inch screen, the optimum GridMap size was 4 by 5, which made it easy to present more than a hundred movies without overloading the user with information).

Figure 3.9: A mapping performed by the GridMap, which removes the exact distances while leaving the order intact.

The interface makes it easy for the user to zoom and filter: the left panel (as shown in figure 3.8) allows the user to switch between the aspect maps and to filter and search on every facet. For example, when the user knows a particular actor, he can search in the actors pane and then zoom and view similar actors. Additionally, this panel allows the user to customize the zooming criteria and tune the MultiMap parameters: the number of movies per facet, the number of facets to show on the map, etc.
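The text above does not spell out the mapping algorithm, so the following sketch shows just one simple way to obtain such a grid, under the assumption that preserving the top-to-bottom and left-to-right order is sufficient: the points are cut into horizontal bands by their y coordinate, and each band is ordered by x. The function name and the coordinates are hypothetical.

```python
def grid_map(points, rows, cols):
    """Assign each named 2D point to a distinct (row, col) cell of a rows x cols grid."""
    if len(points) > rows * cols:
        raise ValueError("more points than grid cells")
    by_y = sorted(points, key=lambda name: points[name][1])
    cells = {}
    for r in range(rows):
        band = by_y[r * cols:(r + 1) * cols]         # one horizontal band per grid row
        band.sort(key=lambda name: points[name][0])  # left to right inside the band
        for c, name in enumerate(band):
            cells[name] = (r, c)
    return cells

# Hypothetical MDS output for four genres.
points = {
    "Action": (0.1, 0.2),
    "Adventure": (0.3, 0.1),
    "Romance": (0.9, 0.8),
    "Drama": (0.2, 0.9),
}
cells = grid_map(points, rows=2, cols=2)
```

In this example Action and Adventure, which are close in the MDS output, end up in neighboring cells of the same row, while the exact distances between the points are discarded.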
Transitions

It often happens that a person viewing a scene fails to see large changes in the scene. This is called change blindness, a well-known psychological phenomenon [20]. It occurs when the change in the scene coincides with some visual disruption, such as a saccade (a quick eye movement) or a brief obscuring of the scene. This situation often occurs in web applications, where the web page briefly flashes after actions that require a new server request. In this context, animated transitions help the user see the changes in the scene [13] [21].

The transitions turned out to be quite important, providing visual feedback to the user so that he knows what is going on. In the GridMap, there are two kinds of crucial transitions:
• The transition that animates the facet pane, keeping it visible while the user zooms on it and then moving it to its new position. This greatly helps the user keep track of the item he is zooming in on. It is needed because on each zoom the coordinate system changes, which can be quite confusing to the person using the interface.
• The transition shown in figure 3.10, which flips the grid cell, allowing the user to see the details of a particular movie within its context. This transition lets the user directly see the information about the element he is interested in while keeping everything in context. The user can always flip back and see the details of other movies. This is what people do in a video store, where they look at the available movies, pick one and flip it over to see its details on the back.

Figure 3.10: As in the video store, one can select a movie and look at the back of the box to see the details.
Cell Representation

The cell representation supports the flipping feature illustrated in figure 3.10. Based on user feedback, this feature proved to be very attractive and motivated users to experiment further with the interface. Additionally, it makes it possible to present movie details while keeping the other facets visible. Figure 3.11 shows how a list of movies is presented on a grid cell, with a star giving visual relevance feedback. Golden stars mark the best-rated movies, which are probably the most interesting ones for the user to check out.

Figure 3.11: An actual list of movies presented on the front of the grid cell.

Figure 3.12 illustrates the content presented when a movie is flipped: the movie cover, the synopsis and two additional tabs.

Figure 3.12: Details of the movie, first tab. It presents the synopsis, which the user can read in order to learn about a particular movie.

The second tab, illustrated in figure 3.13, shows information related to the movie, linking directly to different facets. When the user clicks on a particular genre, for example, the system performs a zoom on that facet and constructs a new map. This allows back-and-forth navigation: from the big picture to the details of one movie, then on to another map, and again zooming in on a particular movie.
Figure 3.13: Details of the movie, second tab. It presents the facet links to information such as the year, the rating and the genres of the movie.

Figure 3.14: Details of the movie, third tab. It presents the facet links to the directors and actors.

The last tab, illustrated in figure 3.14, allows the user to view the people involved: the directors who made the movie and the actors starring in it. Again, the system allows the user to zoom directly on one of those links, constructing a new map.
Chapter 4
Usability Aspects

The main purpose of the work was to build a responsive system for a particular Rich Internet Application in the area of exploratory search. Of course, such a system only makes sense if users can actually use it. We therefore carried out an admittedly limited and informal evaluation of its usability. To do so, I asked ten people, acquaintances and friends aged 20 to 30, half of each gender, to participate in a survey about my thesis work. They were told what exploratory search is in general, without reference to the movie database. Next, they were asked to work with the system for about half an hour and find movies to their liking. After working with the system they were asked to fill out a questionnaire with 25 questions. The questions are shown below and concern Usefulness, Ease of Use, Ease of Learning, and Satisfaction with the system. The questionnaires were constructed as seven-point Likert rating scales. Users were asked to rate their agreement with the statements, ranging from strongly disagree to strongly agree [17]. The following are the overall averaged results of the questionnaire, per category:

Average results of USE questionnaire
Average Usefulness:        5.3/7
Average Ease of Use:       5.6/7
Average Ease of Learning:  6.4/7
Average Satisfaction:      5.9/7

The users were very satisfied with the system, and a few of them also pointed out that the interface was very beautiful and user-friendly. On the other hand, some of them thought that the interface didn't give them enough control to know exactly what happens underneath. For completeness, here are the tables with the averaged results:
Average results, Usefulness questionnaire
It is useful                                                   6.3/7
It gives me more control over the activities in my life       3.8/7
It makes the things I want to accomplish easier to get done   5.3/7
It meets my needs                                              5.7/7
It does everything I would expect it to do                    5.3/7

Average results, Ease of Use questionnaire
It is easy to use                                              5.8/7
It is user friendly                                            6.7/7
It requires the fewest steps possible to accomplish what I want to do with it   5.7/7
Using it is effortless                                         5.2/7
I can use it without written instructions                     4.5/7
I don't notice any inconsistencies as I use it                5.0/7
Both occasional and regular users would like it               6.2/7
I can recover from mistakes quickly and easily                5.8/7
I can use it successfully every time                          5.5/7

Average results, Ease of Learning questionnaire
I learned to use it quickly                                    6.5/7
I easily remember how to use it                                6.5/7
It is easy to learn to use it                                  6.2/7
I quickly became skillful with it                              6.3/7

Average results, Satisfaction questionnaire
I am satisfied with it                                         6.0/7
I would recommend it to a friend                               6.3/7
It is fun to use                                               6.7/7
It works the way I want it to work                             6.2/7
It is wonderful                                                4.8/7
I feel I need to have it                                       4.8/7
It is pleasant to use                                          6.2/7

These are preliminary results; a more formal evaluation is beyond the scope of this thesis.
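As a quick consistency check, the per-category averages can be recomputed from the per-statement averages in the tables above. The sketch below assumes that every participant answered every statement, so that the mean of the item means equals the category mean; the variable names are ours.

```python
# Per-statement averages, copied from the four tables (seven-point scale).
scores = {
    "Usefulness": [6.3, 3.8, 5.3, 5.7, 5.3],
    "Ease of Use": [5.8, 6.7, 5.7, 5.2, 4.5, 5.0, 6.2, 5.8, 5.5],
    "Ease of Learning": [6.5, 6.5, 6.2, 6.3],
    "Satisfaction": [6.0, 6.3, 6.7, 6.2, 4.8, 4.8, 6.2],
}

# Mean per category, rounded to one decimal like the reported figures.
averages = {name: round(sum(values) / len(values), 1) for name, values in scores.items()}
total_questions = sum(len(values) for values in scores.values())
```

Under that assumption the recomputed values agree with the reported category averages, and the four tables together contain the 25 statements of the questionnaire.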
Chapter 5
Conclusions

This thesis described a form of exploratory search where responsiveness is of the essence. The application, which we called 'MultiMap', can be categorized as a so-called Rich Internet Application, a class of applications that is becoming more and more important as databases become larger, more specialized, and more distributed. Because of this, users more and more often find themselves in a situation where they know there must be information available to answer their questions, but lack the means to formulate a precise query. The resources they need to answer such a query may be available on remote servers; hence, to let users quickly explore possible answers, the servers must be made responsive enough, or else the user will quickly give up. MultiMap was built with such users in mind. Every design decision in this thesis was made under the constraint of responsiveness. This led to the following requirements:
• The system should be responsive, scalable, and interactive.
• The system should support exploratory search.
• The system should provide real-time spatial visual feedback reflecting changes in the high-dimensional search space.

Exploratory search is the problem of finding information that we may not know how to ask for, but which we will recognize once we see it. There are three bottlenecks that could make our system unresponsive: (1) complex calculations, (2) slow zooming, and (3) ineffective visualization. We addressed these bottlenecks as follows:
1. Every computation that can be done in advance is done in advance, so that it cannot cause any delay.
2. Zooming and map generation are highly optimized and can be done in real time.
3. The visualization is presented to the user in a cognitively appropriate way.

We believe that such a system should be constructed in a modular fashion, and in this thesis we presented a way to do so. This modularity makes it possible, for example, to change the ranking or enhance the selection algorithms and to evaluate the performance of a new algorithm against the existing one. It also makes it possible to build various user interfaces on top of the search engine, and eventually audience-targeted user interfaces. During the research we discussed several possible front ends, including different 2-dimensional representations enhanced with colors, sounds and font sizes. 3-dimensional interfaces can also be built and are very interesting directions to explore. We considered implementing a 3-dimensional sphere navigation in which zooming could create a 2D map or a new 3D sphere, but we leave that for future work.

The third question (about usability) was answered by the evaluation and the feedback we got from users. The users were very satisfied with both MultiMap and GridMap, but also felt that they did not have enough control over the system. They quickly learned how to use the system and how to get movie suggestions. However, there was a need to explain the concept and introduce them to it first, as it is a different approach to information exploration.

After having designed and evaluated the system, we believe that the map generation technique presented in this thesis is an important direction to pursue and an effective way to perform exploratory search.
Bibliography

[1] Internet Movie Database, http://www.imdb.com, December 2009.
[2] RFC 2616: Hypertext Transfer Protocol – HTTP/1.1, http://tools.ietf.org/html/rfc2616, June 1999.
[3] HTTP, Wikipedia, http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol, June 2010.
[4] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17:734-749, 6 2005.
[5] G. Armitage. Quality of Service in IP Networks: Foundations for a Multi-Service Internet. Macmillan Technical Publishing, 4 2000.
[6] G. Armitage, M. Claypool, and P. Branch. Networking and Online Games: Understanding and Engineering Multiplayer Internet Games. John Wiley and Sons Ltd., 2006.
[7] R.M. Bell, J. Bennett, Y. Koren, and C. Volinsky. The million dollar programming prize. IEEE Spectrum, 5 2009.
[8] S. Caltagirone, M. Keys, B. Schlief, and M. J. Willshire. Architecture for a massively multiplayer online role playing game engine. Journal of Computing Sciences in Colleges, Volume 18, Issue 2, 12 2002.
[9] Piero Fraternali, Gustavo Rossi, and Fernando Sánchez-Figueroa. Rich internet applications. Internet Computing, IEEE, 14(3):9-12, May-June 2010.
[10] J. Gregory. Game Engine Architecture. A K Peters, 2009.
[11] M. A. Hearst. Next generation web search: Setting our sites. IEEE Data Engineering Bulletin 23, 3, 38-48, 3 2000.
[12] M. A. Hearst. Design recommendations for hierarchical faceted search interfaces. SIGIR, Workshop on Faceted Search, pages 26-30, August 2006.
[13] J. Heer and G. Robertson. Animated transitions in statistical data graphics. IEEE Transactions on Visualization and Computer Graphics, 6 2007.
[14] J. F. Kurose and K. W. Ross. Computer Networking: A Top-Down Approach. Pearson Education Inc., 2008.
[15] X. Lin. Map displays for information retrieval. Journal of the American Society for Information Science, 1 1997.
[16] X. Lin, D. Soergel, and G. Marchionini. A self-organizing semantic map for information retrieval. Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval, 262-269, 1991.
[17] A.M. Lund. Measuring usability with the USE questionnaire. STC Usability SIG Newsletter, 8:2, 8 2001.
[18] J. Makar. ActionScript for Multiplayer Games and Virtual Worlds. New Riders, 2010.
[19] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM 49, 4 2006.
[20] J. O'Regan, R. Rensink, and J. Clark. To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8 1997.
[21] G. M. Sacco and Y. Tzitzikas. Dynamic Taxonomies and Faceted Search: Theory, Practice, and Experience. Springer Science and Business Media Inc., 2009.
[22] J. Smed and H. Hakonen. Algorithms and Networking for Computer Games. John Wiley and Sons Ltd, 2006.
[23] M. Steyvers. Multidimensional Scaling. In: Encyclopedia of Cognitive Science. Macmillan Reference Ltd., 2002.
[24] D. Svanaes. Understanding Interactivity: Steps to a Phenomenology of Human-Computer Interaction. PhD Thesis. NTNU, Trondheim, Norway, 2000.
[25] A. G. Taylor. Introduction to Cataloging and Classification. 8th ed. Englewood, Colorado. Libraries Unlimited, 1992.
[26] B.C. Vickery. Faceted classification: a guide to construction and use of special schemes. London: Aslib, 1960.
[27] R.W. White, B. Kules, S.M. Drucker, and M.C. Schraefel. Supporting exploratory search. Communications of the ACM, 49, 4 2006.
Appendix A
Protocol Generation DSL

Since I had to do all the programming for the research project myself, the workload was quite demanding. To avoid writing an individual implementation for each networking method or protocol, a protocol generation mechanism was implemented. To explain how it works, consider the following C# code:

Listing A.1: A partial definition of the MultiMap protocol

    [Protocol]
    public interface IMultiMapProtocol
    {
        // Gets all aspects in the system
        [ProtocolOperation(100, Direction.Pull, CompressionTarget.Outgoing)]
        Aspect[] GetAllAspects();

        // Zooms to a particular selection
        [ProtocolOperation(106, Direction.Pull, CompressionTarget.Incoming)]
        void Zoom(Aspect Aspect, List<int> Facets);

        // Gets some additional information of a movie
        [ProtocolOperation(112, Direction.Pull, CompressionTarget.Outgoing,
            AccessLevel = AccessLevel.Root)]
        MovieDetails GetMovieDetails(int Oid);

        (...)
    }

Listing A.1 illustrates the code one needs to write in order to define a communication protocol. Such an approach can also be considered a domain-specific language (DSL). Once the protocol definition is written, the server analyses it and generates the code that makes all the communication possible. It generates an assembly for its own use and a Flash component library (.swc) for the Flash application, so that the developer can simply call any method while the complexity of the communication stays hidden. Our research greatly benefited from this DSL, as several thousand lines of code could be generated, eliminating potential errors and boosting productivity.
The protocol definition also makes it possible to define the compression direction (None, Incoming, Outgoing or Both), which generates the corresponding function calls when a packet is compiled or read. It is also possible to define the security level per operation, using the AccessLevel parameter (shown in Listing A.1).
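To make the analysis step more concrete, the following is a minimal sketch of how a protocol definition of this kind could be inspected through reflection before any code is generated. It is not the generator used for MultiMap. The attribute classes, the enum members Direction.Push and AccessLevel.User, the interface IDemoProtocol and the class ProtocolInspector are all assumptions introduced for illustration; only the attribute names, the operation codes and the compression targets are taken from Listing A.1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for the types used in Listing A.1.
public enum Direction { Pull, Push }                           // Push is an assumption
public enum CompressionTarget { None, Incoming, Outgoing, Both }
public enum AccessLevel { User, Root }                         // User is an assumption

[AttributeUsage(AttributeTargets.Interface)]
public sealed class ProtocolAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Method)]
public sealed class ProtocolOperationAttribute : Attribute
{
    public int OpCode { get; }
    public Direction Direction { get; }
    public CompressionTarget Compression { get; }
    // Optional named argument; defaults to the least privileged level.
    public AccessLevel AccessLevel { get; set; } = AccessLevel.User;

    public ProtocolOperationAttribute(int opCode, Direction direction,
                                      CompressionTarget compression)
    {
        OpCode = opCode;
        Direction = direction;
        Compression = compression;
    }
}

// A reduced protocol with simplified return types (illustration only).
[Protocol]
public interface IDemoProtocol
{
    [ProtocolOperation(112, Direction.Pull, CompressionTarget.Outgoing,
        AccessLevel = AccessLevel.Root)]
    string GetMovieDetails(int oid);

    [ProtocolOperation(100, Direction.Pull, CompressionTarget.Outgoing)]
    string[] GetAllAspects();
}

public static class ProtocolInspector
{
    // Collects one description per annotated operation, ordered by operation code.
    public static List<string> Describe(Type protocol)
    {
        if (protocol.GetCustomAttribute<ProtocolAttribute>() == null)
            throw new ArgumentException("Type is not marked with [Protocol].");

        return protocol.GetMethods()
            .Select(m => (Method: m,
                          Op: m.GetCustomAttribute<ProtocolOperationAttribute>()))
            .Where(x => x.Op != null)
            .OrderBy(x => x.Op.OpCode)
            .Select(x => $"{x.Op.OpCode} {x.Method.Name} " +
                         $"compress={x.Op.Compression} access={x.Op.AccessLevel}")
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in Describe(typeof(IDemoProtocol)))
            Console.WriteLine(line);
    }
}
```

A real generator would presumably emit serialization and dispatch code from the same metadata instead of printing it; the sketch only shows that the operation code, compression direction and access level can all be recovered from the annotated interface.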
