The Anatomy of a Large-Scale Social Search Engine

Damon Horowitz, Aardvark, damon@aardvarkteam.com
Sepandar D. Kamvar, Stanford University, sdkamvar@stanford.edu

Copyright is held by the author/owner(s). WWW2010, April 26-30, 2010, Raleigh, North Carolina.

ABSTRACT

We present Aardvark, a social search engine. With Aardvark, users ask a question, either by instant message, email, web input, text message, or voice. Aardvark then routes the question to the person in the user's extended social network most likely to be able to answer that question. As compared to a traditional web search engine, where the challenge lies in finding the right document to satisfy a user's information need, the challenge in a social search engine like Aardvark lies in finding the right person to satisfy a user's information need. Further, while trust in a traditional search engine is based on authority, in a social search engine like Aardvark, trust is based on intimacy. We describe how these considerations inform the architecture, algorithms, and user interface of Aardvark, and how they are reflected in the behavior of Aardvark users.

1. INTRODUCTION

1.1 The Library and the Village

Traditionally, the basic paradigm in information retrieval has been the library. Indeed, the field of IR has roots in the library sciences, and Google itself came out of the Stanford Digital Library project [13]. While this paradigm has clearly worked well in several contexts, it ignores another age-old model for knowledge acquisition, which we shall call "the village paradigm". In a village, knowledge dissemination is achieved socially: information is passed from person to person, and the retrieval task consists of finding the right person, rather than the right document, to answer your question.

The differences in how people find information in a library versus a village suggest some useful principles for designing a social search engine. In a library, people use keywords to search, the knowledge base is created by a small number of content publishers before the questions are asked, and trust is based on authority. In a village, by contrast, people use natural language to ask questions, answers are generated in real time by anyone in the community, and trust is based on intimacy. These properties have cascading effects — for example, real-time responses from socially proximal responders tend to elicit (and work well for) highly contextualized and subjective queries. For example, the query "Do you have any good babysitter recommendations in Palo Alto for my 6-year-old twins? I'm looking for somebody that won't let them watch TV." is better answered by a friend than by the library. These differences in information retrieval paradigm require that a social search engine have a very different architecture, algorithms, and user interface than a search engine based on the library paradigm.

The fact that the library and the village paradigms of knowledge acquisition complement one another nicely in the offline world suggests a broad opportunity on the web for social information retrieval.

1.2 Aardvark

In this paper, we present Aardvark, a social search engine based on the village paradigm. We describe in detail the architecture, ranking algorithms, and user interfaces in Aardvark, and the design considerations that motivated them. We believe this to be useful to the research community for two reasons. First, the argument made in the original Anatomy paper [3] still holds true: since most search engine development is done in industry rather than academia, the research literature describing end-to-end search engine architecture is sparse. Second, the shift in paradigm opens up a number of interesting research questions in information retrieval, for example around human expertise classification, implicit network construction, and conversation design.

Following the architecture description, we present a statistical analysis of usage patterns in Aardvark. We find that, as compared to traditional search, Aardvark queries tend to be long, highly contextualized, and subjective — in short, they tend to be the types of queries that are not well served by traditional search engines. We also find that the vast majority of questions get answered promptly and satisfactorily, and that users are surprisingly active, both in asking and answering.

Finally, we present example results from the current Aardvark system, and a comparative evaluation experiment. What we find is that Aardvark performs very well on queries that deal with opinion, advice, experience, or recommendations, while traditional corpus-based search engines remain a good choice for queries that are factual or navigational and lack a subjective element.

2. OVERVIEW

2.1 Main Components

The main components of Aardvark are:

1. Crawler and Indexer. To find and label resources that contain information — in this case, users, not documents (Sections 3.2 and 3.3).
2. Query Analyzer. To understand the user's information need (Section 3.4).

3. Ranking Function. To select the best resources to provide the information (Section 3.5).

4. UI. To present the information to the user in an accessible and interactive form (Section 3.6).

Most corpus-based search engines have similar key components with similar aims [3], but the means of achieving those aims are quite different.

Before discussing the anatomy of Aardvark in depth, it is useful to describe what happens behind the scenes when a new user joins Aardvark and when a user asks a question.

[Figure 1: Schematic of the architecture of Aardvark. Gateways (IM, email, SMS, Twitter, iPhone, Web) feed messages through the Transport Layer to the Conversation Manager, which consults the Question Analyzer (classifiers) and the Routing Engine; Importers (Facebook, LinkedIn, etc.) populate the Index of users, graph, topics, and content.]

2.2 The Initiation of a User

When a new user first joins Aardvark, the Aardvark system performs a number of indexing steps in order to be able to direct the appropriate questions to her for answering. Because questions in Aardvark are routed to the user's extended network, the first step involves indexing friendship and affiliation information. The data structure responsible for this is the Social Graph. Aardvark's aim is not to build a social network, but rather to allow people to make use of their existing social networks. As such, in the sign-up process, a new user has the option of connecting to a social network such as Facebook or LinkedIn, importing their contact lists from a webmail program, or manually inviting friends to join. Additionally, anybody whom the user invites to join Aardvark is appended to their Social Graph — and such invitations are a major source of new users. Finally, Aardvark users are connected through common "groups" which reflect real-world affiliations they have, such as the schools they have attended and the companies they have worked at; these groups can be imported automatically from social networks, or manually created by users. Aardvark indexes this information and stores it in the Social Graph, which is a fixed-width ISAM index sorted by userId.

Simultaneously, Aardvark indexes the topics about which the new user has some level of knowledge or experience. This topical expertise can be garnered from several sources: a user can indicate topics in which he believes himself to have expertise; a user's friends can indicate which topics they trust the user's opinions about; a user can specify an existing structured profile page from which the Topic Parser parses additional topics; a user can specify an account on which they regularly post status updates (e.g., Twitter or Facebook), from which the Topic Extractor extracts topics (from unstructured text) on an ongoing basis (see Section 3.3 for more discussion); and finally, Aardvark observes the user's behavior on Aardvark, in answering (or electing not to answer) questions about particular topics.

The set of topics associated with a user is recorded in the Forward Index, which stores each userId, a scored list of topics, and a series of further scores about a user's behavior (e.g., pertaining to their responsiveness and the quality of their answers).

From the Forward Index, Aardvark constructs an Inverted Index. The Inverted Index stores each topicId and a scored list of userIds that have expertise in that topic. In addition to topics, the Inverted Index stores scored lists of userIds for features like answer quality and response time.

Once the Inverted Index and Social Graph for a user are created, the user is now active on the system and ready to ask her first question.

2.3 The Life of a Query

A user can access Aardvark through many different means, the most common being instant message (on the desktop) or text message (on the mobile phone). A user begins by asking a question. The question gets sent from the input device to the Transport Layer, where it is normalized to a Message data structure, and sent to the Conversation Manager. Once the Conversation Manager determines that the message is a question, it sends the question to the Question Analyzer to determine the appropriate classifications and topics for the question. The Conversation Manager informs the asker which primary topic was determined for the question, and gives the asker the opportunity to edit it. It simultaneously issues a Routing Suggestion Request to the Routing Engine. The Routing Engine plays a role analogous to the ranking function in a corpus-based search engine. It accesses the Inverted Index and Social Graph for a list of candidate answerers, and then ranks them to reflect how well it believes they can answer the question, and how good a match they are for the asker. The Routing Engine then returns a ranked list of Routing Suggestions to the Conversation Manager. The Conversation Manager contacts the potential answerers — one by one, or a few at a time, depending upon a Routing Policy — and asks them if they would like to answer the question, until a satisfactory answer is found. The Conversation Manager then forwards this answer along to the asker, and allows the asker and answerer to exchange further follow-up messages if they choose.

3. ANATOMY

3.1 The Model

The core of Aardvark is a statistical model for routing questions to potential answerers. We use a network variant of what has been called an aspect model [8], which has two primary features. First, it associates an unobserved class variable t ∈ T = {t1, ..., tn} with each observation (i.e., the successful answer of question q by user ui). In other words, the probability p(ui|q) that user ui will successfully answer question q depends on whether q is about the topics t in which ui has expertise:

    p(ui|q) = Σ_{t∈T} p(ui|t) p(t|q)    (1)
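The aspect-model decomposition in Equation 1 is simple to make concrete. The sketch below is illustrative only: plain dictionaries stand in for the Inverted Index (holding p(ui|t)) and for the Question Analyzer's output (p(t|q)); none of these function or variable names come from the paper.

```python
def relevance(answerer, p_t_given_q, inverted_index):
    """Equation 1: p(u_i|q) = sum over topics t of p(u_i|t) * p(t|q).

    p_t_given_q:    {topic: p(t|q)}, assigned by the Question Analyzer
                    at query time.
    inverted_index: {topic: {user: p(u|t)}}, built offline at indexing
                    time (a stand-in for Aardvark's Inverted Index).
    """
    return sum(
        inverted_index.get(topic, {}).get(answerer, 0.0) * p_tq
        for topic, p_tq in p_t_given_q.items()
    )
```

Because the index is keyed by topic, only the few topics assigned to the question at query time are touched; everything else is precomputed, which is the property the paper highlights when it notes that p(t|q) is the only component computed online.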
The second main feature of the model is that it defines a query-independent probability of success for each potential asker/answerer pair (ui, uj), based upon their degree of social connectedness and profile similarity. In other words, we define a probability p(ui|uj) that user ui will deliver a satisfying answer to user uj, regardless of the question. This is reminiscent of earlier work on search in P2P networks, where queries have been routed over relationship-based overlay networks [4].

We then define the scoring function s(ui, uj, q) as the composition of the two probabilities:

    s(ui, uj, q) = p(ui|uj) · p(ui|q) = p(ui|uj) Σ_{t∈T} p(ui|t) p(t|q)    (2)

Our goal in the ranking problem is: given a question q from user uj, return a ranked list of users ui ∈ U that maximizes s(ui, uj, q).

Note that the scoring function is composed of a query-dependent relevance score p(ui|q) and a query-independent quality score p(ui|uj). This bears similarity to the ranking functions of traditional corpus-based search engines such as Google [3]. The difference is that unlike quality scores like PageRank [13], Aardvark's quality score aims to measure intimacy rather than authority. And unlike the relevance scores in corpus-based search engines, Aardvark's relevance score aims to measure a user's potential to answer a query, rather than a document's existing capability to answer a query.

Computationally, this scoring function has a number of advantages. It allows real-time routing because it pushes much of the computation offline. The only component probability that needs to be computed at query time is p(t|q). Computing p(t|q) is equivalent to assigning topics to a question — in Aardvark we do this by running a probabilistic classifier on the question at query time (see Section 3.4). The distribution p(ui|t) assigns users to topics, and the distribution p(ui|uj) defines the Aardvark Social Graph. Both of these are computed by the Indexer at signup time, and then updated continuously in the background as users answer questions and get feedback (see Section 3.3). The component multiplications and sorting are also done at query time, but these are easily parallelizable, as the index is sharded by user.

3.2 Social Crawling

A comprehensive knowledge base is important for search engines, as query distributions tend to have a long tail [9]. In corpus-based search engines, this is achieved by large-scale crawlers and thoughtful crawl policies. In Aardvark, the knowledge base consists of people rather than documents, so the methods for acquiring and expanding a comprehensive knowledge base are quite different.

With Aardvark, the more active users there are, the more potential answerers there are, and therefore the more comprehensive the coverage. More importantly, because Aardvark looks for answerers primarily within a user's extended social network, the denser the network, the larger the effective knowledge base.

This suggests that the strategy for increasing the knowledge base of Aardvark crucially involves creating a good experience for users so that they remain active and are inclined to invite their friends. An extended discussion of this is outside the scope of this paper; we mention it here only to emphasize the difference in the nature of "crawling" in social search versus traditional search.

Given a set of active users on Aardvark, the effective breadth of the Aardvark knowledge base depends upon designing interfaces and algorithms that can collect and learn an extended topic list for each user over time, as discussed in the next section.

3.3 Indexing People

The central technical challenge in Aardvark is selecting the right user to answer a given question from another user. In order to do this, the two main things Aardvark needs to learn about each user ui are: (1) the topics t he might be able to answer questions about, p_smoothed(t|ui); and (2) the users uj to whom he is connected, p(ui|uj).

Topics. Aardvark computes the distribution p(t|ui) of topics known by user ui from the following sources of information:

• Users are prompted to provide at least three topics which they believe they have expertise about.

• Friends of a user (and the person who invited a user) are encouraged to provide a few topics that they trust the user's opinion about.

• Aardvark parses out topics from users' existing online profiles (e.g., Facebook profile pages, if provided). For such pages with a known structure, a simple Topic Parsing algorithm uses regular expressions which were manually devised for specific fields in the pages, based upon their performance on test data.

• Aardvark automatically extracts topics from unstructured text on users' existing online homepages or blogs, if provided. For unstructured text, a linear SVM identifies the general subject area of the text, while an ad-hoc named entity extractor is run to extract more specific topics, scaled by a variant tf-idf score.

• Aardvark automatically extracts topics from users' status message updates (e.g., Twitter messages, Facebook news feed items, IM status messages, etc.) and from the messages they send to other users on Aardvark.

The motivation for using these latter sources of profile topic information is a simple one: if you want to be able to predict what kind of content a user will generate (i.e., p(t|ui)), first examine the content they have generated in the past. In this spirit, Aardvark uses web content not as a source of existing answers about a topic but, rather, as an indicator of the topics about which a user is likely able to give new answers on demand.

In essence, this involves modeling a user as a content-generator, with probabilities indicating the likelihood that she will respond to questions about given topics. Each topic in a user profile has an associated score, depending upon the confidence appropriate to the source of the topic. In addition, Aardvark learns over time which topics not to send a user questions about, by keeping track of cases when the user: (1) explicitly "mutes" a topic; (2) declines to answer questions about a topic when given the opportunity; or (3) receives negative feedback on his answer about the topic from another user.
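Composing the per-topic expertise scores above with the query-independent intimacy term gives the scoring function of Equation 2. The toy sketch below ranks candidate answerers for one asker; the dict-of-dicts stand-ins for the Inverted Index and Social Graph, and all names, are my own illustrative assumptions, not Aardvark's data structures.

```python
def score(answerer, asker, p_t_given_q, inverted_index, social_graph):
    """Equation 2: s(u_i, u_j, q) = p(u_i|u_j) * sum_t p(u_i|t) p(t|q)."""
    # Query-dependent relevance, Equation 1.
    relevance = sum(
        inverted_index.get(t, {}).get(answerer, 0.0) * p_tq
        for t, p_tq in p_t_given_q.items()
    )
    # Query-independent quality: intimacy rather than authority.
    intimacy = social_graph.get(asker, {}).get(answerer, 0.0)
    return intimacy * relevance

def rank_answerers(asker, p_t_given_q, inverted_index, social_graph):
    """Candidates are users with expertise in any topic of the question,
    sorted by the composite score, highest first."""
    candidates = {u for t in p_t_given_q for u in inverted_index.get(t, {})}
    candidates.discard(asker)
    return sorted(
        candidates,
        key=lambda u: score(u, asker, p_t_given_q,
                            inverted_index, social_graph),
        reverse=True,
    )
```

In production, each index shard would compute a partial ranking like this in parallel for its own users (the paper notes the index is sharded by user); the single-process version above is only meant to make the math concrete.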
Periodically, Aardvark runs a topic strengthening algorithm, the essential idea of which is: if a user has expertise in a topic and most of his friends also have some expertise in that topic, we have more confidence in that user's level of expertise than if he were alone in his group with knowledge in that area. Mathematically, for some user ui, his group of friends U, and some topic t, if p(t|ui) > 0, then s(t|ui) = p(t|ui) + γ Σ_{u∈U} p(t|u), where γ is a small constant. The s values are then renormalized to form probabilities.

Aardvark then runs two smoothing algorithms, the purpose of which is to record the possibility that the user may be able to answer questions about additional topics not explicitly recorded in her profile. The first uses basic collaborative filtering techniques on topics (i.e., based on users with similar topics); the second uses semantic similarity (see Footnote 1).

Once all of these bootstrap, extraction, and smoothing methods are applied, we have a list of topics and scores for a given user. Normalizing these topic scores so that Σ_{t∈T} p(t|ui) = 1, we have a probability distribution for topics known by user ui. Using Bayes' Law, we compute for each topic and user:

    p(ui|t) = p(t|ui) p(ui) / p(t),    (3)

using a uniform distribution for p(ui) and observed topic frequencies for p(t). Aardvark collects these probabilities p(ui|t), indexed by topic, into the Inverted Index, which allows for easy lookup when a question comes in.

Connections. Aardvark computes the connectedness between users p(ui|uj) in a number of ways. While social proximity is very important here, we also take into account similarities in demographics and behavior. The factors considered here include:

• Social connection (common friends and affiliations)
• Demographic similarity
• Profile similarity (e.g., common favorite movies)
• Vocabulary match (e.g., IM shortcuts)
• Chattiness match (frequency of follow-up messages)
• Verbosity match (the average length of messages)
• Politeness match (e.g., use of "Thanks!")
• Speed match (responsiveness to other users)

Connection strengths between people are computed using a weighted cosine similarity over this feature set, normalized so that Σ_{ui∈U} p(ui|uj) = 1, and stored in the Social Graph for quick access at query time. Both the distributions p(ui|uj) in the Social Graph and p(t|ui) in the Inverted Index are continuously updated as users interact with one another on Aardvark.

Footnote 1: In both the Person Indexing and the Question Analysis components, "semantic similarity" is computed by using an approximation of distributional similarity computed over Wikipedia and other corpora; this serves as a proxy measure of the topics' semantic relatedness.

3.4 Analyzing Questions

The purpose of the Question Analyzer is to determine a scored list of topics p(t|q) for each question q, representing the semantic subject matter of the question. This is the only probability distribution in Equation 2 that is computed at query time.

It is important to note that in a social search system, the requirement for a Question Analyzer is only to be able to understand the query sufficiently well to route it to a likely answerer. This is a considerably simpler task than the challenge facing an ideal web search engine, which must attempt to determine exactly what piece of information the user is seeking (given that the searcher must translate her information need into search keywords), and to evaluate whether a given web page contains that piece of information. By contrast, in a social search system, it is the human answerer who has the responsibility for determining the relevance of an answer to a question — and that is a function which human intelligence is extremely well-suited to perform! The asker can express his information need in natural language, and the human answerer can simply use her natural understanding of the language of the question, of its tone of voice, sense of urgency, sophistication or formality, and so forth, to determine what information is suitable to include in a response. Thus, the role of the Question Analyzer in a social search system is simply to learn enough about the question that it may be sent to appropriately interested and knowledgeable human answerers.

As a first step, the following classifiers are run on each question. A NonQuestionClassifier determines if the input is not actually a question (e.g., is it a misdirected message, a sequence of keywords, etc.); if so, the user is asked to submit a new question. An InappropriateQuestionClassifier determines if the input is obscene, commercial spam, or otherwise inappropriate content for a public question-answering community; if so, the user is warned and asked to submit a new question. A TrivialQuestionClassifier determines if the input is a simple factual question which can be easily answered by existing common services (e.g., "What time is it now?", "What is the weather?", etc.); if so, the user is offered an automatically generated answer resulting from traditional web search. A LocationSensitiveClassifier determines if the input is a question which requires knowledge of a particular location, usually in addition to specific topical knowledge (e.g., "What's a great sushi restaurant in Austin, TX?"); if so, the relevant location is determined and passed along to the Routing Engine with the question.

Next, the list of topics relevant to a question is produced by merging the output of several distinct TopicMapper algorithms, each of which suggests its own scored list of topics:

• A KeywordMatchTopicMapper passes any terms in the question which are string matches with user profile topics through a classifier which is trained to determine whether a given match is likely to be semantically significant or misleading (see Footnote 2).

Footnote 2: For example, if the string "camel wrestling" occurs in a question, it is likely to be semantically relevant to a user who has "camel wrestling" as a profile topic; whereas the string "running" is too ambiguous to use in this manner without further validation, since it might errantly route a question about "running a business" to a user who knows about fitness.
• A TaxonomyTopicMapper classifies the question text into a taxonomy of roughly 3000 popular question topics, using an SVM trained on an annotated corpus of several million questions.

• A SalientTermTopicMapper extracts salient phrases from the question — using a noun-phrase chunker and a tf-idf–based measure of importance — and finds semantically similar user topics.

• A UserTagTopicMapper takes any user "tags" provided by the asker (or by any would-be answerers), and maps these to semantically similar user topics (see Footnote 3).

At present, the output distributions of these classifiers are combined by weighted linear combination. It would be interesting future work to explore other means of combining heterogeneous classifiers, such as the maximum entropy model described in [11].

The Aardvark TopicMapper algorithms are continuously evaluated by manual scoring on random samples of 1000 questions. The topics used for selecting candidate answerers, as well as a much larger list of possibly relevant topics, are assigned scores by two human judges, with a third judge adjudicating disagreements. For the current algorithms on the current sample of questions, this process yields overall scores of 89% precision and 84% recall of relevant topics. In other words, 9 out of 10 times, Aardvark will be able to route a question to someone with relevant topics in her profile; and Aardvark will identify 5 out of every 6 possibly relevant answerers for each question based upon their topics.

Footnote 3: A general principle in the design of Aardvark is to use human intelligence wherever possible to improve the quality of the system. For the present task of Question Analysis, this involves giving both askers and answerers prompts and simple commands for telling Aardvark directly what the subject matter of a question is.

3.5 The Aardvark Ranking Algorithm

Ranking in Aardvark is done by the Routing Engine, which determines an ordered list of users (or "candidate answerers") who should be contacted to answer a question, given the asker of the question and the information about the question derived by the Question Analyzer. The core ranking function is described by Equation 2; essentially, the Routing Engine can be seen as computing Equation 2 for all candidate answerers, sorting, and doing some postprocessing.

The main factors that determine this ranking of users are Topic Expertise p(ui|q), Connectedness p(ui|uj), and Availability.

Topic Expertise: First, the Routing Engine finds the subset of users who are semantic matches to the question: those users whose profile topics indicate expertise relevant to the topics which the question is about. Users whose profile topics are closer matches to the question's topics are given higher rank. For questions which are location-sensitive (as defined above), only users with matching locations in their profiles are considered.

Connectedness: Second, the Routing Engine scores each user according to the degree to which she herself — as a person, independently of her topical expertise — is a good "match" for the asker for this information query. The goal of this scoring is to optimize the degree to which the asker and the answerer feel kinship and trust, arising from their sense of connection and similarity, and meet each other's expectations for conversational behavior in the interaction.

Availability: Third, the Routing Engine prioritizes candidate answerers in such a way as to optimize the chances that the present question will be answered, while also preserving the available set of answerers (i.e., the quantity of "answering resource" in the system) as much as possible by spreading out the answering load across the user base. This involves factors such as prioritizing users who are currently online (e.g., via IM presence data, iPhone usage, etc.), who are historically active at the present time of day, and who have not been contacted recently with a request to answer a question.

Given this ordered list of candidate answerers, the Routing Engine then filters out users who should not be contacted, according to Aardvark's guidelines for preserving a high-quality user experience. These filters operate largely as a set of rules: do not contact users who prefer not to be contacted at the present time of day; do not contact users who have recently been contacted as many times as their contact frequency settings permit; etc.

Since this is all done at query time, and the set of candidate answerers can potentially be very large, it is useful to note that this process is parallelizable. Each shard in the Index computes its own ranking for the users in that shard, and sends the top users to the Routing Engine. This is scalable as the user base grows, since as more users are added, more shards can be added.

The list of candidate answerers who survive this filtering process is returned to the Conversation Manager. The Conversation Manager then proceeds with opening channels to each of them, serially, inquiring whether they would like to answer the present question, and iterating until an answer is provided and returned to the asker.

3.6 User Interface

Since social search is modeled after the real-world process of asking questions to friends, the various user interfaces for Aardvark are built on top of the existing communication channels that people use to ask questions to their friends: IM, email, SMS, iPhone, Twitter, and Web-based messaging. Experiments were also done using actual voice input from phones, but this is not live in the current Aardvark production system.

[Figure 2: Screenshot of Aardvark over IM.]
aardvark: You there? I have a question about **tennis** that I think you might be able to answer. (Type 'sure', 'pass', or 'busy'.)
me: sure
aardvark: (From Michael B./29/M/NewYork,US) Does anyone know which health clubs in Manhattan have tennis courts? Or are all the good options in the other boroughs? (Type 'pass' to skip, or 'more' for extra options.)
me: why
aardvark: I thought of you for Michael's question because:
- Michael is a friend of your friend Sally Johnson.
- You know a lot about **racquet sports** and **fitness**.
- You and he have similar interests in **sports**.
(Type 'more' or change settings at http://vark.com/a/XXXXX)
me: Well there is always the Midtown Tennis Club on 8th ave @27th if you really want to stay in manhattan -- but the quality isn't great. You'd do just as well to use the public courts in Central Park. Or another good option is to join NYHRC or NYSC in manhattan, and use their courts in other boroughs...
aardvark: Great -- I've sent that to Michael. Thanks for the fast answer! (Type 'Michael:' followed by a message to add something, or 'more' for options.)

Figure 3: Example of Aardvark interacting with an answerer

Figure 4: Screenshot of the Aardvark Answering Tab on an iPhone [screenshot not reproduced]

The basic requirement for interaction on Aardvark is any kind of text input mechanism, along with a mechanism for displaying textual messages returned from Aardvark. (This kind of very lightweight interface is important for making the search service available anywhere, especially now that mobile device usage is ubiquitous across most of the globe.)

However, Aardvark is most powerful when used through a chat-like interface that enables ongoing conversational interaction. A private 1-to-1 conversation creates an intimacy which encourages both honesty and freedom within the constraints of real-world social norms. (By contrast, answering forums where there is a public audience can both inhibit potential answerers [12] and motivate public performance rather than authentic answering behavior [17].) Further, in a real-time conversation, it is possible for an answerer to request clarifying information from the asker about her question, or for the asker to follow up with further reactions or inquiries to the answerer.

There are two main interaction flows available in Aardvark for answering a question. The primary flow involves Aardvark sending a user a message (over IM, email, etc.), asking if she would like to answer a question: for example, "You there? A friend from the Stanford group has a question about *search engine optimization* that I think you might be able to answer." If the user responds affirmatively, Aardvark relays the question as well as the name of the questioner. The user may then simply type an answer to the question, type in a friend's name or email address to refer it to someone else who might answer, or simply "pass" on this request. (Footnote 4: There is no shame in "passing" on a question, since nobody else knows that the question was sent to you. Similarly, there is no social cost to the user in asking a question, since (unlike in the real world) you are not directly imposing on a friend or requesting a favor; rather, Aardvark plays the role of the intermediary who bears this social cost.)

While this may seem like an entirely quotidian interaction between friends, it is actually quite unusual and innovative behavior for a search engine, and it is a key to Aardvark's ability to find answers to a wide range of questions: the available set of potential answerers is not just whatever users happen to be visiting a bulletin board at the time a question is posted, but rather, the entire set of users that Aardvark has contact information for. Because this kind of "reaching out" to users has the potential to become an unwelcome interruption if it happens too frequently, Aardvark sends such requests for answers usually less than once a day to a given user (and users can easily change their contact settings, specifying preferred frequency and time-of-day for such requests). Further, users can ask Aardvark "why" they were selected for a particular question, and be given the option to easily change their profile if they do not want such questions in the future. This is very much like the real-world model of social information sharing: the person asking a question, or the intermediary in Aardvark's role, is careful not to impose too much upon a possible answerer. The ability to reach out to an extended network beyond a user's immediate friendships, without imposing too frequently on that network, provides a key differentiating experience from simply posting questions to one's Twitter or Facebook status message.

A secondary flow of answering questions is more similar to traditional bulletin-board style interactions: a user sends a message to Aardvark (e.g., "try") or visits the "Answering" tab of the Aardvark website or iPhone application (Figure 4), and Aardvark shows the user a recent question from her network which has not yet been answered and is related to her profile topics. This mode involves the user initiating the exchange when she is in the mood to try to answer a question; as such, it has the benefit of an eager potential answerer -- but as the only mode of answering it does not effectively tap into the full diversity of the user base (since most users do not initiate these episodes). This is an important point: while almost everyone is happy to answer questions (see Section 5) to help their friends or people they are connected to, not everyone goes out of their way to do so.
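The contact-throttling behavior described above (answer requests sent to a given user usually less than once a day, with user-configurable frequency preferences) can be sketched as a small policy object. The class, method names, and default interval below are illustrative assumptions for exposition, not Aardvark's actual implementation.

```python
import time

# Sketch of a contact-throttling policy: candidate answerers are pinged at
# most about once a day by default, and users can set their own preferred
# minimum interval between requests. All names here are hypothetical.

DAY = 24 * 60 * 60  # default minimum seconds between answer requests

class ContactPolicy:
    def __init__(self):
        self.last_contacted = {}  # user_id -> unix time of last request
        self.min_interval = {}    # user_id -> preferred seconds between requests

    def set_preference(self, user_id, seconds_between_requests):
        """Record a user's preferred contact frequency."""
        self.min_interval[user_id] = seconds_between_requests

    def may_contact(self, user_id, now=None):
        """True if this candidate's throttle window has elapsed."""
        now = time.time() if now is None else now
        interval = self.min_interval.get(user_id, DAY)
        last = self.last_contacted.get(user_id)
        return last is None or now - last >= interval

    def record_contact(self, user_id, now=None):
        """Remember when we last pinged this candidate answerer."""
        self.last_contacted[user_id] = time.time() if now is None else now

    def filter_candidates(self, candidates, now=None):
        """Keep only candidates who can be pinged right now."""
        return [u for u in candidates if self.may_contact(u, now)]
```

In a design like this, the routing layer would apply `filter_candidates` to its ranked list of candidate answerers before opening any channels, keeping answerer ranking and rate-limiting decoupled.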
"What fun bars downtown have outdoor seating?"
"I'm just getting into photography. Any suggestions for a digital camera that would be easy enough for me to use as a beginner, but I'll want to keep using for a while?"
"I've started running at least 4 days each week, but I'm starting to get some knee and ankle pain. Any ideas about how to address this short of running less?"
"I need a good prank to play on my supervisor. She has a good sense of humor, but is overly professional. Any ideas?"
"I'm going to Berlin for two weeks and would like to take some day trips to places that aren't too touristy. Where should I go?"
"My friend's in town and wants to see live music. We both love bands like the Counting Crows. Any recommendations for shows (of any size) to check out?"
"I just moved and have the perfect spot for a plant in my living room. It gets a lot of light from the north and south, but I know I won't be too reliable with watering. Any suggestions for plants that won't die?"
"Is there any way to recover an unsaved Excel file that was closed manually on a Mac?"
"Should I wear brown or black shoes with a light brown suit?"
"I'm putting together a focus group to talk about my brand new website. Any tips on making it as effective as possible?"
"I need to analyze a Spanish poem for class. What are some interesting Spanish poems that aren't too difficult to translate?"
"I'm making cookies but ran out of baking powder. Is there anything I can substitute?"
"I always drive by men selling strawberries on Stanford Ave. How much do they charge per flat?"
"I have a job interview over lunch tomorrow. Is there any interview restaurant etiquette that I should know?"
"My girlfriend's ex bought her lots of expensive presents on anniversaries. I'm pretty broke, but want to show her that I care. Any ideas for things I could do that are not too cliche?"
"I want to give my friend something that lasts as a graduation present, but someone already gave her jewelry. What else could I give her?"

Figure 5: A random set of queries from Aardvark

EXAMPLE 1
(Question from Mark C./M/LosAltos,CA) I am looking for a restaurant in San Francisco that is open for lunch. Must be very high-end and fancy (this is for a small, formal, post-wedding gathering of about 8 people).
(+4 minutes -- Answer from Nick T./28/M/SanFrancisco,CA -- a friend of your friend Fritz Schwartz) fringale (fringalesf.com) in soma is a good bet; small, fancy, french (the french actually hang out there too). Lunch: Tuesday - Friday: 11:30am - 2:30pm
(Reply from Mark to Nick) Thanks Nick, you are the best PM ever!
(Reply from Nick to Mark) you're very welcome. hope the days they're open for lunch work...

EXAMPLE 2
(Question from James R./M/TwinPeaksWest,SF) What is the best new restaurant in San Francisco for a Monday business dinner? Fish & Farm? Gitane? Quince (a little older)?
(+7 minutes -- Answer from Paul D./M/SanFrancisco,CA -- A friend of your friend Sebastian V.) For business dinner I enjoyed Kokkari Estiatorio at 200 Jackson. If you prefer a place in SOMA i recommend Ozumo (a great sushi restaurant).
(Reply from James to Paul) thx I like them both a lot but I am ready to try something new
(+1 hour -- Answer from Fred M./29/M/Marina,SF) Quince is a little fancy... La Mar is pretty fantastic for cevice - like the Slanted Door of peruvian food...

EXAMPLE 3
(Question from Brian T./22/M/Castro,SF) What is a good place to take a spunky, off-the-cuff, social, and pretty girl for a nontraditional, fun, memorable dinner date in San Francisco?
(+4 minutes -- Answer from Dan G./M/SanFrancisco,CA) Start with drinks at NocNoc (cheap, beer/wine only) and then dinner at RNM (expensive, across the street).
(Reply from Brian to Dan) Thanks!
(+6 minutes -- Answer from Anthony D./M/Sunnyvale,CA -- you are both in the Google group) Take her to the ROTL production of Tommy, in the Mission. Best show i've seen all year!
(Reply from Brian to Anthony) Tommy as in the Who's rock opera? COOL!
(+10 minutes -- Answer from Bob F./M/Mission,SF -- you are connected through Mathias' friend Samantha S.) Cool question. Spork is usually my top choice for a first date, because in addition to having great food and really friendly service, it has an atmosphere that's perfectly in between casual and romantic. It's a quirky place, interesting funny menu, but not exactly non-traditional in the sense that you're not eating while suspended from the ceiling or anything

Figure 6: Three complete Aardvark interactions

This willingness to be helpful persists because when users do answer questions, they report that it is a very gratifying experience: they have been selected by Aardvark because of their expertise, they were able to help someone who had a need in the moment, and they are frequently thanked for their help by the asker.

In order to play the role of intermediary in an ongoing conversation, Aardvark must have some basic conversational intelligence in order to understand where to direct messages from a user: is a given message a new question, a continuation of a previous question, an answer to an earlier question, or a command to Aardvark? The details of how the Conversation Manager manages these complications and disambiguates user messages are not essential to the social search model, so they are not elaborated here; but the basic approach is to use a state machine to model the discourse context.

In all of the interfaces, wrappers around the messages from another user include information about the user that can facilitate trust: the user's Real Name nametag, with their name, age, gender, and location; the social connection between you and the user (e.g., "Your friend on Facebook", "A friend of your friend Marshall Smith", "You are both in the Stanford group", etc.); a selection of topics the user has expertise in; and summary statistics of the user's activity on Aardvark (e.g., the number of questions recently asked or answered).

Finally, it is important throughout all of the above interactions that Aardvark maintains a tone of voice which is friendly, polite, and appreciative. A social search engine depends upon the goodwill and interest of its users, so it is important to demonstrate the kind of (linguistic) behavior that can encourage these sentiments, in order to set a good example for users to adopt. Indeed, in user interviews, users often express their desire for examples of how to speak or behave socially when using Aardvark; since it is a novel paradigm, users do not immediately realize that they can behave in the same ways they would in a comparable real-world situation of asking for help and offering assistance. All of the language that Aardvark uses is intended both to be a communication mechanism between Aardvark and the user and an example of how to interact with Aardvark.

Overall, a large body of research [6, 1, 5, 16] shows that when you provide a 1-to-1 communication channel, use real identities rather than pseudonyms, facilitate interactions between existing real-world relationships, and consistently provide examples of how to behave, users in an online community will behave in a manner that is far more authentic and helpful than in pseudonymous multicasting environments with no moderators. The design of the Aardvark UI has been carefully crafted around these principles.

4. EXAMPLES

In this section we take a qualitative look at user behavior on Aardvark. Figure 5 shows a random sample of questions asked on Aardvark in this period. Figure 6 takes a closer look at three questions sent to Aardvark during this period, all three of which were categorized by the Question Analyzer under the primary topic "restaurants in San Francisco". (Footnote 5: Names and affiliations have been changed to protect privacy.)

In Example 1, Aardvark opened 3 channels with candidate answerers, which yielded 1 answer. An interesting (and not uncommon) aspect of this example is that the asker and the answerer in fact were already acquaintances, though only listed as "friends-of-friends" in their online social graphs; and they had a quick back-and-forth chat through Aardvark.

In Example 2, Aardvark opened 4 channels with candidate answerers, which yielded two answers. For this question, the "referral" feature was useful: one of the candidate answerers was unable to answer, but referred the question along to a friend to answer.
The asker wanted more choices after the first answer, and resubmitted the question to get another recommendation, which came from a user whose profile topics related to business and finance (in addition to dining).

In Example 3, Aardvark opened 10 channels with candidate answerers, yielding 3 answers. The first answer came from someone with only a distant social connection to the asker; the second answer came from a coworker; and the third answer came from a friend-of-friend-of-friend. The third answer, which is the most detailed, came from a user who has topics in his profile related to both "restaurants" and "dating".

One of the most interesting features of Aardvark is that it allows askers to get answers that are hypercustomized to their information need. Very different restaurant recommendations are appropriate for a date with a spunky and spontaneous young woman, a post-wedding small formal family gathering, and a Monday evening business meeting — and human answerers are able to recognize these constraints. It is also interesting to note that in most of these examples (as in the majority of Aardvark questions), the asker took the time to thank the answerer for helping out.

5. ANALYSIS

The following statistics give a picture of the current usage and performance of Aardvark. Aardvark was first made available semi-publicly in a beta release in March of 2009. From March 1, 2009 to October 20, 2009, the number of users grew to 90,361, having asked a total of 225,047 questions and given 386,702 answers. All of the statistics below are taken from the last month of this period (9/20/2009-10/20/2009).

Figure 7: Aardvark user growth [chart not reproduced]
Figure 8: Categories of questions sent to Aardvark (websites & internet apps; business research; music, movies, TV, books; sports & recreation; home & cooking; finance & investing; technology & programming; local services; travel; product reviews & help; restaurants & bars; miscellaneous)
Figure 9: Distribution of questions and answering times [chart not reproduced]

Aardvark is actively used. As of October 2009, 90,361 users have created accounts on Aardvark, growing organically from 2,272 users since March 2009. In this period, 50,526 users (55.9% of the user base) generated content on Aardvark (i.e., asked or answered a question), while 66,658 users (73.8% of the user base) passively engaged (i.e., either referred or tagged other people's questions). The average query volume was 3,167.2 questions per day in this period, and the median active user issued 3.1 queries per month. Figure 7 shows the number of users per month from early testing through October 2009.

Mobile users are particularly active. Mobile users had an average of 3.6322 sessions per month, which is surprising on two levels. First, mobile users of Aardvark are more active than desktop users. (As a point of comparison, on Google, desktop users are almost 3 times as active as mobile users [10].) Second, mobile users of Aardvark are almost as active in absolute terms as mobile users of Google (who have on average 5.68 mobile sessions per month [10]). This is quite surprising for a service that has only been available for 6 months. We believe this is for two reasons. First, browsing through traditional web search results on a phone is unwieldy. On a phone, it's more useful to get a single short answer that's crafted exactly to your query. Second, people are used to using natural language with phones, and so Aardvark's query model feels natural in that context. These considerations (and early experiments) also suggest that Aardvark mobile users will be similarly active with voice-based search.

Questions are highly contextualized. As compared to web search, where the average query length is between 2.2 and 2.9 words [10, 14], with Aardvark the average query length is 18.6 words (median = 13). While some of this increased length is due to the increased usage of function words, 45.3% of these words are content words that give context to the query. In other words, as compared to traditional web search, Aardvark questions have 3-4 times as much context. The addition of context results in a greater diversity of queries. While in web search between 57% and 63% of queries are unique [14, 15], in Aardvark 98.1% of questions are unique (and 98.2% of answers are unique).
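A statistic like the content-word fraction above can be computed with a simple tokenizer and a function-word list. The function-word list and sample queries below are toy assumptions for illustration; the paper does not specify its tokenization or word lists.

```python
# Illustrative sketch of computing query-length and content-word statistics.
# The stopword list is deliberately tiny; a real analysis would use a much
# larger function-word inventory.

FUNCTION_WORDS = {
    "a", "an", "the", "is", "are", "in", "on", "for", "to", "of", "and",
    "or", "that", "this", "any", "what", "which", "do", "you", "i", "my",
}

def tokenize(query):
    """Lowercase, split on whitespace, and strip edge punctuation."""
    return [w.strip("?,.!\"'").lower() for w in query.split() if w.strip("?,.!\"'")]

def query_stats(queries):
    """Return (average query length, fraction of words that are content words)."""
    lengths, content, total = [], 0, 0
    for q in queries:
        words = tokenize(q)
        lengths.append(len(words))
        total += len(words)
        content += sum(1 for w in words if w not in FUNCTION_WORDS)
    return sum(lengths) / len(lengths), content / total

# Two sample queries in the style of Figure 5.
queries = [
    "What fun bars downtown have outdoor seating?",
    "Is there any way to recover an unsaved Excel file on a Mac?",
]
avg_len, content_frac = query_stats(queries)
```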
Figure 10: Distribution of questions and number of answers received [chart not reproduced]
Figure 11: Distribution of percentage of users and number of topics [chart not reproduced]

Questions often have a subjective element. A manual tally of 1000 random questions between March and October of 2009 shows that 64.7% of queries have a subjective element to them (for example, "Do you know of any great delis in Baltimore, MD?" or "What are the things/crafts/toys your children have made that made them really proud of themselves?"). In particular, advice or recommendation queries regarding travel, restaurants, and products are very popular. A large number of queries are locally oriented: about 10% of questions related to local services, and 13% dealt with restaurants and bars. Figure 8 shows the top categories of questions sent to Aardvark. The distribution is not dissimilar to that found with traditional web search engines [2], but with a much smaller proportion of reference, factual, and navigational queries, and a much greater proportion of experience-oriented, recommendation, local, and advice queries.

Questions get answered quickly. 87.7% of questions submitted to Aardvark received at least 1 answer, and 57.2% received their first answer in less than 10 minutes. On average, a question received 2.08 answers (Figure 10), and the median answering time was 6 minutes and 37 seconds (Figure 9). (Footnote 6: A question may receive more than one answer when the Routing Policy allows Aardvark to contact more than one candidate answerer in parallel for a given question, or when the asker resubmits their question to request a second opinion.) By contrast, on public question and answer forums such as Yahoo! Answers [7], most questions are not answered within the first 10 minutes, and for questions asked on Facebook, only 15.7% of questions are answered within 15 minutes [12]. (Of course, corpus-based search engines such as Google return results in milliseconds, but many of the types of questions that are asked of Aardvark require extensive browsing and query refinement when asked on corpus-based search engines.)

Answers are high quality. Aardvark answers are both comprehensive and concise. The median answer length was 22.2 words; 22.9% of answers were over 50 words (i.e., the length of a whole paragraph); and 9.1% of answers included hypertext links in them. Of the in-line feedback which askers provided on the answers they received, 70.4% rated the answers as 'good', 14.1% rated the answers as 'OK', and 15.5% rated the answers as 'bad'.

There is a broad range of answerers. 78,343 users (86.7% of users) have been contacted by Aardvark with a request to answer a question, and of those, 70% have asked to look at the question, and 38.0% have been able to answer. Additionally, 15,301 users (16.9% of all users) have contacted Aardvark of their own initiative to try answering a question (see Section 3.6 for an explanation of these two modes of answering questions). Altogether, 45,160 users (50.0% of the total user base) have answered a question; this is 75% of all users who interacted with Aardvark at all in the period (66,658 users). As a comparison, only 27% of Yahoo! Answers users have ever answered a question [7]. While a smaller portion of the Aardvark user base is much more active in answering questions — approximately 20% of the user base is responsible for 85% of the total number of answers delivered to date — the distribution of answers across the user base is far broader than on a typical user-generated-content site [7].

Social Proximity Matters. Of questions that were routed to somebody in the asker's social network (most commonly a friend of a friend), 76% of the inline feedback rated the answer as 'good', whereas for those answers that came from outside the asker's social network, 68% were rated as 'good'.

People are indexable. 97.7% of the user base has at least 3 topics in their profiles, and the median user has 9 topics in her profile. In sum, Aardvark users added 1,199,323 topics to their profiles; not counting overlapping topics, this yields a total of 174,605 distinct topics in which the current Aardvark user base has expertise. The currently most popular topics in user profiles on Aardvark are "music", "movies", "technology", and "cooking", but in general most topics are as specific as "logo design" and "San Francisco pickup soccer".

6. EVALUATION

To evaluate social search compared to web search, we ran a side-by-side experiment with Google on a random sample of Aardvark queries. We inserted a "Tip" into a random sample of active questions on Aardvark that read: "Do you want to help Aardvark run an experiment?" with a link to
an instruction page that asked the user to reformulate their question as a keyword query and search on Google. We asked the users to time how long it took to find a satisfactory answer on both Aardvark and Google, and to rate the answers from both on a 1-5 scale. If it took longer than 10 minutes to find a satisfactory answer, we instructed the user to give up. Of the 200 responders in the experiment set, we found that 71.5% of the queries were answered successfully on Aardvark, with a mean rating of 3.93 (σ = 1.23), while 70.5% of the queries were answered successfully on Google, with a mean rating of 3.07 (σ = 1.46). The median time-to-satisfactory-response for Aardvark was 5 minutes (of passive waiting), while the median time-to-satisfactory-response for Google was 2 minutes (of active searching).

Of course, since this evaluation involves reviewing questions which users actually sent to Aardvark, we should expect that Aardvark would perform well — after all, users chose these particular questions to send to Aardvark because of their belief that it would be helpful in these cases. And in many cases, the users in the experiment noted that they sent their question to Aardvark specifically because a previous Google search on the topic was difficult to formulate or did not give a satisfactory result. (Footnote 7: For example: "Which golf courses in the San Francisco Bay Area have the best drainage / are the most playable during the winter (especially with all of the rain we've been getting)?")

Thus we cannot conclude from this evaluation that social search will be equally successful for all kinds of questions. Further, we would assume that if the experiment were reversed, and we used as our test set a random sample from Google's query stream, the results of the experiment would be quite different. Indeed, for questions such as "What is the train schedule from Middletown, NJ to New York Penn Station?", traditional web search is a preferable option.

However, the questions asked of Aardvark do represent a large and important class of information need: they are typical of the kind of subjective questions for which it is difficult for traditional web search engines to provide satisfying results. The questions include background details and elements of context that specify exactly what the asker is looking for, and it is not obvious how to translate these information needs into keyword searches. Further, there are not always existing web pages that contain exactly the content that is being sought; and in any event, it is difficult for the asker to assess whether any content that is returned is trustworthy or right for them. In these cases, askers are looking for personal opinions, suggestions, recommendations, or advice from someone they feel a connection with and trust. The desire to have a fellow human being understand what you are looking for and respond in a personalized manner in real time is one of the main reasons why social search is an appealing mechanism for information retrieval.

7. ACKNOWLEDGMENTS

We'd like to thank Max Ventilla and the entire Aardvark team for their contributions to the work reported here. We'd especially like to thank Rob Spiro for his assistance on this paper. And we are grateful to the authors of the original "Anatomy of a Large-Scale Hypertextual Web Search Engine" [3], from which we derived the title and the inspiration for this paper.

8. REFERENCES
[1] H. Bechar-Israeli. From <Bonehead> to <cLoNehEAd>: Nicknames, play, and identity on Internet Relay Chat. Journal of Computer-Mediated Communication, 1995.
[2] S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman, and O. Frieder. Hourly analysis of a very large topically categorized web query log. In Proceedings of the 27th SIGIR, 2004.
[3] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In Computer Networks and ISDN Systems, 1998.
[4] T. Condie, S. D. Kamvar, and H. Garcia-Molina. Adaptive peer-to-peer topologies. In Proceedings of the Fourth International Conference on Peer-to-Peer Computing, 2004.
[5] A. R. Dennis and S. T. Kinney. Testing media richness theory in the new media: The effects of cues, feedback, and task equivocality. Information Systems Research, 1998.
[6] J. Donath. Identity and deception in the virtual community. Communities in Cyberspace, 1998.
[7] Z. Gyongyi, G. Koutrika, J. Pedersen, and H. Garcia-Molina. Questioning Yahoo! Answers. In Proceedings of the First WWW Workshop on Question Answering on the Web, 2008.
[8] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd SIGIR, 1999.
[9] B. J. Jansen, A. Spink, and T. Saracevic. Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing and Management, 2000.
[10] M. Kamvar, M. Kellar, R. Patel, and Y. Xu. Computers and iPhones and mobile phones, oh my!: a logs-based comparison of search users on different devices. In Proceedings of the 18th International Conference on World Wide Web, 2009.
[11] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning. Combining heterogeneous classifiers for word-sense disambiguation. In Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation, 2002.
[12] M. R. Morris, J. Teevan, and K. Panovich. What do people ask their social networks, and why? A survey study of status message Q&A behavior. In Proceedings of CHI, 2010.
[13] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Stanford Digital Libraries Working Paper, 1998.
[14] C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large Web search engine query log. In SIGIR Forum, 1999.
[15] A. Spink, B. J. Jansen, D. Wolfram, and T. Saracevic. From e-sex to e-commerce: Web search changes. IEEE Computer, 2002.
[16] L. Sproull and S. Kiesler. Computers, networks, and work. In Global Networks: Computers and International Communication. MIT Press, 1993.
[17] M. Wesch. An anthropological introduction to YouTube. Library of Congress, June 2008.