SlideShare a Scribd company logo
1 of 8
Download to read offline
Query Recommendation Using Large-scale Web
     Access Logs and Web Page Archive

                Lin Li 1 , Shingo Otsuka 2 , and Masaru Kitsuregawa      1

          1
              Dept. of Info. and Comm. Engineering, The University of Tokyo
                   4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan
                            {lilin, kitsure}@tkl.iis.u-tokyo.ac.jp
                         2
                           National Institute for Materials Science
                     1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
                                  otsuka.shingo@nims.go.jp


        Abstract. Query recommendation suggests related queries for search
        engine users when they are not satisfied with the results of an initial in-
        put query, thus assisting users in improving search quality. Conventional
        approaches to query recommendation have been focused on expanding
        a query by terms extracted from various information sources such as a
        thesaurus like WordNet1 , the top ranked documents and so on. In this
        paper, we argue that past queries stored in query logs can be a source
        of additional evidence to help future users. We present a query recom-
        mendation system based on large-scale Web access logs and Web page
        archive, and evaluate three query recommendation strategies based on
        different feature spaces (i.e., noun, URL, and Web community). The ex-
        perimental results show that query logs are an effective source for query
        recommendation, and the Web community-based and noun-based strate-
        gies can extract more related search queries than the URL-based one.


1     Introduction
Keyword based queries supported by Web search engines help users conveniently
find Web pages that match their information needs. A main problem, however,
occurs for users: properly specifying their information needs through keyword-
based queries. One reason is that queries submitted by users are usually very
short [8]. The very small overlap of the query terms and the document terms
in the desired documents will retrieve Web pages which are not what users are
searching for. The other reason is that users might fail to choose terms at the
appropriate level of representation for their information needs. Ambiguity of
short queries and the limitation of user’s representation give rise to the problem
of phrasing satisfactory queries.
    The utilization of query recommendation has been investigated to help users
formulate satisfactory queries [1, 3, 4, 8–10]. For example, new terms can be ex-
panded to the existing list of terms in a query and users can also change some
or all the terms in a query. We summarize the main process of query recommen-
dation in the following three steps:
1
    http://wordnet.princeton.edu/
1. choosing information sources where most relevant terms as recommendation
    candidates can be found, given a current query;
 2. designing a measure to rank candidate terms in terms of relatedness;
 3. utilizing the top ranked relevant terms to reformulate the current query.
The selection of information sources in the first step plays an important role for
an effective query recommendation strategy. The general idea of existing query
recommendation strategies [3, 4, 9, 10] has focused on finding relevant terms
from documents to existing terms in a query based on the hypothesis that a
frequent term from the documents will tend to co-occur with all query terms.
On the Web, this hypothesis is reasonable, but not always true since there exists
a large gap between the Web document and query spaces, as indicated by [4]
which have utilized query logs to bridge the query and Web document spaces. In
this paper, different from the above researches, we think that past queries stored
in the query logs may be a source of additional evidence to help future users.
Some users who are not very familiar with a certain domain, can gradually refine
their queries from related queries that have been searched by previous users, and
hence get the Web pages they want.
    In this paper, we present and evaluate a query recommendation system using
past queries that provide a pool of relevant terms. To calculate the relatedness
between two queries, we augment short Web queries by three feature spaces, i.e.,
noun space, URL space, and community (community means Web community in
this paper) space respectively. The three feature spaces are based on Web access
logs (the collected 10GB URL histories of Japanese users selected without static
deviation ) and a 4.5 million Japanese Web page archive. We propose a query
recommendation strategy using a community feature space which is different
from the noun and URL feature spaces commonly used in the literature [1, 3, 4,
8, 9]. The evaluation of query recommendation is labor intensive and it is not
easy to construct an object test data set for it at current stage. An evaluation is
carefully designed to make it clear that to which degree different feature based
strategies add value to the query recommendation quality. We study this problem
and provide some preliminary conclusions. The experimental results show that
query logs are an effective source for query recommendation, and community-
based and noun-based methods can extract more related search keywords than
the URL-based one.
    The rest of this paper is organized as follows. Firstly, we introduce three
query recommendation strategies in Section 2. Then, we describe the details of
experiment methodology and discuss the experimental results in Section 3 and
Section 4 respectively. Lastly, we conclude our work in Section 5.


2   Query Recommendation Strategies
The goal our recommendation system is to find the related past queries to a
current query input by a Web user, which means we need to measure the relat-
edness between queries and then recommend the top ranked queries. Previous
queries having common terms with the input query are naturally recommended.
However, it is possible that queries can be phrased differently with different
terms but for the same information needs while they can be identical but for the
different information needs. To more accurately measure relatedness between
two queries, most of existing strategies augment a query by terms from Web
pages or search result URLs [1, 5, 8]. We think that the information related to
the accessed Web pages by Web users are useful sources to augment original
queries because the preferences of a user are reflected in form of her accesses.
One important assumption behind this idea is that the accessed Web pages are
relevant to the query. At the first glance, although the access information is not
as accurate as explicit relevance judgment in the traditional relevance feedback,
the user’s choice does suggest a certain level of relevance. It is therefore reason-
able to regard the accessed Web pages as relevant examples from a statistical
viewpoint.


2.1     Three Feature Spaces for Query Recommendation

Given these accessed Web pages, we define three feature spaces as noun space,
URL space, and community space to augment their corresponding Web queries
and then estimate the relatedness between two augmented queries.
Noun Space As we know, nouns in a document can more accurately repre-
sent the topic described by the document than others. Therefore, we enrich a
query with the nouns extracted from the contents of its accessed Web page sets,
which intends to find the topics hidden in the short query. Our noun space is
created using ChaSen, a Japanese morphological analyzer 2 . Since we have al-
ready crawled a 4.5 million Japanese Web page archive, we can complete the
morphological analysis of all the Web pages in advance.
URL Space The query recommendation strategy using the noun feature space
is not applicable, at least in principle, in settings including: non-text pages like
multimedia (image) files, documents in non-HTML file formats such as PDF and
DOC documents, pages with limited access like sites that require registration and
so on. In these cases, URLs of Web pages are an alternate source. Furthermore,
because the URL space is insensitive to content, for online applications, this
method is easier and faster to get recommendation lists of related past queries
than the noun feature space. Our URL space consists of the hostnames of ac-
cessed URLs in our Web logs. The reason that we use hostnames of URLs will
be explained in Sectioin 3.2.
Community Space The noun and URL spaces are straightforward and com-
mon sources to enhance a query. In this paper, we utilize the Web community
information as another useful feature space since each URL can be clustered into
its respective community. The technical detail of creating community is in our
previous work [7] which created a web community chart based on the complete
bipartite graphs, and extracted communities automatically from a large amount
of web pages. We labeled each Web page by a community ID. As thus, these
community IDs constitute our community space.
2
    http://chasen-legacy.sourceforge.jp/
Query1     Query 2    Relatedness
                       Query
                                   W eb server            Query         Query                                  noun com URL
                                    (Apache)                        Recommendation         bank      deposit 0.80 0.95 0.85
                       Result                             Result         Data
            Web User         W eb server sorts the value of                                bank     Asahi bank 0.70 0.90 0.30
                         relatedness in ascending order
                               at each feature space.
                                                                                           bank      Pay o     0.74 0.30 0.00
                                                                                Query Recommendation Data
                                                                                 was created in advance.



                                                    Web L ogs      Web C ommunity       Web Archive
                                                      Data              Data               Data




                                    Fig. 1. The architecture of our system


2.2    Relatedness Definition
In this section we discuss how to calculate relatedness between two queries en-
riched by the three feature spaces. The relatedness is defined as

                                              ei ∈qx &ei ∈qy fqx (ei ) +                    ei ∈qx &ei ∈qy fqy (ei )
                   Rqx ,qy =                                                                                            ,
                                                                                    2
where qx and qy are queries, and Rqx , qy is the estimated relatedness score be-
tween qx and qy . Let Q = {q1 , q2 , . . . , qx , . . . , qn } be a universal set of all search
queries where n is the number of total search queries. We augment the query
qx by adding one of the three feature spaces(i.e., noun, URL, and community
ID), denoted as: qx = {e1 , e2 , . . . , ex , . . . , em } where ex is an element of the used
feature space and m is the total number of elements. The frequencies of elements
in a query are denoted as fqx = {fe1 , fe2 , . . . , fex , . . . , fem } for a single feature
space. For URL space, we can get the frequencies of URLs visited by different
users using access information in our Web logs.
    In addition, we create an excluded set which stores the highly frequent ele-
ments of URLs, communities and nouns contained in accessed Web page sets.
For example, the highly frequent elements of URLs are Yahoo!, MSN, Google
and so on, and the highly frequent elements of nouns are I, today, news and so
on. We exclude a highly frequent element eh from the frequency space of fqx if
the number of the test queries which feature spaces include eh are more than
half of the number of all the test queries used in our evaluation.


3     Experiment Methodology
3.1    Our System Overview
The goal of our query recommendation system is to find the related queries
from past queries given an input query. Figure 1 shows a sketch of the system
architecture consisting of two components. One is “Web Server(Apache)” where a
user interactively communicates with our system. The user inputs a query to the
Web server, and then the Web server returns the list of related past queries to her.
(1) (2) (3)       (4)


                                                                                                                            Google Suggestion
                                                                                                                          bank   interest
                                                                                                                          bank   comparison
                                                                                                                          bank   rank ing
                                                                                                                          bank   business hours
               (Noun space)                                                                                               bank   grading
                                                                                                                          bank   charge
                                                                                                                          bank   transfer charge
                                                                                                                          bank   code
       (C ommunity space)                                                                                                 bank   transfer
                                                                                                                          bank   interest comparison


               (UR L space)

(Noun space)
        quot; major consumer product mak erquot;   quot; M izuho bank quot;             quot; exchange ratequot;                      quot; Sony bank quot;
        quot; uni cation of the bank squot;        quot; Sonyquot;                      quot; foreign currency depositsquot;          (name of n  ancial commodity)
        quot; K oizumi cabinet inaugurationquot;   quot; ratequot; quot; megabank quot;         (name of nancial commodity )          (name of nancial commodity)
        quot; depositquot;                         quot; H itachiquot;                  quot; M izuhoquot; (bank name)                quot; J apanese-E nglish dictionaryquot;
        quot; restructurequot;                     quot; pay o quot;                    quot; Y en ratequot;                          quot; my car loanquot;
(Community space)
        quot; depositquot;                         quot; combative sportquot;           quot; M izuhoquot; (bank name)                quot; exchange ratequot;
        quot; A sahi bank quot;                    quot; national athletic meetquot;    quot; my car loanquot;                        quot; bank ruptcy informationquot;
        quot; welfare xed depositquot;             quot; advanced paymentquot;          quot; Saltlak equot; (city name)              quot; Y amanashi C huo bank quot;
        quot; totoquot; (football lot)             (a name of public lottery)   quot; break -even pointquot;                  quot; Suito shink in bank quot;
        quot; K umik o E ndoquot;                  quot; announcerquot;                 quot; beautyquot;                             quot; Dai-I chi K angyo bank quot;
(UR L space)
        quot; K umik o E ndoquot;                  quot; national athletic meetquot;    quot; Saltlak equot; (city name)              quot; tra ic jam informationquot;
        quot; T azawa lak equot;                   (company name)               quot; marathonquot;                           quot; cherry treequot;
        quot; depositquot;                         quot; announcerquot;                 (company name)                        quot; M izuhoquot; (bank name)
        quot; combative sportquot;                 quot; traveler's check quot;         quot; cellular phonequot; quot; bulletin boardquot;   (company name)
        quot; K awabequot;                         quot; cleanness freak quot;          quot; winequot;                               quot; Syunsuk e Nak amuraquot;



                                   Fig. 2. The user interface of our system




The other is “Query Recommendation Data” storing the recommendation results
of the three strategies discussed in Section 2. Our Web logs, Web community, and
Web page archive data ensure the richness of information sources to augment
Web queries. The interface of our query recommendation system is illustrated
in Figure 2. A user can input a search query in Figure 2(1) while the related
queries recommended by the component “Query Recommendation Data” are
shown below and divided by using different feature spaces. Then, the user can
choose one recommended query to add or replace the initial query and submit
the reformulated query to a search engine selected from a dropdown list as
shown in Figure 2(2). Finally, the search results retrieved by the selected search
engine are listed in the right part. Furthermore, in Figure 2(3), there are two
slide bars which can adjust the lower and upper bounds of relatedness scores.
For each feature space, the maximal number of recommended queries is 20. If
the user wants more hints, she can click a button shown in Figure 2(4) to get
more recommended queries ordered by their relatedness scores with the initial
query. The more the recommended query is related to the initial query, the
query is displayed as a deeper red. Figure 2 presents recommendation of the
query “Bank” as an example. Since in this study we utilize Japanese Web data,
the corresponding English translation is in the bottom of this figure. If some
U serID AccessTime          R efSec    UR L
                          1 2002/9/30 00:00: 00        4   http://www.tkl.iis.u-tokyo.ac.jp/welcome_j.html
                          2 2002/9/30 00:00: 00        6   http://www.jma.go.jp/JM A_H P/jma/index.html
                          3 2002/9/30 00:00: 00        8   http://www.kantei.go.jp/
                          4 2002/9/30 00:00: 00      15    http://www.google.co.jp/
                          1 2002/9/30 00:00: 04        6   http://www.tkl.iis.u-tokyo.ac.jp/K ilab/W elcome.html
                          5 2002/9/30 00:00: 04        3   http://www.yahoo.co.jp/
                          6 2002/9/30 00:00: 05      54    http://weather.crc.co.jp/
                          2 2002/9/30 00:00: 06      11    http://www.data.kishou.go.jp/maiji/
                          3 2002/9/30 00:00: 08      34    http://www.kantei.go.jp/new/kousikiyotei.html
                          5 2002/9/30 00:00: 07      10    http://search.yahoo.co.jp/bin/search? p=% C 5% B 7% B 5% A4
                          5 2002/9/30 00:00: 10     300    http://www.tkl.iis.u-tokyo.ac.jp/K ilab/Members/members-j.html




                 Fig. 3. A part of our panel logs (Web access logs)


queries are only available in Japanese, we give a brief English explanation. For
example, the query “Mizuho” is a famous Japanese bank.


3.2   Data Sets

Web Access Logs Our Web access logs, also called “panel logs” are provided
by Video Research Interactive Inc. which is one of Internet rating companies.
The collecting method and statistics of this panel logs are described in [6]. Here
we give a brief description. The panel logs consist of user ID, access time of
Web page, reference seconds of Web page, URL of accessed Web page and so on.
The data size is 10GB and the number of users is about 10 thousand. Figure 3
shows the details of a part of our panel logs. In this study, we need to extract
past queries and related information from the whole panel logs. We notice that
the URL from a search engine (e.g., Yahoo!) records the query submitted by a
user, as shown in Figure 3(a). We extract the query from the URL, and then the
access logs followed this URL in a session are corresponding Web pages browsed
by the user. The maximum interval to determine the session boundary is 30
minutes, a well-known threshold [2] such that two continuous accesses within 30
minutes interval are regarded as in a same session.


Japanese Web Page Archive The large-scale snapshot of a Japanese Web
page archive we used was built in February 2002. We crawled 4.5 million Web
pages during the panel logs collection period and automatically created 17 hun-
dred thousand communities from one million selected pages [7]. Since the time of
crawling the Web pages for the Web communities is during the time of panel logs
collection, there are some Web pages which are not covered by the crawling due
to the change and deletion of pages accessed by the panels. We did a preliminary
analysis that the full path URL overlap between the Web access logs and Web
page archive is only 18.8%. Therefore, we chopped URLs to their hostnames and
then the overlap increases to 65%.


3.3   Evaluation Method

We invited nine volunteers(users) to evaluate the query recommendation strate-
gies using our system. They are our lab members who usually use search engines
Table 1. Search queries for evaluation

      Test Query Accessed Web Pages     Group Test Query Accessed Web Pages     Group
      lottery                       891   A   bank                          113   C
      ring tone                     446   B   fishing                         64   A
      movie                         226   C   scholarship                    56   B
      hot spring                    211   A   university                     50   C
      soccer                        202   B


           Table 2. Evaluation results of our query recommendation system

              Relevance of queries         Noun space Community space URL space
              irrelevant                       0.037            0.244     0.341
              lowly relevant                    0.043           0.089     0.107
              relevant                          0.135           0.131     0.106
              highly relevant                  0.707            0.480     0.339
              relevant and highly relevant     0.843            0.611     0.444
              un-judged                         0.078           0.056     0.107




to meet their information needs. We also compare these strategies with the query
recommendation service supplied by Google search engine. Nine test queries used
in our evaluation are listed in Table 1. The total number of queries evaluated by
users are 540 because they evaluate the top 20 of recommended queries accord-
ing to their relatedness values on each of the three feature spaces 3 . To alleviate
the workload on an individual user, we divided the nine users to three group
(i.e., A, B, and C) as shown in Table 1. We ask each group to give their rele-
vance judgments on three queries. The relevance judgment has five levels, i.e.,
irrelevant, lowly relevant, relevant, highly relevant, and un-judged.


4     Evaluation Results and Discussions

4.1     Comparisons of Query Recommendation Strategies

The evaluation results of the recommended queries related with all search queries
are shown in Table 2 where each value denotes the percentage of a relevance
judgment level on each feature space. For example, The value with the “highly
relevant” judgment using noun space is 0.707 which means the percentage of
the “highly relevant” judgment chosen by the users in all recommended queries4
using the noun based strategy.
    In Table 2, when the noun space based strategy is applied, there are more
recommended queries evaluated “highly relevant” and fewer queries evaluated
“irrelevant” than the other two feature spaces. Furthermore, if we combine the
“highly relevant” judgments and the “relevant” judgments, we still gain the
best results in the noun space while the percentage of the recommended queries
judged as “irrelevant” using the URL space (i.e., 0.341) is higher than those
3
    9 search queries * 20 recommended queries * 3 feature spaces = 540 evaluated queries
4
    9 search queries * 20 recommended queries = 180 evaluated queries
using other two spaces. In the community space, there are many recommended
queries evaluated “highly relevant” and “relevant”. Although the community
based strategy produces less related queries than the noun based strategy, it is
more than the URL based strategy. By using the URL space, the number of
recommended queries judged as “irrelevant” is more than that of queries judged
as “highly relevant” (e.g., “irrelevant” vs. “highly relevant” = 0.341 vs. 0.339
in Table 2). In general, the noun and community spaces can supply us with
satisfactory recommendation while the URL space based strategy cannot stably
produce satisfactory queries related to a query.

4.2   Case Study with “Google Suggestion”
We compare the result of our case study in Figure 2 with the query recommen-
dation service provided by Google Suggestion. In Figure 2 our system presents
some good recommended queries related to bank in the noun and community
spaces such as “deposit”, “my car loan”, “toto” and so on that are not given by
Google Suggestion.

5     Conclusions and Future Work
In this paper, we design a query recommendation system based on three feature
spaces, i.e., noun space, URL space, and community space, by using large-scale
Web access logs and Japanese Web page archive. The experimental results show
that the community-based and noun-based strategies can extract more related
search queries than the URL-based strategy. We are designing an optimization
algorithm for incremental updates of our recommendation data.

References
 1. D. Beeferman and A. L. Berger. Agglomerative clustering of a search engine query
    log. In KDD, pages 407–416, 2000.
 2. L. Catledge and J. Pitkow. Characterizing browsing behaviors on the world-wide
    web. Computer Networks and ISDN Systems, (27(6)), 1995.
 3. P.-A. Chirita, C. S. Firan, and W. Nejdl. Personalized query expansion for the
    web. In SIGIR, pages 7–14, 2007.
 4. H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma. Query expansion by mining user
    logs. IEEE Trans. Knowl. Data Eng., 15(4):829–839, 2003.
 5. N. S. Glance. Community search assistant. In IUI, pages 91–96, 2001.
 6. S. Otsuka, M. Toyoda, J. Hirai, and M. Kitsuregawa. Extracting user behavior by
    web communities technology on global web logs. In DEXA, pages 957–968, 2004.
 7. M. Toyoda and M. Kitsuregawa. Creating a web community chart for navigating
    related communities. In HT, pages 103–112, 2001.
 8. J.-R. Wen, J.-Y. Nie, and H. Zhang. Query clustering using user logs. ACM Trans.
    Inf. Syst., 20(1):59–81, 2002.
 9. J. Xu and W. B. Croft. Query expansion using local and global document analysis.
    In SIGIR, pages 4–11, 1996.
10. Y. Zhu and L. Gruenwald. Query expansion using web access log files. In DEXA,
    pages 686–695, 2005.

More Related Content

What's hot

IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET Journal
 
The Research on Related Technologies of Web Crawler
The Research on Related Technologies of Web CrawlerThe Research on Related Technologies of Web Crawler
The Research on Related Technologies of Web CrawlerIRJESJOURNAL
 
keyword query routing
keyword query routingkeyword query routing
keyword query routingswathi78
 
Extracting and Reducing the Semantic Information Content of Web Documents to ...
Extracting and Reducing the Semantic Information Content of Web Documents to ...Extracting and Reducing the Semantic Information Content of Web Documents to ...
Extracting and Reducing the Semantic Information Content of Web Documents to ...ijsrd.com
 
JPJ1423 Keyword Query Routing
JPJ1423   Keyword Query RoutingJPJ1423   Keyword Query Routing
JPJ1423 Keyword Query Routingchennaijp
 
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATAFLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATAIJwest
 
Focused web crawling using named entity recognition for narrow domains
Focused web crawling using named entity recognition for narrow domainsFocused web crawling using named entity recognition for narrow domains
Focused web crawling using named entity recognition for narrow domainseSAT Publishing House
 
WEB IMAGE RE-RANKING USING QUERY-SPECIFIC SEMANTIC SIGNATURES
WEB IMAGE RE-RANKING USING QUERY-SPECIFIC SEMANTIC SIGNATURESWEB IMAGE RE-RANKING USING QUERY-SPECIFIC SEMANTIC SIGNATURES
WEB IMAGE RE-RANKING USING QUERY-SPECIFIC SEMANTIC SIGNATURESShakas Technologies
 
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEEFINALYEARSTUDENTPROJECTS
 
Linking Stanford Typed Dependencies to Support Text Analytics
Linking Stanford Typed Dependencies to Support Text AnalyticsLinking Stanford Typed Dependencies to Support Text Analytics
Linking Stanford Typed Dependencies to Support Text Analyticsfzablith
 
Classification of search_engine
Classification of search_engineClassification of search_engine
Classification of search_engineBookStoreLib
 
web image re-ranking using query-specific semantic signatures
web image re-ranking using query-specific semantic signaturesweb image re-ranking using query-specific semantic signatures
web image re-ranking using query-specific semantic signaturesswathi78
 
Data mining in web search engine optimization
Data mining in web search engine optimizationData mining in web search engine optimization
Data mining in web search engine optimizationBookStoreLib
 
Farthest first clustering in links reorganization
Farthest first clustering in links reorganizationFarthest first clustering in links reorganization
Farthest first clustering in links reorganizationIJwest
 
Open source search engine
Open source search engineOpen source search engine
Open source search enginePrimya Tamil
 
IEEE 2014 JAVA DATA MINING PROJECTS Web image re ranking using query-specific...
IEEE 2014 JAVA DATA MINING PROJECTS Web image re ranking using query-specific...IEEE 2014 JAVA DATA MINING PROJECTS Web image re ranking using query-specific...
IEEE 2014 JAVA DATA MINING PROJECTS Web image re ranking using query-specific...IEEEFINALYEARSTUDENTPROJECTS
 

What's hot (18)

IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
 
The Research on Related Technologies of Web Crawler
The Research on Related Technologies of Web CrawlerThe Research on Related Technologies of Web Crawler
The Research on Related Technologies of Web Crawler
 
keyword query routing
keyword query routingkeyword query routing
keyword query routing
 
Extracting and Reducing the Semantic Information Content of Web Documents to ...
Extracting and Reducing the Semantic Information Content of Web Documents to ...Extracting and Reducing the Semantic Information Content of Web Documents to ...
Extracting and Reducing the Semantic Information Content of Web Documents to ...
 
JPJ1423 Keyword Query Routing
JPJ1423   Keyword Query RoutingJPJ1423   Keyword Query Routing
JPJ1423 Keyword Query Routing
 
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATAFLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
 
Focused web crawling using named entity recognition for narrow domains
Focused web crawling using named entity recognition for narrow domainsFocused web crawling using named entity recognition for narrow domains
Focused web crawling using named entity recognition for narrow domains
 
WEB IMAGE RE-RANKING USING QUERY-SPECIFIC SEMANTIC SIGNATURES
WEB IMAGE RE-RANKING USING QUERY-SPECIFIC SEMANTIC SIGNATURESWEB IMAGE RE-RANKING USING QUERY-SPECIFIC SEMANTIC SIGNATURES
WEB IMAGE RE-RANKING USING QUERY-SPECIFIC SEMANTIC SIGNATURES
 
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
 
Linking Stanford Typed Dependencies to Support Text Analytics
Linking Stanford Typed Dependencies to Support Text AnalyticsLinking Stanford Typed Dependencies to Support Text Analytics
Linking Stanford Typed Dependencies to Support Text Analytics
 
Classification of search_engine
Classification of search_engineClassification of search_engine
Classification of search_engine
 
web image re-ranking using query-specific semantic signatures
web image re-ranking using query-specific semantic signaturesweb image re-ranking using query-specific semantic signatures
web image re-ranking using query-specific semantic signatures
 
Data mining in web search engine optimization
Data mining in web search engine optimizationData mining in web search engine optimization
Data mining in web search engine optimization
 
Farthest first clustering in links reorganization
Farthest first clustering in links reorganizationFarthest first clustering in links reorganization
Farthest first clustering in links reorganization
 
Open source search engine
Open source search engineOpen source search engine
Open source search engine
 
IEEE 2014 JAVA DATA MINING PROJECTS Web image re ranking using query-specific...
IEEE 2014 JAVA DATA MINING PROJECTS Web image re ranking using query-specific...IEEE 2014 JAVA DATA MINING PROJECTS Web image re ranking using query-specific...
IEEE 2014 JAVA DATA MINING PROJECTS Web image re ranking using query-specific...
 
Pagerank and hits
Pagerank and hitsPagerank and hits
Pagerank and hits
 
At33264269
At33264269At33264269
At33264269
 

Viewers also liked

CometPub20070223
CometPub20070223CometPub20070223
CometPub20070223Hiroshi Ono
 
A Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and ConcurrencyA Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and ConcurrencyHiroshi Ono
 
shibuya_pm_tt07_mogilefs_with_catalyst
shibuya_pm_tt07_mogilefs_with_catalystshibuya_pm_tt07_mogilefs_with_catalyst
shibuya_pm_tt07_mogilefs_with_catalystHiroshi Ono
 
Dive into CPython Bytecode
Dive into CPython BytecodeDive into CPython Bytecode
Dive into CPython BytecodeAlex Gaynor
 
Python Yield
Python YieldPython Yield
Python Yieldyangjuven
 

Viewers also liked (7)

CometPub20070223
CometPub20070223CometPub20070223
CometPub20070223
 
Promise
PromisePromise
Promise
 
A Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and ConcurrencyA Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and Concurrency
 
Python Tercera Sesion de Clases
Python Tercera Sesion de ClasesPython Tercera Sesion de Clases
Python Tercera Sesion de Clases
 
shibuya_pm_tt07_mogilefs_with_catalyst
shibuya_pm_tt07_mogilefs_with_catalystshibuya_pm_tt07_mogilefs_with_catalyst
shibuya_pm_tt07_mogilefs_with_catalyst
 
Dive into CPython Bytecode
Dive into CPython BytecodeDive into CPython Bytecode
Dive into CPython Bytecode
 
Python Yield
Python YieldPython Yield
Python Yield
 

Similar to Query Recommendation Using Large-scale Web Access Logs and Web Page Archive

SEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAIN
SEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAINSEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAIN
SEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAINcscpconf
 
Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...
Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...
Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...IJCSIS Research Publications
 
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...pharmaindexing
 
Recommendation generation by integrating sequential
Recommendation generation by integrating sequentialRecommendation generation by integrating sequential
Recommendation generation by integrating sequentialeSAT Publishing House
 
Recommendation generation by integrating sequential pattern mining and semantics
Recommendation generation by integrating sequential pattern mining and semanticsRecommendation generation by integrating sequential pattern mining and semantics
Recommendation generation by integrating sequential pattern mining and semanticseSAT Journals
 
Semantic Information Retrieval Using Ontology in University Domain
Semantic Information Retrieval Using Ontology in University Domain Semantic Information Retrieval Using Ontology in University Domain
Semantic Information Retrieval Using Ontology in University Domain dannyijwest
 
Using Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebUsing Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebIJwest
 
Query- And User-Dependent Approach for Ranking Query Results in Web Databases
Query- And User-Dependent Approach for Ranking Query  Results in Web DatabasesQuery- And User-Dependent Approach for Ranking Query  Results in Web Databases
Query- And User-Dependent Approach for Ranking Query Results in Web DatabasesIOSR Journals
 
Comparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievalComparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievaleSAT Journals
 
RabbitMQ Implementation as Message Broker in Distributed Application with RES...
RabbitMQ Implementation as Message Broker in Distributed Application with RES...RabbitMQ Implementation as Message Broker in Distributed Application with RES...
RabbitMQ Implementation as Message Broker in Distributed Application with RES...IJCSIS Research Publications
 
page ranking web crawling
page ranking web crawlingpage ranking web crawling
page ranking web crawlingpradiprahul
 
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...journalBEEI
 
User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...eSAT Publishing House
 
Inverted files for text search engines
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search enginesunyil96
 
Components of a search engine
Components of a search engineComponents of a search engine
Components of a search enginePrimya Tamil
 
Context-Based Diversification for Keyword Queries over XML Data
Context-Based Diversification for Keyword Queries over XML DataContext-Based Diversification for Keyword Queries over XML Data
Context-Based Diversification for Keyword Queries over XML Data1crore projects
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...ijcseit
 
Web service discovery methods and techniques a review
Web service discovery methods and techniques a reviewWeb service discovery methods and techniques a review
Web service discovery methods and techniques a reviewijcseit
 

Similar to Query Recommendation Using Large-scale Web Access Logs and Web Page Archive (20)

SEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAIN
SEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAINSEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAIN
SEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAIN
 
Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...
Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...
Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...
 
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
 
Recommendation generation by integrating sequential
Recommendation generation by integrating sequentialRecommendation generation by integrating sequential
Recommendation generation by integrating sequential
 
Recommendation generation by integrating sequential pattern mining and semantics
Recommendation generation by integrating sequential pattern mining and semanticsRecommendation generation by integrating sequential pattern mining and semantics
Recommendation generation by integrating sequential pattern mining and semantics
 
Semantic Information Retrieval Using Ontology in University Domain
Semantic Information Retrieval Using Ontology in University Domain Semantic Information Retrieval Using Ontology in University Domain
Semantic Information Retrieval Using Ontology in University Domain
 
Using Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebUsing Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic Web
 
Query- And User-Dependent Approach for Ranking Query Results in Web Databases
Query- And User-Dependent Approach for Ranking Query  Results in Web DatabasesQuery- And User-Dependent Approach for Ranking Query  Results in Web Databases
Query- And User-Dependent Approach for Ranking Query Results in Web Databases
 
2 ijmtst031002
2 ijmtst0310022 ijmtst031002
2 ijmtst031002
 
Comparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievalComparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrieval
 
RabbitMQ Implementation as Message Broker in Distributed Application with RES...
RabbitMQ Implementation as Message Broker in Distributed Application with RES...RabbitMQ Implementation as Message Broker in Distributed Application with RES...
RabbitMQ Implementation as Message Broker in Distributed Application with RES...
 
page ranking web crawling
page ranking web crawlingpage ranking web crawling
page ranking web crawling
 
PAGE RANKING
PAGE RANKING PAGE RANKING
PAGE RANKING
 
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
 
User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...
 
Inverted files for text search engines
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search engines
 
Components of a search engine
Components of a search engineComponents of a search engine
Components of a search engine
 
Context-Based Diversification for Keyword Queries over XML Data
Context-Based Diversification for Keyword Queries over XML DataContext-Based Diversification for Keyword Queries over XML Data
Context-Based Diversification for Keyword Queries over XML Data
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Web service discovery methods and techniques a review
Web service discovery methods and techniques a reviewWeb service discovery methods and techniques a review
Web service discovery methods and techniques a review
 

More from Hiroshi Ono

Voltdb - wikipedia
Voltdb - wikipediaVoltdb - wikipedia
Voltdb - wikipediaHiroshi Ono
 
Gamecenter概説
Gamecenter概説Gamecenter概説
Gamecenter概説Hiroshi Ono
 
EventDrivenArchitecture
EventDrivenArchitectureEventDrivenArchitecture
EventDrivenArchitectureHiroshi Ono
 
program_draft3.pdf
program_draft3.pdfprogram_draft3.pdf
program_draft3.pdfHiroshi Ono
 
nodalities_issue7.pdf
nodalities_issue7.pdfnodalities_issue7.pdf
nodalities_issue7.pdfHiroshi Ono
 
genpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfgenpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfHiroshi Ono
 
kademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfkademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfHiroshi Ono
 
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfpragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfHiroshi Ono
 
downey08semaphores.pdf
downey08semaphores.pdfdowney08semaphores.pdf
downey08semaphores.pdfHiroshi Ono
 
BOF1-Scala02.pdf
BOF1-Scala02.pdfBOF1-Scala02.pdf
BOF1-Scala02.pdfHiroshi Ono
 
TwitterOct2008.pdf
TwitterOct2008.pdfTwitterOct2008.pdf
TwitterOct2008.pdfHiroshi Ono
 
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfstateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfHiroshi Ono
 
SACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfSACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfHiroshi Ono
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfstateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfHiroshi Ono
 
program_draft3.pdf
program_draft3.pdfprogram_draft3.pdf
program_draft3.pdfHiroshi Ono
 
nodalities_issue7.pdf
nodalities_issue7.pdfnodalities_issue7.pdf
nodalities_issue7.pdfHiroshi Ono
 
genpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfgenpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfHiroshi Ono
 
kademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfkademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfHiroshi Ono
 

More from Hiroshi Ono (20)

Voltdb - wikipedia
Voltdb - wikipediaVoltdb - wikipedia
Voltdb - wikipedia
 
Gamecenter概説
Gamecenter概説Gamecenter概説
Gamecenter概説
 
EventDrivenArchitecture
EventDrivenArchitectureEventDrivenArchitecture
EventDrivenArchitecture
 
program_draft3.pdf
program_draft3.pdfprogram_draft3.pdf
program_draft3.pdf
 
nodalities_issue7.pdf
nodalities_issue7.pdfnodalities_issue7.pdf
nodalities_issue7.pdf
 
genpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfgenpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdf
 
kademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfkademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdf
 
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfpragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
 
downey08semaphores.pdf
downey08semaphores.pdfdowney08semaphores.pdf
downey08semaphores.pdf
 
BOF1-Scala02.pdf
BOF1-Scala02.pdfBOF1-Scala02.pdf
BOF1-Scala02.pdf
 
TwitterOct2008.pdf
TwitterOct2008.pdfTwitterOct2008.pdf
TwitterOct2008.pdf
 
camel-scala.pdf
camel-scala.pdfcamel-scala.pdf
camel-scala.pdf
 
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfstateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
 
SACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfSACSIS2009_TCP.pdf
SACSIS2009_TCP.pdf
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfstateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
 
program_draft3.pdf
program_draft3.pdfprogram_draft3.pdf
program_draft3.pdf
 
nodalities_issue7.pdf
nodalities_issue7.pdfnodalities_issue7.pdf
nodalities_issue7.pdf
 
genpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfgenpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdf
 
kademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfkademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdf
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Query Recommendation Using Large-scale Web Access Logs and Web Page Archive

  • 1. Query Recommendation Using Large-scale Web Access Logs and Web Page Archive Lin Li 1 , Shingo Otsuka 2 , and Masaru Kitsuregawa 1 1 Dept. of Info. and Comm. Engineering, The University of Tokyo 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan {lilin, kitsure}@tkl.iis.u-tokyo.ac.jp 2 National Institute for Materials Science 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan otsuka.shingo@nims.go.jp Abstract. Query recommendation suggests related queries for search engine users when they are not satisfied with the results of an initial in- put query, thus assisting users in improving search quality. Conventional approaches to query recommendation have been focused on expanding a query by terms extracted from various information sources such as a thesaurus like WordNet1 , the top ranked documents and so on. In this paper, we argue that past queries stored in query logs can be a source of additional evidence to help future users. We present a query recom- mendation system based on large-scale Web access logs and Web page archive, and evaluate three query recommendation strategies based on different feature spaces (i.e., noun, URL, and Web community). The ex- perimental results show that query logs are an effective source for query recommendation, and the Web community-based and noun-based strate- gies can extract more related search queries than the URL-based one. 1 Introduction Keyword based queries supported by Web search engines help users conveniently find Web pages that match their information needs. A main problem, however, occurs for users: properly specifying their information needs through keyword- based queries. One reason is that queries submitted by users are usually very short [8]. The very small overlap of the query terms and the document terms in the desired documents will retrieve Web pages which are not what users are searching for. The other reason is that users might fail to choose terms at the appropriate level of representation for their information needs. Ambiguity of short queries and the limitation of user’s representation give rise to the problem of phrasing satisfactory queries. The utilization of query recommendation has been investigated to help users formulate satisfactory queries [1, 3, 4, 8–10]. For example, new terms can be ex- panded to the existing list of terms in a query and users can also change some or all the terms in a query. We summarize the main process of query recommen- dation in the following three steps: 1 http://wordnet.princeton.edu/
  • 2. 1. choosing information sources where most relevant terms as recommendation candidates can be found, given a current query; 2. designing a measure to rank candidate terms in terms of relatedness; 3. utilizing the top ranked relevant terms to reformulate the current query. The selection of information sources in the first step plays an important role for an effective query recommendation strategy. The general idea of existing query recommendation strategies [3, 4, 9, 10] has focused on finding relevant terms from documents to existing terms in a query based on the hypothesis that a frequent term from the documents will tend to co-occur with all query terms. On the Web, this hypothesis is reasonable, but not always true since there exists a large gap between the Web document and query spaces, as indicated by [4] which have utilized query logs to bridge the query and Web document spaces. In this paper, different from the above researches, we think that past queries stored in the query logs may be a source of additional evidence to help future users. Some users who are not very familiar with a certain domain, can gradually refine their queries from related queries that have been searched by previous users, and hence get the Web pages they want. In this paper, we present and evaluate a query recommendation system using past queries that provide a pool of relevant terms. To calculate the relatedness between two queries, we augment short Web queries by three feature spaces, i.e., noun space, URL space, and community (community means Web community in this paper) space respectively. The three feature spaces are based on Web access logs (the collected 10GB URL histories of Japanese users selected without static deviation ) and a 4.5 million Japanese Web page archive. We propose a query recommendation strategy using a community feature space which is different from the noun and URL feature spaces commonly used in the literature [1, 3, 4, 8, 9]. The evaluation of query recommendation is labor intensive and it is not easy to construct an object test data set for it at current stage. An evaluation is carefully designed to make it clear that to which degree different feature based strategies add value to the query recommendation quality. We study this problem and provide some preliminary conclusions. The experimental results show that query logs are an effective source for query recommendation, and community- based and noun-based methods can extract more related search keywords than the URL-based one. The rest of this paper is organized as follows. Firstly, we introduce three query recommendation strategies in Section 2. Then, we describe the details of experiment methodology and discuss the experimental results in Section 3 and Section 4 respectively. Lastly, we conclude our work in Section 5. 2 Query Recommendation Strategies The goal our recommendation system is to find the related past queries to a current query input by a Web user, which means we need to measure the relat- edness between queries and then recommend the top ranked queries. Previous queries having common terms with the input query are naturally recommended.
  • 3. However, it is possible that queries can be phrased differently with different terms but for the same information needs while they can be identical but for the different information needs. To more accurately measure relatedness between two queries, most of existing strategies augment a query by terms from Web pages or search result URLs [1, 5, 8]. We think that the information related to the accessed Web pages by Web users are useful sources to augment original queries because the preferences of a user are reflected in form of her accesses. One important assumption behind this idea is that the accessed Web pages are relevant to the query. At the first glance, although the access information is not as accurate as explicit relevance judgment in the traditional relevance feedback, the user’s choice does suggest a certain level of relevance. It is therefore reason- able to regard the accessed Web pages as relevant examples from a statistical viewpoint. 2.1 Three Feature Spaces for Query Recommendation Given these accessed Web pages, we define three feature spaces as noun space, URL space, and community space to augment their corresponding Web queries and then estimate the relatedness between two augmented queries. Noun Space As we know, nouns in a document can more accurately repre- sent the topic described by the document than others. Therefore, we enrich a query with the nouns extracted from the contents of its accessed Web page sets, which intends to find the topics hidden in the short query. Our noun space is created using ChaSen, a Japanese morphological analyzer 2 . Since we have al- ready crawled a 4.5 million Japanese Web page archive, we can complete the morphological analysis of all the Web pages in advance. URL Space The query recommendation strategy using the noun feature space is not applicable, at least in principle, in settings including: non-text pages like multimedia (image) files, documents in non-HTML file formats such as PDF and DOC documents, pages with limited access like sites that require registration and so on. In these cases, URLs of Web pages are an alternate source. Furthermore, because the URL space is insensitive to content, for online applications, this method is easier and faster to get recommendation lists of related past queries than the noun feature space. Our URL space consists of the hostnames of ac- cessed URLs in our Web logs. The reason that we use hostnames of URLs will be explained in Sectioin 3.2. Community Space The noun and URL spaces are straightforward and com- mon sources to enhance a query. In this paper, we utilize the Web community information as another useful feature space since each URL can be clustered into its respective community. The technical detail of creating community is in our previous work [7] which created a web community chart based on the complete bipartite graphs, and extracted communities automatically from a large amount of web pages. We labeled each Web page by a community ID. As thus, these community IDs constitute our community space. 2 http://chasen-legacy.sourceforge.jp/
  • 4. Query1 Query 2 Relatedness Query W eb server Query Query noun com URL (Apache) Recommendation bank deposit 0.80 0.95 0.85 Result Result Data Web User W eb server sorts the value of bank Asahi bank 0.70 0.90 0.30 relatedness in ascending order at each feature space. bank Pay o 0.74 0.30 0.00 Query Recommendation Data was created in advance. Web L ogs Web C ommunity Web Archive Data Data Data Fig. 1. The architecture of our system 2.2 Relatedness Definition In this section we discuss how to calculate relatedness between two queries en- riched by the three feature spaces. The relatedness is defined as ei ∈qx &ei ∈qy fqx (ei ) + ei ∈qx &ei ∈qy fqy (ei ) Rqx ,qy = , 2 where qx and qy are queries, and Rqx , qy is the estimated relatedness score be- tween qx and qy . Let Q = {q1 , q2 , . . . , qx , . . . , qn } be a universal set of all search queries where n is the number of total search queries. We augment the query qx by adding one of the three feature spaces(i.e., noun, URL, and community ID), denoted as: qx = {e1 , e2 , . . . , ex , . . . , em } where ex is an element of the used feature space and m is the total number of elements. The frequencies of elements in a query are denoted as fqx = {fe1 , fe2 , . . . , fex , . . . , fem } for a single feature space. For URL space, we can get the frequencies of URLs visited by different users using access information in our Web logs. In addition, we create an excluded set which stores the highly frequent ele- ments of URLs, communities and nouns contained in accessed Web page sets. For example, the highly frequent elements of URLs are Yahoo!, MSN, Google and so on, and the highly frequent elements of nouns are I, today, news and so on. We exclude a highly frequent element eh from the frequency space of fqx if the number of the test queries which feature spaces include eh are more than half of the number of all the test queries used in our evaluation. 3 Experiment Methodology 3.1 Our System Overview The goal of our query recommendation system is to find the related queries from past queries given an input query. Figure 1 shows a sketch of the system architecture consisting of two components. One is “Web Server(Apache)” where a user interactively communicates with our system. The user inputs a query to the Web server, and then the Web server returns the list of related past queries to her.
  • 5. (1) (2) (3) (4) Google Suggestion bank interest bank comparison bank rank ing bank business hours (Noun space) bank grading bank charge bank transfer charge bank code (C ommunity space) bank transfer bank interest comparison (UR L space) (Noun space) quot; major consumer product mak erquot; quot; M izuho bank quot; quot; exchange ratequot; quot; Sony bank quot; quot; uni cation of the bank squot; quot; Sonyquot; quot; foreign currency depositsquot; (name of n ancial commodity) quot; K oizumi cabinet inaugurationquot; quot; ratequot; quot; megabank quot; (name of nancial commodity ) (name of nancial commodity) quot; depositquot; quot; H itachiquot; quot; M izuhoquot; (bank name) quot; J apanese-E nglish dictionaryquot; quot; restructurequot; quot; pay o quot; quot; Y en ratequot; quot; my car loanquot; (Community space) quot; depositquot; quot; combative sportquot; quot; M izuhoquot; (bank name) quot; exchange ratequot; quot; A sahi bank quot; quot; national athletic meetquot; quot; my car loanquot; quot; bank ruptcy informationquot; quot; welfare xed depositquot; quot; advanced paymentquot; quot; Saltlak equot; (city name) quot; Y amanashi C huo bank quot; quot; totoquot; (football lot) (a name of public lottery) quot; break -even pointquot; quot; Suito shink in bank quot; quot; K umik o E ndoquot; quot; announcerquot; quot; beautyquot; quot; Dai-I chi K angyo bank quot; (UR L space) quot; K umik o E ndoquot; quot; national athletic meetquot; quot; Saltlak equot; (city name) quot; tra ic jam informationquot; quot; T azawa lak equot; (company name) quot; marathonquot; quot; cherry treequot; quot; depositquot; quot; announcerquot; (company name) quot; M izuhoquot; (bank name) quot; combative sportquot; quot; traveler's check quot; quot; cellular phonequot; quot; bulletin boardquot; (company name) quot; K awabequot; quot; cleanness freak quot; quot; winequot; quot; Syunsuk e Nak amuraquot; Fig. 2. The user interface of our system The other is “Query Recommendation Data” storing the recommendation results of the three strategies discussed in Section 2. Our Web logs, Web community, and Web page archive data ensure the richness of information sources to augment Web queries. The interface of our query recommendation system is illustrated in Figure 2. A user can input a search query in Figure 2(1) while the related queries recommended by the component “Query Recommendation Data” are shown below and divided by using different feature spaces. Then, the user can choose one recommended query to add or replace the initial query and submit the reformulated query to a search engine selected from a dropdown list as shown in Figure 2(2). Finally, the search results retrieved by the selected search engine are listed in the right part. Furthermore, in Figure 2(3), there are two slide bars which can adjust the lower and upper bounds of relatedness scores. For each feature space, the maximal number of recommended queries is 20. If the user wants more hints, she can click a button shown in Figure 2(4) to get more recommended queries ordered by their relatedness scores with the initial query. The more the recommended query is related to the initial query, the query is displayed as a deeper red. Figure 2 presents recommendation of the query “Bank” as an example. Since in this study we utilize Japanese Web data, the corresponding English translation is in the bottom of this figure. If some
  • 6. U serID AccessTime R efSec UR L 1 2002/9/30 00:00: 00 4 http://www.tkl.iis.u-tokyo.ac.jp/welcome_j.html 2 2002/9/30 00:00: 00 6 http://www.jma.go.jp/JM A_H P/jma/index.html 3 2002/9/30 00:00: 00 8 http://www.kantei.go.jp/ 4 2002/9/30 00:00: 00 15 http://www.google.co.jp/ 1 2002/9/30 00:00: 04 6 http://www.tkl.iis.u-tokyo.ac.jp/K ilab/W elcome.html 5 2002/9/30 00:00: 04 3 http://www.yahoo.co.jp/ 6 2002/9/30 00:00: 05 54 http://weather.crc.co.jp/ 2 2002/9/30 00:00: 06 11 http://www.data.kishou.go.jp/maiji/ 3 2002/9/30 00:00: 08 34 http://www.kantei.go.jp/new/kousikiyotei.html 5 2002/9/30 00:00: 07 10 http://search.yahoo.co.jp/bin/search? p=% C 5% B 7% B 5% A4 5 2002/9/30 00:00: 10 300 http://www.tkl.iis.u-tokyo.ac.jp/K ilab/Members/members-j.html Fig. 3. A part of our panel logs (Web access logs) queries are only available in Japanese, we give a brief English explanation. For example, the query “Mizuho” is a famous Japanese bank. 3.2 Data Sets Web Access Logs Our Web access logs, also called “panel logs” are provided by Video Research Interactive Inc. which is one of Internet rating companies. The collecting method and statistics of this panel logs are described in [6]. Here we give a brief description. The panel logs consist of user ID, access time of Web page, reference seconds of Web page, URL of accessed Web page and so on. The data size is 10GB and the number of users is about 10 thousand. Figure 3 shows the details of a part of our panel logs. In this study, we need to extract past queries and related information from the whole panel logs. We notice that the URL from a search engine (e.g., Yahoo!) records the query submitted by a user, as shown in Figure 3(a). We extract the query from the URL, and then the access logs followed this URL in a session are corresponding Web pages browsed by the user. The maximum interval to determine the session boundary is 30 minutes, a well-known threshold [2] such that two continuous accesses within 30 minutes interval are regarded as in a same session. Japanese Web Page Archive The large-scale snapshot of a Japanese Web page archive we used was built in February 2002. We crawled 4.5 million Web pages during the panel logs collection period and automatically created 17 hun- dred thousand communities from one million selected pages [7]. Since the time of crawling the Web pages for the Web communities is during the time of panel logs collection, there are some Web pages which are not covered by the crawling due to the change and deletion of pages accessed by the panels. We did a preliminary analysis that the full path URL overlap between the Web access logs and Web page archive is only 18.8%. Therefore, we chopped URLs to their hostnames and then the overlap increases to 65%. 3.3 Evaluation Method We invited nine volunteers(users) to evaluate the query recommendation strate- gies using our system. They are our lab members who usually use search engines
  • 7. Table 1. Search queries for evaluation Test Query Accessed Web Pages Group Test Query Accessed Web Pages Group lottery 891 A bank 113 C ring tone 446 B fishing 64 A movie 226 C scholarship 56 B hot spring 211 A university 50 C soccer 202 B Table 2. Evaluation results of our query recommendation system Relevance of queries Noun space Community space URL space irrelevant 0.037 0.244 0.341 lowly relevant 0.043 0.089 0.107 relevant 0.135 0.131 0.106 highly relevant 0.707 0.480 0.339 relevant and highly relevant 0.843 0.611 0.444 un-judged 0.078 0.056 0.107 to meet their information needs. We also compare these strategies with the query recommendation service supplied by Google search engine. Nine test queries used in our evaluation are listed in Table 1. The total number of queries evaluated by users are 540 because they evaluate the top 20 of recommended queries accord- ing to their relatedness values on each of the three feature spaces 3 . To alleviate the workload on an individual user, we divided the nine users to three group (i.e., A, B, and C) as shown in Table 1. We ask each group to give their rele- vance judgments on three queries. The relevance judgment has five levels, i.e., irrelevant, lowly relevant, relevant, highly relevant, and un-judged. 4 Evaluation Results and Discussions 4.1 Comparisons of Query Recommendation Strategies The evaluation results of the recommended queries related with all search queries are shown in Table 2 where each value denotes the percentage of a relevance judgment level on each feature space. For example, The value with the “highly relevant” judgment using noun space is 0.707 which means the percentage of the “highly relevant” judgment chosen by the users in all recommended queries4 using the noun based strategy. In Table 2, when the noun space based strategy is applied, there are more recommended queries evaluated “highly relevant” and fewer queries evaluated “irrelevant” than the other two feature spaces. Furthermore, if we combine the “highly relevant” judgments and the “relevant” judgments, we still gain the best results in the noun space while the percentage of the recommended queries judged as “irrelevant” using the URL space (i.e., 0.341) is higher than those 3 9 search queries * 20 recommended queries * 3 feature spaces = 540 evaluated queries 4 9 search queries * 20 recommended queries = 180 evaluated queries
  • 8. using other two spaces. In the community space, there are many recommended queries evaluated “highly relevant” and “relevant”. Although the community based strategy produces less related queries than the noun based strategy, it is more than the URL based strategy. By using the URL space, the number of recommended queries judged as “irrelevant” is more than that of queries judged as “highly relevant” (e.g., “irrelevant” vs. “highly relevant” = 0.341 vs. 0.339 in Table 2). In general, the noun and community spaces can supply us with satisfactory recommendation while the URL space based strategy cannot stably produce satisfactory queries related to a query. 4.2 Case Study with “Google Suggestion” We compare the result of our case study in Figure 2 with the query recommen- dation service provided by Google Suggestion. In Figure 2 our system presents some good recommended queries related to bank in the noun and community spaces such as “deposit”, “my car loan”, “toto” and so on that are not given by Google Suggestion. 5 Conclusions and Future Work In this paper, we design a query recommendation system based on three feature spaces, i.e., noun space, URL space, and community space, by using large-scale Web access logs and Japanese Web page archive. The experimental results show that the community-based and noun-based strategies can extract more related search queries than the URL-based strategy. We are designing an optimization algorithm for incremental updates of our recommendation data. References 1. D. Beeferman and A. L. Berger. Agglomerative clustering of a search engine query log. In KDD, pages 407–416, 2000. 2. L. Catledge and J. Pitkow. Characterizing browsing behaviors on the world-wide web. Computer Networks and ISDN Systems, (27(6)), 1995. 3. P.-A. Chirita, C. S. Firan, and W. Nejdl. Personalized query expansion for the web. In SIGIR, pages 7–14, 2007. 4. H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma. Query expansion by mining user logs. IEEE Trans. Knowl. Data Eng., 15(4):829–839, 2003. 5. N. S. Glance. Community search assistant. In IUI, pages 91–96, 2001. 6. S. Otsuka, M. Toyoda, J. Hirai, and M. Kitsuregawa. Extracting user behavior by web communities technology on global web logs. In DEXA, pages 957–968, 2004. 7. M. Toyoda and M. Kitsuregawa. Creating a web community chart for navigating related communities. In HT, pages 103–112, 2001. 8. J.-R. Wen, J.-Y. Nie, and H. Zhang. Query clustering using user logs. ACM Trans. Inf. Syst., 20(1):59–81, 2002. 9. J. Xu and W. B. Croft. Query expansion using local and global document analysis. In SIGIR, pages 4–11, 1996. 10. Y. Zhu and L. Gruenwald. Query expansion using web access log files. In DEXA, pages 686–695, 2005.