BEIRA: A geo-semantic clustering method for area summary

The 8th International Conference on Web Information Systems Engineering (WISE2007)

Presentation Transcript

  • BEIRA: A geo-semantic clustering method for area summary
    Osamu Masutani, Hirotoshi Iwasaki
    Denso IT Laboratory, Inc.
    Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved.
  • Summary
      - Background
      - Concept
      - System architecture
      - Evaluation
      - Conclusions and future work
  • Background: map services
    Targets
      - Car navigation systems and PNDs (Personal Navigation Devices)
      - GPS mobile phones
      - Web-based map services
    Major functions of a map service
      - View maps around the current position
      - Search for a route to a destination
      - Search for favorite POIs (Points of Interest)
  • A scenario: a visitor to Nancy
    He has no previous knowledge of Nancy.
      - He is Japanese.
      - He has a little interest in art.
    He has some free time.
      - He has no plan.
      - He can’t speak French.
      - He has a GPS mobile phone.
    The only information available to him comes from a mobile map service.
      - He would like to search for POIs using the service.
      - What is the problem?
  • Use cases: searching for POIs on a mobile device
    There are three ways to search.
    Location-based search
      - Nearby area
    Category-based search
      - “Restaurant” / “Italian” / …
      - “Public” / “Library” / …
    Keyword-based search
      - “chocolate cake”, “soccer”, “beautiful”, “calm”, …
  • Problem with location-based search
    Results are filtered by the specified area.
    Sometimes the results are too numerous
      - In a central urban area
      - When a broad area is chosen
    Selection is very hard
      - The UI is limited, especially on mobile devices.
  • Problem with category-based search
    Results are filtered by a specific category.
    Sometimes the results are too numerous
      - When the user doesn’t specify a detailed category
    Information awareness
      - Once the user has chosen the “Museum” category, he can’t find “Place Stanislas”.
    [Figure labels: museum, park, Place Stanislas]
  • Problem with keyword-based search
    Results are filtered by keyword match.
    Information awareness
      - The user must know the keyword in advance.
      - “Art Nouveau” is a good keyword for finding Nancy’s features.
      - But if the user mistakes the keyword for “Art Deco”, the results will be poor.
    [Figure labels: Art Nouveau, Place Stanislas]
  • Problems
    Information overload
      - Numerous candidates
      - Millions of POIs in a mobile phone service
    Information awareness
      - Both fixed-category search and free-keyword search have a similar problem.
    Solution
      - Reduce the number of candidates
      - But keep information awareness
      - Cluster and summarize the information
    [Figure labels: museum, park]
  • Clustering and summarization
    A similar concept
      - The web search engine “Vivisimo”
      - It clusters search results and displays each cluster with its topic
      - Dynamic categories
    Easy to choose from, yet comprehensive
      - The number of candidates is reduced, but the view remains comprehensive
  • Is Vivisimo enough?
    It provides only a semantic (topic) view.
      - When it is used with a map service
      - Switching between the semantic and geographic views would be complicated
    Can these two views be combined?
      - Use only the map view
      - Cluster = area
  • BEIRA: Bird’s Eye Information Retrieval Application
    Topic-based IR through a geographic view.
      - Uses AOIs (Areas of Interest) instead of POIs
      - An AOI consists of an area (cluster) and its summary (a word list)
    [Figure: an area labeled “Art Nouveau”; summary = word list]
  • System architecture
    POI database
      - Address of each POI
      - Text of each POI (guide text, reputation text, etc.)
    Preprocessing
      - Geocoding and topic vector generation
    Geo-semantic clustering and summarization
    Display of AOIs
    [Diagram: POI database (POI ID, address, text, etc.) → geographic preprocessing (latitude, longitude) and semantic preprocessing (topic vector) → geo-semantic clustering → geo-semantic summarization → AOI database (AOI ID, area polygon, summary)]
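
The records named in the architecture diagram can be sketched as two small data classes. This is only an illustration: the field names follow the diagram labels, but the types, defaults, and the sample values are assumptions, not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class POI:
    """One point of interest after both preprocessing steps."""
    poi_id: int
    address: str
    text: str                          # guide text, reputation text, etc.
    latitude: float = 0.0              # filled in by geocoding
    longitude: float = 0.0             # filled in by geocoding
    topic_vector: List[float] = field(default_factory=list)  # semantic preprocessing


@dataclass
class AOI:
    """One area of interest: a cluster polygon plus its word-list summary."""
    aoi_id: int
    polygon: List[Tuple[float, float]]  # (latitude, longitude) vertices
    summary: List[str]


# Made-up sample values, for illustration only.
cafe = POI(1, "Dogenzaka, Shibuya", "quiet cafe with a terrace",
           latitude=35.658, longitude=139.698, topic_vector=[0.2, 0.7])
area = AOI(1, [(35.657, 139.697), (35.659, 139.697), (35.658, 139.699)],
           ["terrace", "quiet"])
```

The clustering step would consume `POI` records and the summarization step would emit `AOI` records.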
  • Implementation
    A combination of GIS and text mining tools
  • Geo-semantic clustering
    Geographic clustering doesn’t reflect area topics: it produces circular areas.
    Semantic clustering doesn’t consider the geographic view: it produces scattered areas.
    Geo-semantic clustering solves both problems.
    [Figure panels: Semantic clustering, G/S clustering, Geographic clustering]
  • Geo-semantic clustering
    Co-clustering with geographic and semantic features
      - Geographic features: latitude, longitude
      - Semantic features: a high-dimensional matrix (latent semantic indexing)
    G/S ratio R: the combination ratio
      - R = geographic bias / semantic bias
    [Table: one row per POI ID; geographic feature columns (latitude, longitude) weighted by R; semantic feature columns (LSI1, LSI2, LSI3, ...) weighted by 1]
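
A minimal sketch of the feature combination described on this slide: the geographic columns are multiplied by R, the semantic columns by 1, and the combined vectors are clustered. Plain k-means with a farthest-point start is swapped in here as an assumption; the slides name MLSA and Tensor-Kmeans in the evaluation chart and do not give the actual procedure. The function names and the four sample POIs are hypothetical, and raw coordinates are used without any projection.

```python
import math


def combine_features(geo, sem, ratio):
    """Scale (lat, lon) by the G/S ratio R and append the semantic vector (weight 1)."""
    return [
        [ratio * lat, ratio * lon] + list(vec)
        for (lat, lon), vec in zip(geo, sem)
    ]


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialisation (an assumption, not from the slides).
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Four made-up POIs: two locations crossed with two topics.
geo = [(0.0, 0.0), (0.0, 0.1), (1.0, 0.0), (1.0, 0.1)]
sem = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

geo_labels = kmeans(combine_features(geo, sem, 1.0e6), k=2)   # geography dominates
sem_labels = kmeans(combine_features(geo, sem, 1.0e-4), k=2)  # topics dominate
```

With a large R the POIs group by position, and with a small R they group by topic, which mirrors the two extremes the slides label "Geographic" and "Semantic".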
  • Evaluation: geo-semantic clustering
    Dataset: cafes in Shibuya
      - Text content: the restaurant review web site “asku.com”
      - 272 cafes in the region (Shibuya ward)
    Correct cluster data
      - Generated manually
      - 13 clusters in the region
      - Evaluated with the F-measure
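
The slide does not say which clustering F-measure was used. As one common variant, the pairwise F-measure below treats every pair of items as a decision ("same cluster" or not) and compares the predicted clustering with the manual one. The function name and the sample labels are hypothetical.

```python
from itertools import combinations


def pairwise_f_measure(predicted, truth):
    """F-measure over item pairs: a pair is positive when both items share a cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


truth = [0, 0, 0, 1, 1, 1]      # manual clusters
predicted = [0, 0, 1, 1, 1, 1]  # one item placed in the wrong cluster
score = pairwise_f_measure(predicted, truth)
```

Cluster identifiers do not need to match between the two labelings, because only co-membership of pairs is compared.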
  • Results of clustering
    Geo-semantic clustering produces non-circular areas that follow their topics.
    [Figure panels: Semantic (R=1.0E-04), Geo-semantic (R=1.0E-02), Geographic (R=1.0E+06)]
  • Evaluation of clustering
    We confirmed that geo-semantic clustering performs better than either semantic or geographic clustering alone.
      - An intermediate ratio (0.01) is optimal.
    [Chart: vertical axis from 0 to 0.6; horizontal axis R from 1.0E-04 (semantic) to 1.0E+06 (geographic); series: MLSA, Tensor-Kmeans]
  • Area summarization
    Document summarization
    Term weighting, e.g. TF/IDF
      - A term that occurs many times in a document is important (TF: term frequency).
      - A term that is rare in the entire document set is important (IDF: inverse document frequency).
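
The two rules above can be sketched as a textbook TF-IDF weighting. The raw-count TF and the natural-log IDF `log(N / df)` are assumptions; the slides do not state which TF or IDF variant was used. The sample documents are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = TF * log(N / DF)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


docs = [
    ["coffee", "cake", "cake"],
    ["coffee", "terrace"],
    ["coffee", "wedding"],
]
weights = tf_idf(docs)
```

Here "coffee" appears in every document and so gets weight 0, while "cake" is both frequent in its document and rare overall.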
  • Problem with IDF
    Simple IDF cannot extract words that characterize a region.
      - According to IDF, “onion” and “wedding” have almost the same weight.
      - “Wedding” should be regarded as more important, because the places where weddings are held are geographically concentrated.
                 Normal term   Place name    Area term
                 “onion”       “Dogenzaka”   “Wedding”
      IDF        3.08          3.51          3.04
      K          4.41          54.0          9.93
  • Location-aware IDF
    The geographic distribution of a word
      - Term occurrences in geographic space
    A more concentrated term is regarded as more important
      - Measurement: K-value (a point distribution analysis method)
    Location-aware weight: IDF * K
                 Normal term   Place name    Area term
                 “onion”       “Dogenzaka”   “Wedding”
      IDF        3.08          3.51          3.04
      K          4.41          54.0          9.93
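
The slides name the K-value only as "a point distribution analysis method" and give no formula. The sketch below assumes a naive Ripley's K estimate at a single radius, without edge correction and divided by the circle area, so that larger values mean a more concentrated term. The function names, the radius, the study-area size, and the sample points are all hypothetical, and coordinates are treated as planar.

```python
import math


def k_value(points, radius, area):
    """Naive Ripley's K at one radius, normalised by pi * radius**2.

    Values near 1 suggest a random spread; larger values suggest concentration.
    Terms with fewer than two occurrences get 0 (a choice made for this sketch).
    """
    n = len(points)
    if n < 2:
        return 0.0
    # Count ordered pairs of distinct occurrences that lie within the radius.
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= radius
    )
    ripley_k = area * close_pairs / (n * (n - 1))
    return ripley_k / (math.pi * radius ** 2)


def location_aware_idf(idf, points, radius, area):
    """Weight a term by IDF multiplied by the concentration of its occurrences."""
    return idf * k_value(points, radius, area)


# Two made-up terms with equal IDF inside a 10 x 10 study area.
clustered = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
spread = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]

w_clustered = location_aware_idf(3.0, clustered, radius=1.0, area=100.0)
w_spread = location_aware_idf(3.0, spread, radius=1.0, area=100.0)
```

Both terms have the same IDF, but only the concentrated one keeps a high weight after multiplying by K.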
  • Evaluation of location-aware IDF
    Evaluation measure: extraction rate of location names
      - Terms that characterize an area have a distribution similar to that of location names.
                 Normal term   Place name    Area term
                 “onion”       “Dogenzaka”   “Wedding”
      IDF        3.08          3.51          3.04
      K          4.41          54.0          9.93
  • Evaluation of location-aware IDF
    Evaluation data
      - All words in the Shibuya area
      - Top 1,000 weighted terms
    Location-aware IDF (IDF*K) extracts location names more efficiently than the conventional weights.
    [Chart: density of location names (%, 0 to 30) against rank (1 to 900); series: IDF, K, IDF*K]
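
One way to read "density of location names" is the share of location names among the top-ranked terms under a given weighting. The sketch below assumes that reading; the function name is hypothetical, and the three terms with their IDF and K values are copied from the table on the earlier slides, so it illustrates the measure and does not reproduce the chart.

```python
def location_name_density(weights, location_names, top_n):
    """Percentage of location names among the top_n highest-weighted terms."""
    ranked = sorted(weights, key=weights.get, reverse=True)[:top_n]
    if not ranked:
        return 0.0
    hits = sum(1 for term in ranked if term in location_names)
    return 100.0 * hits / len(ranked)


# IDF and K values copied from the slides' table.
idf = {"onion": 3.08, "Dogenzaka": 3.51, "Wedding": 3.04}
k = {"onion": 4.41, "Dogenzaka": 54.0, "Wedding": 9.93}
idf_k = {term: idf[term] * k[term] for term in idf}

location_names = {"Dogenzaka"}
rank_by_idf = sorted(idf, key=idf.get, reverse=True)
rank_by_idf_k = sorted(idf_k, key=idf_k.get, reverse=True)
density_top1 = location_name_density(idf_k, location_names, top_n=1)
```

Under IDF alone "onion" ranks above "Wedding"; under IDF*K the area term "Wedding" moves above the normal term "onion", while the place name stays first.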
  • Conclusions
    BEIRA tackles two issues in map services
      - Information overload
      - Information awareness
    Combining geographic and semantic features and processing can be used to build a view of area characteristics.
    Future work
      - Automatic adaptation of the G/S ratio
      - Evaluation on other content
    [Image caption: Hokkai Takashima (1850-1931)]
  • Thank you for your attention!