BEIRA: A geo-semantic clustering method for area summary

The 8th International Conference on Web Information Systems Engineering (WISE2007)

  1. BEIRA: A geo-semantic clustering method for area summary
     Osamu Masutani, Hirotoshi Iwasaki
     Denso IT Laboratory, Inc.
     Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved.
  2. Summary
       - Background
       - Concept
       - System architecture
       - Evaluation
       - Conclusions & future work
  3. Background: map service
     Target
       - Car navigation or PND (Personal Navigation Device)
       - GPS mobile phone
       - Web-based map service
     Major functionalities of a map service
       - View maps around the current position
       - Search for a route to a destination
       - Search for favorite POIs (Points of Interest)
  4. A scenario: a visitor to Nancy
     No previous knowledge of Nancy
       - Japanese
       - A little interest in art
     He has some free time
       - No plan
       - He can’t speak French
       - He has a GPS mobile phone
     The only available information comes from a mobile map service
       - He’d like to search for POIs using the service
       - What is the problem?
  5. Use cases: searching for POIs on mobile
     Three ways to search
     Location-based search
       - Nearby area
     Category-based search
       - “Restaurant” / “Italian” / …
       - “Public” / “Library” / …
     Keyword-based search
       - “chocolate cake”, “soccer”, “beautiful”, “calm”, …
  6. Problem with location-based search
     Filtering by the specified area
     Sometimes the results are numerous
       - In a central urban area
       - When a broad area is chosen
     Selection is very hard
       - The UI is limited (especially on mobile)
  7. Problem with category-based search
     Filtering by a specific category
     Sometimes the results are numerous
       - When the user doesn’t specify a detailed category
     Information awareness
       - Once the user has chosen the “Museum” category, he can’t find “Place Stanislas”
     [Figure labels: museum, park, Place Stanislas]
  8. Problem with keyword-based search
     Filtering by keyword match
     Information awareness
       - The user is required to know the keyword in advance
       - “Art Nouveau” is a good keyword for finding Nancy’s features
       - But if the user mistakes the keyword for “Art Deco”, the results will be poor
     [Figure labels: Art Nouveau, Place Stanislas]
  9. Problems
     Information overload
       - Numerous candidates
       - Millions of POIs in a mobile phone service
     Information awareness
       - Both fixed-category search and free-keyword search have a similar problem
     Solution
       - Reduce the candidates
       - But keep information awareness
       - Clustering and summarization of information
     [Figure labels: museum, park]
  10. Clustering and summarization
      Similar concept
        - Web search engine “Vivisimo”
        - Displays clusters of search results and their topics
        - Dynamic categories
      Easy to choose from, yet comprehensive
        - The number of candidates is reduced, but the view remains comprehensive
  11. Is Vivisimo enough?
      It provides only a semantic (topic) view
        - With a map service
        - Switching between the semantic and geographic views would be complicated
      Can these two views be combined?
        - Use only the map view
        - Cluster = area
  12. BEIRA: Bird’s Eye Information Retrieval Application
      Topic-based IR through a geographic view
        - Use AOIs (Areas of Interest) instead of POIs
        - An AOI consists of an area (cluster) and its summary (a word list)
      [Figure labels: Area; “Art Nouveau”; Summary = word list]
  13. System architecture
      POI database
        - Address of POI
        - Text of POI (guide text, reputation text, etc.)
      Preprocessing
        - Geo-coding and topic vector generation
      Geo-semantic clustering and summarization
      Display AOI
      [Diagram: POI database -> geographic preprocessing (latitude, longitude) and semantic preprocessing (topic vector) -> geo-semantic clustering -> geo-semantic summarization -> AOI database. POI record: POI ID, address, text, etc… AOI record: AOI ID, area polygon, summary.]
  14. Implementation
      Combinations of GIS and text mining tools
  15. Geo-semantic clustering
      Geographic clustering doesn’t reflect area topics: circular areas
      Semantic clustering doesn’t consider the geographic view: scattered areas
      Geo-semantic clustering solves these problems
      [Figure panels: Semantic clustering, G/S clustering, Geographic clustering]
  16. Geo-semantic clustering
      Co-clustering with geographic and semantic features
        - Geographic features: latitude, longitude
        - Semantic features: high-dimensional matrix (latent semantic indexing)
      G/S ratio R: the combination ratio
        - R = geographic bias / semantic bias
      [Figure: feature matrix with columns POI ID | Latitude | Longitude | LSI1 | LSI2 | LSI3 | …; geographic features weighted by R, semantic features weighted by 1]
  17. Evaluation: geo-semantic clustering
      Dataset: cafes in Shibuya
        - Text contents: restaurant evaluation web site “asku.com”
        - 272 cafes in the region (Shibuya ward)
      Correct cluster data
        - Generated manually
        - 13 clusters in the region
        - F-measure
  18. Results of clustering
      Geo-semantic clustering produces non-circular areas according to their topics
      [Figure panels: Semantic (R=1.0E-04), Geo-semantic (R=1.0E-02), Geographic (R=1.0E+06)]
  19. Evaluation of clustering
      We confirmed that geo-semantic clustering is better than either clustering alone
        - An intermediate ratio (0.01) is optimal
      [Figure: vertical axis 0 to 0.6; horizontal axis G/S ratio from 1.0E-04 (semantic) to 1.0E+06 (geographic); series: MLSA, Tensor-Kmeans]
  20. Area summarization
      Document summarization
      Term weighting: e.g. TF/IDF
        - A term that occurs many times in a document is important (TF: term frequency)
        - A term that is rare across the entire document set is important (IDF: inverse document frequency)
  21. Problem of IDF
      Simple IDF cannot extract regionally characteristic words
        - According to IDF, “onion” and “wedding” have nearly the same weight
        - “wedding” should be regarded as more important, because the areas where weddings are held are geographically biased

               Normal term   Place name    Area term
               “onion”       “Dogenzaka”   “wedding”
        IDF    3.08          3.51          3.04
        K      4.41          54.0          9.93
  22. Location-aware IDF
      The geographic distribution of a word
        - Term occurrence in geographic space
      A more concentrated distribution is regarded as more important
        - Measurement: K-value (a point distribution analysis method)
      Weight = IDF * K

               Normal term   Place name    Area term
               “onion”       “Dogenzaka”   “wedding”
        IDF    3.08          3.51          3.04
        K      4.41          54.0          9.93
  23. Evaluation of location-aware IDF
      Evaluation measure: extraction rate of location names
        - Area-characteristic terms have a geographic distribution similar to that of location names
      [Table repeated: IDF and K values for “onion” (3.08, 4.41), “Dogenzaka” (3.51, 54.0), and “wedding” (3.04, 9.93)]
  24. Evaluation of location-aware IDF
      Evaluation data
        - All words in the Shibuya area
        - Top 1,000 weighted terms
      Location-aware IDF (IDF*K) extracts location names more efficiently than the conventional weightings
      [Figure: density of location names [%] (0 to 30) versus rank (1 to 900); series: IDF, K, IDF*K]
  25. Conclusions
      BEIRA addresses two issues in map services
        - Information overload
        - Information awareness
      Combining geographic and semantic features and processing can be used to build a view of area characteristics
      Future work
        - Automatic adaptation of the G/S ratio
        - Evaluation on other content
      [Image: Hokkai Takashima (1850-1931)]
  26. Thank you for your attention!
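
The feature combination described on slide 16 (geographic features weighted by R, semantic features weighted by 1) can be illustrated with a small sketch. This is not the authors' implementation: plain k-means stands in for the clustering methods evaluated in the slides, and the function names, the seeding rule, and the toy coordinates and two-dimensional topic vectors are all assumptions made for illustration.

```python
from math import dist


def combine_features(lat, lon, semantic, r):
    """Concatenate geographic features scaled by R with semantic features scaled by 1."""
    return [r * lat, r * lon, *semantic]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. The first k points seed the centroids (an arbitrary choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy POIs: (latitude, longitude, topic vector).
# Two places, two topics; position and topic deliberately disagree.
POIS = [
    (0.0, 0.0, [1.0, 0.0]),
    (0.0, 0.1, [0.0, 1.0]),
    (1.0, 1.0, [1.0, 0.0]),
    (1.0, 1.1, [0.0, 1.0]),
]


def cluster(pois, r, k=2):
    """Cluster POIs on the combined feature vectors for a given G/S ratio r."""
    return kmeans([combine_features(lat, lon, sem, r) for lat, lon, sem in pois], k)


labels_geographic = cluster(POIS, r=100.0)  # large R: position dominates
labels_semantic = cluster(POIS, r=0.01)     # small R: topic dominates
```

With these toy values, a large R groups the POIs by position and a small R groups them by topic, which is the trade-off the G/S ratio controls. The R values here are tuned to the toy scale and are unrelated to the ratios reported in the slides.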

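
The location-aware weighting from slides 21 to 24 (IDF multiplied by the K-value) can be worked through with the three terms in the slide table. The IDF and K values below are copied from that table; how K is computed is not shown in the slides, so it is taken as given. The function names and the `location_name_density` helper are hypothetical, and the helper is only one plausible reading of the "density of location names" measure.

```python
# IDF and K values copied from the table on slides 21 to 23.
TERMS = {
    "onion": {"idf": 3.08, "k": 4.41},      # normal term
    "Dogenzaka": {"idf": 3.51, "k": 54.0},  # place name
    "wedding": {"idf": 3.04, "k": 9.93},    # area term
}


def location_aware_idf(idf, k):
    """Location-aware weight as stated on slide 22: IDF * K."""
    return idf * k


def rank_terms(terms, weight):
    """Return term names sorted by descending weight."""
    return sorted(terms, key=lambda t: weight(terms[t]), reverse=True)


def location_name_density(ranked, location_names, top_n):
    """Percentage of location names among the top_n ranked terms (assumed measure)."""
    top = ranked[:top_n]
    return 100.0 * sum(t in location_names for t in top) / len(top)


by_idf = rank_terms(TERMS, lambda v: v["idf"])
by_idf_k = rank_terms(TERMS, lambda v: location_aware_idf(v["idf"], v["k"]))

for name in by_idf_k:
    values = TERMS[name]
    print(f"{name}: {location_aware_idf(values['idf'], values['k']):.2f}")
```

Under plain IDF, "onion" ranks just above "wedding"; multiplying by K moves "wedding" ahead of "onion" while the place name stays on top, which matches the argument on slide 21.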