1. BEIRA: A geo-semantic clustering method for area summary
Osamu Masutani, Hirotoshi Iwasaki
Denso IT Laboratory, Inc.
Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved.
2. Summary
Background
Concept
System architecture
Evaluation
Conclusions & future work
3. Background – Map service
Target
- Car navigation systems or PNDs (Personal Navigation Devices)
- GPS mobile phone
- Web-based Map Service
Major functionalities of map services
- View maps around current position
- Search route to destination
- Search for favorite POIs (Points of Interest)
4. A scenario : A visitor to Nancy
No prior knowledge of Nancy.
- Japanese
- A little interest in art
He has some free time.
- No plan.
- He can’t speak French.
- He has a GPS mobile phone.
The only available information comes from a mobile map service.
- He’d like to search POIs using the service.
- What is the problem?
5. Use cases : Searching POIs on mobile
3 ways to search
Location based search
- Nearby area
Category based search
- “Restaurant” / “Italian” / …
- “Public” / “Library” / …
Keyword based search
- “chocolate cake”, “soccer”, “beautiful”, “calm”, …
6. Problem in location based search
Filtering by the specified area
Results are sometimes too numerous
- In central urban areas
- When a broad area is chosen
Selection is very hard
- The UI is limited (especially on mobile)
7. Problem in category based search
Filtering by a specific category
Results are sometimes too numerous
- When the user doesn’t specify a detailed category
Information awareness
- Once the user has chosen the “Museum” category, he can’t find “Place Stanislas”.
8. Problem of keyword based search
Filtering by keyword match
Information awareness
- The user is required to know the keyword in advance
- “Art Nouveau” is a good keyword for finding Nancy’s features.
- But if the user mistakes the keyword for “Art Deco”, the results will be poor
9. Problems
Information overload
- Numerous candidates
- Millions of POIs in mobile phone services
Information awareness
- Both fixed-category and free-keyword search have a similar problem.
Solution
- Reduce the candidates
- But keep information awareness
- Clustering and summarization of information
10. Clustering and summarization
Similar concept
- Web search engine “Vivisimo”
- Displays clusters of search results and their topics
- Dynamic categories
Easy to choose from, yet comprehensive
- A reduced number of candidates, but still a comprehensive view
11. Is Vivisimo enough?
It provides only a semantic (topic) view.
- With a map service, switching between the semantic and geographic views would be complicated
Can these two views be combined?
- Use only the map view
- Cluster = area
12. BEIRA: Bird’s Eye Information Retrieval Application
Topic-based IR through a geographic view
- Uses AOIs (Areas of Interest) instead of POIs
- An AOI consists of an area (cluster) and its summary (a word list)
(Figure: an area on the map annotated with its summary word list, e.g. “Art Nouveau”)
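To make the AOI idea concrete, here is a minimal Python sketch of an AOI record. The slides only say that an AOI pairs an area (a cluster) with a summary word list; the `AOI` class, the `make_aoi` helper, and the use of a convex hull to turn a cluster's POI coordinates into an area polygon are assumptions for illustration, not BEIRA's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar coordinates


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # > 0 when o -> a -> b makes a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]


@dataclass
class AOI:
    """Hypothetical AOI record: area polygon plus summary word list."""
    aoi_id: int
    polygon: List[Point]
    summary: List[str] = field(default_factory=list)


def make_aoi(aoi_id: int, member_points: List[Point], summary_words: List[str]) -> AOI:
    """Build an AOI from the coordinates of the POIs in one cluster (assumed approach)."""
    return AOI(aoi_id, convex_hull(member_points), list(summary_words))


aoi = make_aoi(1, [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)], ["art", "nouveau"])
print(aoi.polygon)  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

A convex hull is the simplest polygon enclosing every member POI, but it can overstate irregular, non-circular clusters; a concave hull or a union of small buffers around each POI may fit them more tightly.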
13. System architecture
POI database
- Address of POI
- Text of POI (guide text, reputation text etc.)
Preprocessing
- Geocoding and topic vector generation
Geo-semantic clustering and summarization
Display AOI
(Figure: data flow. POI database (POI ID, address, text, etc.) → geographic preprocessing (latitude, longitude) and semantic preprocessing (topic vector) → geo-semantic clustering → geo-semantic summarization → AOI (AOI ID, area polygon, summary))
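The semantic preprocessing step produces a topic vector for each POI, and the feature table later in the deck names latent semantic indexing (LSI) as the source of these features. The following pure-Python sketch computes LSI document coordinates from a term-by-document count matrix. Whitespace tokenization, raw counts (no term weighting), and a power-iteration SVD are simplifying assumptions; a real system would use an optimized SVD routine.

```python
import math
import random
from typing import List, Tuple


def term_doc_matrix(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Term-by-document count matrix from whitespace-tokenized texts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            a[index[w]][j] += 1.0
    return vocab, a


def lsi_doc_vectors(a: List[List[float]], k: int, iters: int = 200, seed: int = 0) -> List[List[float]]:
    """k-dimensional LSI coordinates (sigma_c * v_c) for each document.

    Power iteration with deflation on the Gram matrix A^T A, whose
    eigenvalues are the squared singular values of A.
    """
    n = len(a[0])
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining spectrum is zero
            v = [x / norm for x in w]
        # Rayleigh quotient = eigenvalue = sigma_c ** 2
        lam = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        sigma = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = sigma * v[i]
        # Deflate: remove the found component before looking for the next one
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


docs = ["art nouveau museum glass", "art nouveau glass facade", "cafe coffee cake", "coffee cake latte"]
vocab, a = term_doc_matrix(docs)
vecs = lsi_doc_vectors(a, k=2)
print(round(cosine(vecs[0], vecs[1]), 3))       # 1.0 (same topic)
print(round(abs(cosine(vecs[0], vecs[2])), 3))  # 0.0 (different topics)
```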
14. Implementation
Combination of GIS and text mining tools
15. Geo-semantic clustering
Geographic clustering doesn’t reflect area topics: it produces circular areas
Semantic clustering doesn’t consider geography: it produces scattered areas
Geo-semantic clustering solves both problems
(Figure: results of semantic clustering, G/S clustering, and geographic clustering)
16. Geo-semantic clustering
Co-clustering with geographic and semantic features
- Geographic features: latitude, longitude
- Semantic features: a high-dimensional matrix (latent semantic indexing, LSI)
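G/S ratio R: the combination ratio
- R = Geographic bias / Semantic bias
- Geographic features are weighted by R and semantic features by 1, so POI i is represented by $\mathbf{x}_i = (R\,\mathrm{lat}_i,\ R\,\mathrm{lon}_i,\ \mathrm{LSI}_{1,i},\ \mathrm{LSI}_{2,i},\ \mathrm{LSI}_{3,i},\ \dots)$
Feature table: POI ID | Latitude (×R) | Longitude (×R) | LSI1 | LSI2 | LSI3 | … (geographic features, then semantic features)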
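The weighting above can be sketched as follows, assuming each POI already has latitude, longitude, and LSI coordinates. Geographic columns are multiplied by R and semantic columns by 1, and the rows are then clustered. The slides do not specify the clustering algorithm for this step (the evaluation compares methods labeled MLSA and Tensor-Kmeans), so plain k-means with farthest-point initialization is a stand-in here, and `geo_semantic_features` / `cluster_pois` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float, Sequence[float]]  # (latitude, longitude, LSI vector)


def geo_semantic_features(rows: List[Row], r: float) -> List[List[float]]:
    """Scale geographic columns by R and semantic (LSI) columns by 1."""
    return [[r * lat, r * lon] + [float(x) for x in lsi] for lat, lon, lsi in rows]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(xs: List[List[float]], k: int, iters: int = 50) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization (stand-in)."""
    centers = [list(xs[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far
        far = max(xs, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_pois(rows: List[Row], r: float, k: int) -> List[int]:
    return kmeans(geo_semantic_features(rows, r), k)


rows = [
    (35.660, 139.700, [1.0, 0.0]),  # south, "art" topic
    (35.660, 139.701, [0.0, 1.0]),  # south, "cafe" topic
    (35.670, 139.700, [1.0, 0.0]),  # north, "art" topic
    (35.670, 139.701, [0.0, 1.0]),  # north, "cafe" topic
]
print(cluster_pois(rows, r=1e3, k=2))   # [0, 0, 1, 1]: grouped by location
print(cluster_pois(rows, r=1e-3, k=2))  # [0, 1, 0, 1]: grouped by topic
```

Because R rescales raw coordinates, its useful range depends on the units of the features, so the toy values of R above are not comparable to the ratios reported in the evaluation.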
17. Evaluation : geo-semantic clustering
Dataset: cafes in Shibuya
- Text content: the restaurant review website “asku.com”
- 272 cafes in the region (Shibuya ward).
Correct (reference) cluster data
- Generated manually
- 13 clusters in the region
- Clustering results are scored against it with the F-measure
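The slides do not define which F-measure variant is used for clusters. One common clustering F-measure matches each reference cluster to its best-fitting computed cluster and weights the result by reference-cluster size; the sketch below implements that variant as an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def clustering_f_measure(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Size-weighted best-match F-measure between reference and computed clusters.

    F = sum_i (n_i / n) * max_j F(i, j), where F(i, j) is the harmonic mean of
    precision n_ij / n_j and recall n_ij / n_i (assumed variant).
    """
    n = len(truth)
    class_sizes = Counter(truth)         # n_i: size of each reference cluster
    cluster_sizes = Counter(pred)        # n_j: size of each computed cluster
    overlap = Counter(zip(truth, pred))  # n_ij: POIs shared by the pair
    total = 0.0
    for ref, n_i in class_sizes.items():
        best = 0.0
        for comp, n_j in cluster_sizes.items():
            n_ij = overlap[(ref, comp)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


print(clustering_f_measure([0, 0, 1, 1], [5, 5, 7, 7]))            # 1.0 (same partition, different labels)
print(round(clustering_f_measure([0, 0, 0, 1], [0, 0, 1, 1]), 3))  # 0.767
```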
18. Results of clustering
Geo-semantic clustering produces non-circular areas that follow their topics.
(Figure: clustering results for three ratios: semantic, R = 1.0E-04; geo-semantic, R = 1.0E-02; geographic, R = 1.0E+06)
19. Evaluation of clustering
We confirmed that geo-semantic clustering outperforms either single-view (purely semantic or purely geographic) clustering
- An intermediate ratio (R = 0.01) is optimal.
(Figure: clustering score (y-axis, 0 to 0.6) versus G/S ratio R (x-axis, log scale from 1.0E-04, semantic, to 1.0E+06, geographic) for MLSA and Tensor-Kmeans)
20. Area summarization
Document summarization
Term weighting, e.g. TF-IDF
- A term that occurs many times in a document is important (TF: term frequency)
- A term that is rare in the entire document set is important (IDF: inverse document frequency)
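A minimal sketch of TF-IDF-based area summarization, assuming that the POI texts of one area (cluster) are pooled, that each term is scored by its total frequency within the area times its IDF over all POI texts (natural log, no smoothing), and that the top-scoring terms form the area's summary word list. The pooling scheme and the `summarize_area` helper are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def idf(documents: List[List[str]]) -> Dict[str, float]:
    """IDF(t) = ln(N / df(t)); df counts the documents containing t."""
    n = len(documents)
    df = Counter(t for doc in documents for t in set(doc))
    return {t: math.log(n / d) for t, d in df.items()}


def summarize_area(area_docs: List[List[str]], all_docs: List[List[str]], top_n: int = 3) -> List[str]:
    """Top-N terms of one area by (term frequency within the area) * IDF."""
    weights = idf(all_docs)
    tf = Counter(t for doc in area_docs for t in doc)
    scores = {t: c * weights.get(t, 0.0) for t, c in tf.items()}
    # Highest score first; ties broken alphabetically for a stable summary
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


all_docs = [
    ["art", "nouveau", "cafe"],
    ["art", "nouveau", "glass"],
    ["cafe", "cake"],
    ["cafe", "coffee"],
    ["cafe", "latte"],
]
print(summarize_area(all_docs[:2], all_docs))  # ['art', 'nouveau', 'glass']
```

“cafe” occurs in the area but appears in most documents overall, so its low IDF keeps it out of the summary.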
21. Problem of IDF
Simple IDF cannot extract regionally characteristic words
- According to IDF, “onion” and “wedding” have almost the same weight (3.08 vs. 3.04)
- “Wedding” should be regarded as more important, because the places where weddings are held are geographically concentrated.
Term type:  Normal term | Place name  | Area term
Term:       “onion”     | “Dogenzaka” | “Wedding”
IDF:        3.08        | 3.51        | 3.04
K:          4.41        | 54.0        | 9.93
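Multiplying the two rows of this table shows the effect of the location-aware weight IDF × K: under IDF alone, “onion” (3.08) slightly outranks “Wedding” (3.04), while IDF × K puts “Wedding” clearly ahead. The snippet below only redoes that arithmetic with the numbers from the table.

```python
# (term type, IDF, K) taken from the table above
table = {
    "onion": ("normal term", 3.08, 4.41),
    "Dogenzaka": ("place name", 3.51, 54.0),
    "Wedding": ("area term", 3.04, 9.93),
}

idf_k = {term: idf * k for term, (_, idf, k) in table.items()}
by_idf = sorted(table, key=lambda t: table[t][1], reverse=True)
by_idf_k = sorted(idf_k, key=idf_k.get, reverse=True)

for term in table:
    print(f"{term}: IDF = {table[term][1]:.2f}, IDF*K = {idf_k[term]:.2f}")
print("rank by IDF:  ", by_idf)    # ['Dogenzaka', 'onion', 'Wedding']
print("rank by IDF*K:", by_idf_k)  # ['Dogenzaka', 'Wedding', 'onion']
```

The products are 13.58 for “onion”, 189.54 for “Dogenzaka”, and 30.19 for “Wedding”.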
22. Location aware IDF
The geographic distribution of a word
- Term occurrences in the geographic space
The more condensed a term's distribution, the more important the term
- Measure: K-value (a point distribution analysis method)
Location-aware IDF = IDF * K
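The slides measure concentration with a K-value from point distribution analysis but give no formula. A natural reading is Ripley's K function, a standard point-pattern statistic. The sketch below assumes that reading, with no edge correction, a caller-chosen distance `d`, planar coordinates, and a hypothetical normalization by πd² (the expected value of K under complete spatial randomness) so that values above 1 suggest clustering. The points are the locations of POIs whose text contains the term. None of these choices comes from the source, so the output will not reproduce the K row in the table.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates, e.g. projected metres


def ripley_k(points: List[Point], d: float, area: float) -> float:
    """Naive Ripley's K(d): area / (n(n-1)) * #{ordered pairs closer than d}.

    No edge correction; O(n^2) pair count.
    """
    n = len(points)
    if n < 2:
        return 0.0
    close = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) < d
    )
    return area * close / (n * (n - 1))


def k_value(points: List[Point], d: float, area: float) -> float:
    """Hypothetical normalization: K(d) / (pi * d^2); values > 1 suggest clustering."""
    return ripley_k(points, d, area) / (math.pi * d * d)


def location_aware_idf(idf: float, points: List[Point], d: float, area: float) -> float:
    """IDF * K, where points are the locations of POIs whose text contains the term."""
    return idf * k_value(points, d, area)


clustered = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
spread = [(0.0, 0.0), (0.0, 9.0), (9.0, 0.0), (9.0, 9.0)]
print(ripley_k(clustered, d=1.0, area=100.0))  # 100.0
print(ripley_k(spread, d=1.0, area=100.0))     # 0.0
```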
23. Evaluation of location aware IDF
Evaluation measure: extraction rate of location names
- Area-characteristic terms have a geographic distribution similar to that of location names
24. Evaluation of location aware IDF
Evaluation data
- All words in the Shibuya area
- Top 1,000 weighted terms
Location-aware IDF (IDF*K) extracts location names more efficiently than the conventional weights (IDF or K alone)
(Figure: density of location names [%] (y-axis, 0 to 30) versus term rank (x-axis) for IDF, K, and IDF*K)
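The evaluation curve can be sketched as follows, assuming “density of location names” at rank k means the percentage of location names among the k highest-weighted terms. The slides do not say whether the density is cumulative or computed over a sliding window of ranks, and `location_name_density` and the sample terms are hypothetical.

```python
from typing import Iterable, List, Set


def location_name_density(ranked_terms: List[str], location_names: Set[str], k: int) -> float:
    """Percentage of location names among the top-k ranked terms (cumulative, assumed)."""
    top = ranked_terms[:k]
    if not top:
        return 0.0
    hits = sum(1 for t in top if t in location_names)
    return 100.0 * hits / len(top)


def density_curve(ranked_terms: List[str], location_names: Set[str], ranks: Iterable[int]) -> List[float]:
    """Density at each rank cut-off, e.g. ranks = [1, 100, 200, ..., 1000]."""
    return [location_name_density(ranked_terms, location_names, k) for k in ranks]


ranked = ["Dogenzaka", "wedding", "Udagawacho", "onion"]  # hypothetical ranking by IDF*K
names = {"Dogenzaka", "Udagawacho"}                      # hypothetical location-name list
print(density_curve(ranked, names, [1, 2, 4]))  # [100.0, 50.0, 50.0]
```

Running the same curve for rankings produced by IDF, K, and IDF*K gives one line per weighting, as in the figure.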
25. Conclusions
BEIRA addresses two issues in map services
- Information overload
- Information awareness
Combining geographic and semantic features and processing makes it possible to build a view of area characteristics.
Future work
- Automatic adaptation of the G/S ratio
- Evaluation on other content
(Figure: Hokkai Takashima (1850-1931))
26. Thank you for your attention!