BEIRA: A geo-semantic clustering
   method for area summary




 Osamu Masutani, Hirotoshi Iwasaki
 Denso IT Laboratory, Inc.


Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved.
Summary
        Background
        Concept
        System architecture
        Evaluation
        Conclusions & Future work




Background – Map service
        Target
          -      Car navigation or PND (Personal
                 Navigation Devices)
          -      GPS mobile phone
          -      Web-based Map Service
        Major functionalities of map
        service
          -      View maps around current position
          -      Search route to destination
          -      Search for favorite POIs (Points of
                 Interest)
A scenario : A visitor to Nancy
        No previous knowledge about
        Nancy.
          -      Japanese
          -      A little interest in art
        He has free time.
          -      No plan.
          -      He can’t speak French.
          -      He has a GPS mobile phone.
        The only available information is
        from a mobile map service.
          -      He’d like to search for POIs using the service.
          -      What is the problem?

Use cases : Searching POIs on mobile
        3 ways to search
        Location based search
          -      Nearby area
        Category based search
          -      “Restaurant” / “Italian” / …
          -      “Public” / “Library” / …
        Keyword based search
          -      “chocolate cake”, “soccer”,
                 “beautiful”, “calm” , …

Problem in location based search
        Filtering by the specified area
        Sometimes results are
        numerous
          -      In a central urban area
          -      When a broad area is chosen
        Selection is very hard
          -      The UI is limited (especially on mobile).




Problem in category based search
        Filtering by specific
        category
        Sometimes results are
        numerous
          -      When the user doesn’t specify a
                 detailed category
        Information awareness
          -      Once the user chooses the “Museum”
                 category, he can’t find “Place
                 Stanislas”.
        [Figure: map with “museum” and “park” category labels and Place Stanislas]




Problem of keyword based search
        Filtering by keyword match
        Information awareness
          -      The user is required to know the
                 keyword in advance
          -      “Art Nouveau” is a good keyword for
                 finding Nancy’s features.
          -      But if the user mistakes the keyword
                 for “Art Deco”, the results will be poor
        [Figure: map labeled “Art nouveau” and “Place Stanislas”]




Problems
        Information overload
          -      Numerous candidates
          -      Millions of POIs in mobile phone service
        Information awareness
          -      Both fixed category and free keyword
                 search have a similar problem.
        Solution
          -      Reduce the candidates
          -      But keep information awareness
          -      Clustering and summarization of
                 information
Clustering and summarization
        Similar concept
          -      Web search engine “Vivisimo”
          -      Displays clusters of search results
                 and their topics
          -      Dynamic categories
        Easy to choose but
        comprehensive
          -      Fewer candidates, but still a
                 comprehensive view

Is Vivisimo enough?
        It provides only a semantic
        (topic) view.
          -      With a map service, switching between the semantic and
                 geographic views would be complicated
        Can these two views be
        combined?
          -      Use only the map view
          -      Cluster = area


BEIRA: Bird’s Eye Information Retrieval Application
        Topic based IR through a geographic
        view.
          -      Use AOI (Area of Interest) instead of POI
          -      An AOI consists of an area (cluster) and its summary
                 (a word list)

        [Figure: a map area (cluster) labeled with its summary word list, e.g. “Art Nouveau”]
System architecture
        POI database
          -      Address of POI
          -      Text of POI (guide text, reputation text etc.)
        Preprocessing
          -      Geo-coding and Topic vector generation.
        Geo-semantic clustering and summarization
        Display AOI
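        [Figure: processing pipeline]
          POI database -> geographic preprocessing -> latitude, longitude
          POI database -> semantic preprocessing   -> topic vector
          (latitude, longitude, topic vector) -> geo-semantic clustering -> geo-semantic summarization -> AOI
        POI record : POI ID, Address, Text, etc.
        AOI record : AOI ID, Area Polygon, Summary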
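
The two record layouts in the architecture slide can be sketched as plain data classes. This is only an illustration: the field names and types (`poi_id`, `topic_vector`, `area_polygon`, and so on) are assumptions chosen to mirror the column names on the slide, not the authors' actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class POI:
    """One row of the POI database (hypothetical field names)."""
    poi_id: str
    address: str
    text: str  # guide text, reputation text, etc.
    # Filled in by geographic preprocessing (geo-coding of the address).
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    # Filled in by semantic preprocessing (topic vector generation).
    topic_vector: List[float] = field(default_factory=list)


@dataclass
class AOI:
    """One output row: an area (cluster) plus its summary word list."""
    aoi_id: str
    area_polygon: List[Tuple[float, float]]  # (latitude, longitude) vertices
    summary: List[str]
```

A POI starts with only its address and text; the two preprocessing steps then fill in the coordinates and the topic vector before clustering.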


Implementation
        A combination of GIS and text mining
        tools




Geo-semantic clustering
        Geographic clustering doesn’t reflect area topics :
        it produces circular areas
        Semantic clustering doesn’t consider the geographic
        view : it produces scattered areas
        Geo-semantic clustering solves these problems
        [Figure: three panels comparing semantic clustering, G/S clustering, and geographic clustering]




Geo-semantic clustering
        Co-clustering with geographic and
        semantic features
          -      Geographic features : latitude, longitude
          -      Semantic features : high-dimensional matrix (latent
                 semantic indexing, LSI)

        G/S ratio R : the combination ratio
          -      R = Geographic bias / Semantic bias

        Feature table (geographic columns weighted by R, semantic columns by 1):
              POI ID | Latitude (*R) | Longitude (*R) | LSI1 (*1) | LSI2 (*1) | LSI3 (*1)
              ・・・    | ・・・           | ・・・            | ・・・       | ・・・       | ・・・
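
The weighting in the feature table can be sketched as follows: the geographic columns are multiplied by R, the LSI columns are left at weight 1, and the combined rows are clustered. The slides do not say which clustering algorithm runs on these features, so a plain k-means with a deterministic farthest-point start is swapped in here purely for illustration. The function names and the four toy POIs are hypothetical.

```python
def combine_features(lat, lon, lsi, r):
    """Scale geographic columns by the G/S ratio r; LSI columns keep weight 1."""
    return [r * lat, r * lon] + list(lsi)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means (an illustrative stand-in, not the slides' method)."""
    # Deterministic farthest-point initialisation so results are reproducible.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def geo_semantic_clusters(pois, r, k):
    """pois: list of (latitude, longitude, lsi_vector) tuples."""
    features = [combine_features(lat, lon, lsi, r) for lat, lon, lsi in pois]
    return kmeans(features, k)


# Toy data: two places on the map, two topics, crossed with each other.
pois = [
    (0.0, 0.0, (1.0, 0.0)),
    (0.0, 0.1, (0.0, 1.0)),
    (1.0, 1.0, (1.0, 0.0)),
    (1.0, 1.1, (0.0, 1.0)),
]
semantic_like = geo_semantic_clusters(pois, r=1e-4, k=2)
geographic_like = geo_semantic_clusters(pois, r=1e6, k=2)
```

On this toy data a very small R groups the POIs by topic and a very large R groups them by position, which mirrors the two extremes of the G/S ratio.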


Evaluation : geo-semantic clustering
        Dataset : Cafes in Shibuya
          -      Text contents : restaurant review web site
                 “asku.com”
          -      272 cafes in the region (Shibuya ward).
        Correct cluster data
          -      Generated manually
          -      13 clusters in the region
          -      Scored with the F measure
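
The slides name the F measure but not its exact form. The sketch below assumes one common clustering variant: each manually defined cluster is matched to the predicted cluster that gives it the best F score, and those scores are averaged with weights proportional to cluster size. The function name is hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, predicted_labels):
    """Class-weighted best-match F measure (assumed variant)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(predicted_labels)
    overlap = Counter(zip(true_labels, predicted_labels))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap[(cls, clu)]
            if common == 0:
                continue
            precision = common / clu_size
            recall = common / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each correct cluster by its share of all items.
        total += (cls_size / n) * best
    return total
```

A perfect clustering scores 1.0 regardless of how the cluster IDs are numbered, and merging everything into one cluster lowers the score because precision drops.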



Results of clustering
        Geo-semantic clustering produces non-circular
        areas that follow their topics.
        [Figure: three clustering results. Semantic (R=1.0E-04), Geo-semantic (R=1.0E-02), Geographic (R=1.0E+06)]
Evaluation of clustering
        We confirmed that geo-semantic
        clustering is better than either
        clustering alone
          -      An intermediate ratio (R = 0.01) is optimal.
        [Figure: clustering score (axis 0 to 0.6) versus G/S ratio R, from 1.0E-04 (semantic) to 1.0E+06 (geographic); legend: MLSA, Tensor-Kmeans]




Area summarization
        Document summarization
        Term weighting : e.g. TF/IDF
          -      A term that occurs many times in a
                 document is important (TF : term
                 frequency)
          -      A term that is rare in the entire document set is
                 important (IDF : inverse document
                 frequency)
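
The two rules above can be written out in a few lines. The slides do not state which TF/IDF variant is used, so this sketch assumes the textbook form: raw term count times log(N / document frequency). The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        term_freq = Counter(doc)
        weights.append(
            {
                term: count * math.log(n_docs / doc_freq[term])
                for term, count in term_freq.items()
            }
        )
    return weights


docs = [
    ["cafe", "cake", "cake"],
    ["cafe", "wedding"],
    ["cafe", "terrace"],
]
doc_weights = tf_idf(docs)
```

Here "cafe" appears in every document, so its IDF is log(1) = 0 and it drops out of any summary, while "cake" is both frequent in the first document and rare elsewhere.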


Problem of IDF
         Simple IDF cannot extract regionally
         characteristic words
           -      According to IDF, “onion” and “wedding” have nearly the same weight
           -      “wedding” should be regarded as more important because the
                  areas where weddings are held are geographically concentrated.


                   Normal term    Place name     Area term
                   “onion”        “Dogenzaka”    “Wedding”
        IDF        3.08           3.51           3.04
        K          4.41           54.0           9.93
Location aware IDF
        The geographic distribution of a word
         -      Term occurrences in geographic space
        A more concentrated distribution is regarded as more important
         -      Measurement : K-value (a point distribution analysis method)
        Location aware IDF = IDF * K
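
As a worked example of IDF * K, the snippet below multiplies the IDF and K values that the slides report for the three example terms. Only the product is computed; how K itself is derived is not specified in the slides and is not attempted here. The variable and function names are hypothetical.

```python
# (IDF, K) pairs as reported in the slides for the three example terms.
reported = {
    "onion": (3.08, 4.41),       # normal term
    "Dogenzaka": (3.51, 54.0),   # place name
    "wedding": (3.04, 9.93),     # area term
}


def location_aware_idf(idf, k):
    """Location aware weight: plain IDF scaled by the geographic K-value."""
    return idf * k


location_weights = {
    term: location_aware_idf(idf, k) for term, (idf, k) in reported.items()
}
ranking = sorted(location_weights, key=location_weights.get, reverse=True)
```

With IDF alone the three terms are nearly tied (3.04 to 3.51). After multiplying by K the weights are roughly 189.5 for “Dogenzaka”, 30.2 for “wedding”, and 13.6 for “onion”, so the area term now clearly outranks the normal term.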


Evaluation of location aware IDF
         Evaluation measure : Extraction rate of
         location names
           -      Area-characteristic terms have a distribution
                  similar to that of location names
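
The extraction rate can be sketched as the share of location names among the top-ranked terms. This assumes the rate is taken over the top N terms of a weight-ordered list; the slides may instead measure it per rank window. The function name is hypothetical, and the three terms are the slides' own examples.

```python
def location_name_density(ranked_terms, location_names, top_n):
    """Percentage of location names among the top_n terms of a ranked list."""
    top = ranked_terms[:top_n]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term in location_names)
    return 100.0 * hits / len(top)


# Terms ordered by descending weight; only "Dogenzaka" is a location name.
ranked_terms = ["Dogenzaka", "wedding", "onion"]
known_locations = {"Dogenzaka"}
density_top1 = location_name_density(ranked_terms, known_locations, top_n=1)
density_top2 = location_name_density(ranked_terms, known_locations, top_n=2)
```

A weighting scheme that pushes location names toward the top of the list yields a higher density at small N, which is the behaviour the evaluation looks for.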
Evaluation of location aware IDF
        Evaluation data
          -      All words in Shibuya area.
          -      Top 1,000 weighted terms
        Location aware IDF (IDF*K) extracts location
        names more efficiently than
        conventional weightings
        [Figure: density of location names [%] (axis 0 to 30) versus rank (1 to 900); series: IDF, K, IDF*K]


Conclusions
        BEIRA addresses two issues in map
        services
          -      Information overload
          -      Information awareness
        Combining geographic and semantic features
        and processing can produce a view of
        area characteristics.
        Future work
          -      Automatic adaptation of the G/S ratio
          -      Evaluation on other contents
        [Image: Hokkai Takashima (1850-1931)]

Thank you for your attention!




Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved.   26 of 26

More Related Content

More from Osamu Masutani

TOWARD A BETTER IPA EXPERIENCE FOR A CONNECTED VEHICLE BY MEANS OF USAGE PRED...
TOWARD A BETTER IPA EXPERIENCE FOR A CONNECTED VEHICLE BY MEANS OF USAGE PRED...TOWARD A BETTER IPA EXPERIENCE FOR A CONNECTED VEHICLE BY MEANS OF USAGE PRED...
TOWARD A BETTER IPA EXPERIENCE FOR A CONNECTED VEHICLE BY MEANS OF USAGE PRED...Osamu Masutani
 
Power BI勉強会 #6 Power BI で地理的分析とこまでできる?
Power BI勉強会 #6 Power BI で地理的分析とこまでできる?Power BI勉強会 #6 Power BI で地理的分析とこまでできる?
Power BI勉強会 #6 Power BI で地理的分析とこまでできる?Osamu Masutani
 
コネクテッドカーの胎動と交通サイバーフィジカルシステム
コネクテッドカーの胎動と交通サイバーフィジカルシステムコネクテッドカーの胎動と交通サイバーフィジカルシステム
コネクテッドカーの胎動と交通サイバーフィジカルシステムOsamu Masutani
 
R tools for Vsual Studio
R tools for Vsual StudioR tools for Vsual Studio
R tools for Vsual StudioOsamu Masutani
 
Power BI チュートリアル 導入・初級編
Power BI チュートリアル 導入・初級編Power BI チュートリアル 導入・初級編
Power BI チュートリアル 導入・初級編Osamu Masutani
 
A Sensing Coverage Analysis of a Route Control Method for Vehicular Crowd Sen...
A Sensing Coverage Analysis of a Route Control Method for Vehicular Crowd Sen...A Sensing Coverage Analysis of a Route Control Method for Vehicular Crowd Sen...
A Sensing Coverage Analysis of a Route Control Method for Vehicular Crowd Sen...Osamu Masutani
 
Matlab distributed computing serverの使い方
Matlab distributed computing serverの使い方Matlab distributed computing serverの使い方
Matlab distributed computing serverの使い方Osamu Masutani
 
Traffic simulation based on space syntax
Traffic simulation based on space syntaxTraffic simulation based on space syntax
Traffic simulation based on space syntaxOsamu Masutani
 
C++ AMPを使ってみよう
C++ AMPを使ってみようC++ AMPを使ってみよう
C++ AMPを使ってみようOsamu Masutani
 
Windows Store アプリをuniversal にして申請する手順
Windows Store アプリをuniversal にして申請する手順Windows Store アプリをuniversal にして申請する手順
Windows Store アプリをuniversal にして申請する手順Osamu Masutani
 
Hpc server講習会第3回応用編
Hpc server講習会第3回応用編Hpc server講習会第3回応用編
Hpc server講習会第3回応用編Osamu Masutani
 
Windows HPC Server 講習会 第1回 導入編 1/2
Windows HPC Server 講習会 第1回 導入編 1/2Windows HPC Server 講習会 第1回 導入編 1/2
Windows HPC Server 講習会 第1回 導入編 1/2Osamu Masutani
 
Windows HPC Server 講習会 第2回 開発編
Windows HPC Server 講習会 第2回 開発編Windows HPC Server 講習会 第2回 開発編
Windows HPC Server 講習会 第2回 開発編Osamu Masutani
 
A Multiple Pairs Shortest Path Algorithm 解説
A Multiple Pairs Shortest Path Algorithm 解説A Multiple Pairs Shortest Path Algorithm 解説
A Multiple Pairs Shortest Path Algorithm 解説Osamu Masutani
 
Clustering of time series subsequences is meaningless 解説
Clustering of time series subsequences is meaningless 解説Clustering of time series subsequences is meaningless 解説
Clustering of time series subsequences is meaningless 解説Osamu Masutani
 
Toward a resilient prediction system for non-uniform traffic data
Toward a resilient prediction system for non-uniform traffic data Toward a resilient prediction system for non-uniform traffic data
Toward a resilient prediction system for non-uniform traffic data Osamu Masutani
 
BEIRA -鳥瞰型情報検索アプリケーション
BEIRA -鳥瞰型情報検索アプリケーションBEIRA -鳥瞰型情報検索アプリケーション
BEIRA -鳥瞰型情報検索アプリケーションOsamu Masutani
 

More from Osamu Masutani (20)

TOWARD A BETTER IPA EXPERIENCE FOR A CONNECTED VEHICLE BY MEANS OF USAGE PRED...
TOWARD A BETTER IPA EXPERIENCE FOR A CONNECTED VEHICLE BY MEANS OF USAGE PRED...TOWARD A BETTER IPA EXPERIENCE FOR A CONNECTED VEHICLE BY MEANS OF USAGE PRED...
TOWARD A BETTER IPA EXPERIENCE FOR A CONNECTED VEHICLE BY MEANS OF USAGE PRED...
 
Power BI勉強会 #6 Power BI で地理的分析とこまでできる?
Power BI勉強会 #6 Power BI で地理的分析とこまでできる?Power BI勉強会 #6 Power BI で地理的分析とこまでできる?
Power BI勉強会 #6 Power BI で地理的分析とこまでできる?
 
コネクテッドカーの胎動と交通サイバーフィジカルシステム
コネクテッドカーの胎動と交通サイバーフィジカルシステムコネクテッドカーの胎動と交通サイバーフィジカルシステム
コネクテッドカーの胎動と交通サイバーフィジカルシステム
 
R tools for Vsual Studio
R tools for Vsual StudioR tools for Vsual Studio
R tools for Vsual Studio
 
Power BI チュートリアル 導入・初級編
Power BI チュートリアル 導入・初級編Power BI チュートリアル 導入・初級編
Power BI チュートリアル 導入・初級編
 
A Sensing Coverage Analysis of a Route Control Method for Vehicular Crowd Sen...
A Sensing Coverage Analysis of a Route Control Method for Vehicular Crowd Sen...A Sensing Coverage Analysis of a Route Control Method for Vehicular Crowd Sen...
A Sensing Coverage Analysis of a Route Control Method for Vehicular Crowd Sen...
 
Matlab distributed computing serverの使い方
Matlab distributed computing serverの使い方Matlab distributed computing serverの使い方
Matlab distributed computing serverの使い方
 
Traffic simulation based on space syntax
Traffic simulation based on space syntaxTraffic simulation based on space syntax
Traffic simulation based on space syntax
 
C++ AMPを使ってみよう
C++ AMPを使ってみようC++ AMPを使ってみよう
C++ AMPを使ってみよう
 
Windows Store アプリをuniversal にして申請する手順
Windows Store アプリをuniversal にして申請する手順Windows Store アプリをuniversal にして申請する手順
Windows Store アプリをuniversal にして申請する手順
 
Hpc server講習会第3回応用編
Hpc server講習会第3回応用編Hpc server講習会第3回応用編
Hpc server講習会第3回応用編
 
Windows HPC Server 講習会 第1回 導入編 1/2
Windows HPC Server 講習会 第1回 導入編 1/2Windows HPC Server 講習会 第1回 導入編 1/2
Windows HPC Server 講習会 第1回 導入編 1/2
 
Windows HPC Server 講習会 第2回 開発編
Windows HPC Server 講習会 第2回 開発編Windows HPC Server 講習会 第2回 開発編
Windows HPC Server 講習会 第2回 開発編
 
A Multiple Pairs Shortest Path Algorithm 解説
A Multiple Pairs Shortest Path Algorithm 解説A Multiple Pairs Shortest Path Algorithm 解説
A Multiple Pairs Shortest Path Algorithm 解説
 
Clustering of time series subsequences is meaningless 解説
Clustering of time series subsequences is meaningless 解説Clustering of time series subsequences is meaningless 解説
Clustering of time series subsequences is meaningless 解説
 
Autopoiesis 2
Autopoiesis 2Autopoiesis 2
Autopoiesis 2
 
Autopoiesis 1
Autopoiesis 1Autopoiesis 1
Autopoiesis 1
 
UIMAウマー
UIMAウマーUIMAウマー
UIMAウマー
 
Toward a resilient prediction system for non-uniform traffic data
Toward a resilient prediction system for non-uniform traffic data Toward a resilient prediction system for non-uniform traffic data
Toward a resilient prediction system for non-uniform traffic data
 
BEIRA -鳥瞰型情報検索アプリケーション
BEIRA -鳥瞰型情報検索アプリケーションBEIRA -鳥瞰型情報検索アプリケーション
BEIRA -鳥瞰型情報検索アプリケーション
 

Recently uploaded

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

BEIRA: A geo-semantic clustering method for area summary

  • 1. BEIRA: A geo-semantic clustering method for area summary Osamu Masutani, Hirotoshi Iwasaki Denso IT Laboratory, Inc. Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved.
  • 2. Summary Background Concept System architecture Evaluation Conclusions & Future works Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved. 2 of 26
  • 3. Background – Map service Target - Car navigation or PND (Personal Navigation Devices) - GPS mobile phone - Web-based Map Service Major functionalities of map service - View maps around current position - Search route to destination - Search favorite POI (Point of Interests) Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved. 3 of 16
  • 4. A scenario : A visitor to Nancy No previous knowledge about Nancy. - Japanese - A little interest about Art He has a free time. - No plan. - He can’t speak French. - He has a GPS mobile phone. The only available information is from mobile map service. - He’d like to search POIs using the service. - What is a problem ? Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved. 4 of 16
  • 5. Use cases : Searching POIs on mobile 3 ways to search Location based search - Nearby area Category based search - “Restaurant” / “Italian” / … - “Public” / “Library” / … Keyword based search - “chocolate cake”, “soccer”, “beautiful”, “calm” , … Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved. 5 of 16
  • 6. Problem in location-based search. Results are filtered by the specified area. Sometimes the results are numerous: in a central urban area, or when a broad area is chosen. Selection is then very hard because the UI is limited (especially on mobile).
  • 7. Problem in category-based search. Results are filtered by a specific category. Sometimes the results are numerous: when the user doesn’t specify a detailed category. Information awareness: once the user has chosen the “Museum” category, he can’t find “Place Stanislas”.
  • 8. Problem of keyword-based search. Results are filtered by keyword match. Information awareness: the user is required to know the keyword in advance. “Art Nouveau” is a good keyword for finding Nancy’s features, but if the user mistakes the keyword for “Art Deco”, the results will be poor.
  • 9. Problems. Information overload: numerous candidates; millions of POIs in a mobile phone service. Information awareness: both fixed-category and free-keyword search have a similar problem. Solution: reduce the candidates but keep information awareness, through clustering and summarization of information.
  • 10. Clustering and summarization. A similar concept: the Web search engine “Vivisimo” displays clusters of search results together with their topics (dynamic categories). Easy to choose from yet comprehensive: the number of candidates is reduced, but the view remains comprehensive.
  • 11. Is Vivisimo enough? It provides only a semantic (topic) view. With a map service, switching between the semantic and geographic views would be complicated. Can these two views be combined? Use only the map view, with cluster = area.
  • 12. BEIRA: Bird’s Eye Information Retrieval Application. Topic-based IR through a geographic view. Use AOIs (Areas of Interest) instead of POIs. An AOI consists of an area (cluster) and its summary (a word list). [Figure: an area labeled with its summary word list, e.g. “Art Nouveau”.]
  • 13. System architecture. POI database: address of each POI; text of each POI (guide text, reputation text, etc.). Preprocessing: geocoding and topic vector generation. Geo-semantic clustering and summarization. Display of AOIs. [Diagram: POI database (POI ID, address, text, etc.) → geographic preprocessing (latitude, longitude) and semantic preprocessing (topic vector) → geo-semantic clustering → geo-semantic summarization → AOI database (AOI ID, area polygon, summary).]
  • 14. Implementation: a combination of GIS and text mining tools.
  • 15. Geo-semantic clustering. Geographic clustering doesn’t reflect area topics: it yields circular areas. Semantic clustering doesn’t consider the geographic view: it yields scattered areas. Geo-semantic clustering solves these problems. [Figure panels: Semantic clustering, G/S clustering, Geographic clustering.]
  • 16. Geo-semantic clustering. Co-clustering with geographic and semantic features. Geographic features: latitude, longitude. Semantic features: a high-dimensional matrix (latent semantic indexing). G/S ratio R: the combination ratio, R = geographic bias / semantic bias. [Table: one row per POI ID; geographic features (latitude, longitude) weighted by R; semantic features (LSI1, LSI2, LSI3, …) weighted by 1.]
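
The weighting on slide 16 can be illustrated with a minimal sketch. It assumes the geographic columns are simply multiplied by R and the semantic (LSI) columns by 1 before a Euclidean distance is taken; the slides do not state which distance or clustering algorithm was actually used. The function names and sample POIs are hypothetical.

```python
import math

def combine_features(lat, lon, lsi, r):
    """Build one feature vector: geographic part scaled by R, semantic part by 1."""
    return [r * lat, r * lon] + list(lsi)

def distance(a, b):
    """Euclidean distance between two combined feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, candidates, r):
    """Return the name of the candidate closest to the query under G/S ratio r."""
    q = combine_features(*query, r)
    return min(
        candidates,
        key=lambda name: distance(q, combine_features(*candidates[name], r)),
    )

# Hypothetical POIs: (latitude, longitude, LSI topic vector).
query = (35.6600, 139.7000, [1.0, 0.0])
candidates = {
    "near_but_different_topic": (35.6601, 139.7001, [0.0, 1.0]),
    "far_but_same_topic": (35.6700, 139.7100, [0.9, 0.1]),
}

semantic_choice = nearest(query, candidates, 1e-4)   # small R: topics dominate
geographic_choice = nearest(query, candidates, 1e6)  # large R: location dominates
```

With a small R the topic vectors decide which POI is closest, and with a large R the coordinates do, which matches the semantic and geographic extremes shown on the later result slides.
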
  • 17. Evaluation: geo-semantic clustering. Dataset: cafes in Shibuya. Text contents: the restaurant review Web site “asku.com”. 272 cafes in the region (Shibuya ward). Correct cluster data: generated manually; 13 clusters in the region. Evaluation measure: F-measure.
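
The slide names the F-measure but does not say which clustering variant was used. As one possible reading, the sketch below computes a pairwise F-measure: every pair of items is checked for whether the predicted and the manual clustering agree on putting them together. The function name and the toy labels are hypothetical.

```python
from itertools import combinations

def pairwise_f_measure(predicted, reference):
    """Pairwise F-measure; both arguments map item id -> cluster label."""
    tp = fp = fn = 0
    for a, b in combinations(sorted(predicted), 2):
        same_pred = predicted[a] == predicted[b]
        same_ref = reference[a] == reference[b]
        if same_pred and same_ref:
            tp += 1      # pair grouped together in both clusterings
        elif same_pred:
            fp += 1      # grouped together only in the prediction
        elif same_ref:
            fn += 1      # grouped together only in the reference
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: four cafes, two manual clusters.
reference = {"a": 0, "b": 0, "c": 1, "d": 1}
predicted = {"a": 0, "b": 0, "c": 0, "d": 1}
score = pairwise_f_measure(predicted, reference)
```

In the toy example one pair is correctly grouped, two pairs are wrongly grouped, and one pair is wrongly split, so precision is 1/3, recall is 1/2, and the score is 0.4.
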
  • 18. Results of clustering. Geo-semantic clustering produces non-circular areas according to their topics. [Figure panels: Semantic (R=1.0E-04), Geo-semantic (R=1.0E-02), Geographic (R=1.0E+06).]
  • 19. Evaluation of clustering. We confirmed that geo-semantic clustering performs better than either clustering alone; an intermediate ratio (0.01) is optimal. [Chart: vertical axis from 0 to 0.6; horizontal axis R from 1.0E-04 (semantic) to 1.0E+06 (geographic); series: MLSA, Tensor-Kmeans.]
  • 20. Area summarization. Based on document summarization with term weighting, e.g. TF/IDF. A term that occurs many times in a document is important (TF: term frequency). A term that is rare across the entire document set is important (IDF: inverse document frequency).
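
The two rules on slide 20 can be made concrete with a short sketch. It assumes the common textbook form, raw term count times log(N / document frequency); the slides do not give the exact formula the authors used, and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical review texts for three cafes, already tokenized.
docs = [
    ["cafe", "cake", "cake"],
    ["cafe", "terrace"],
    ["cafe", "wedding"],
]
weights = tf_idf(docs)
```

Here “cafe” appears in every document and gets weight 0, while “cake” is both frequent in its document and rare overall, so it gets the highest weight.
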
  • 21. Problem of IDF. Simple IDF cannot extract regionally characteristic words. According to IDF, “onion” and “wedding” have nearly the same weight, but “wedding” should be regarded as more important because the areas where weddings are held are geographically biased. [Table: normal term “onion”: IDF 3.08, K 4.41; place name “Dogenzaka”: IDF 3.51, K 54.0; area term “Wedding”: IDF 3.04, K 9.93.]
  • 22. Location-aware IDF. Uses the geographic distribution of a word, i.e. where the term occurs in geographic space. A more concentrated distribution is regarded as more important. Measurement: K-value (a point distribution analysis method). Weight: IDF * K.
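
As a worked example of the IDF * K weight, the sketch below multiplies the IDF and K values listed for the three sample terms in the slides' table. How K itself is computed is not given in the slides and is not shown here; the function and variable names are hypothetical.

```python
# (IDF, K) pairs taken from the table in the slides.
terms = {
    "onion": (3.08, 4.41),       # normal term
    "Dogenzaka": (3.51, 54.0),   # place name
    "Wedding": (3.04, 9.93),     # area term
}

def location_aware_weight(idf, k):
    """Location-aware IDF as stated on the slide: IDF multiplied by K."""
    return idf * k

weights = {t: location_aware_weight(idf, k) for t, (idf, k) in terms.items()}
ranking = sorted(weights, key=weights.get, reverse=True)

for term in ranking:
    print(f"{term}: {weights[term]:.2f}")
```

With IDF alone, “onion” (3.08) sits slightly above “Wedding” (3.04). After multiplying by K the weights are about 189.54 for “Dogenzaka”, 30.19 for “Wedding”, and 13.58 for “onion”, so the area term now clearly outranks the normal term.
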
  • 23. Evaluation of location-aware IDF. Evaluation measure: extraction rate of location names. Area-characteristic terms have a distribution similar to that of location names.
  • 24. Evaluation of location-aware IDF. Evaluation data: all words in the Shibuya area; the top 1,000 weighted terms. Location-aware IDF (IDF*K) extracts location names more efficiently than the conventional measures. [Chart: density of location names (%), from 0 to 30, against rank, from 1 to 900; series: IDF, K, IDF*K.]
  • 25. Conclusions. BEIRA addresses two issues in map services: information overload and information awareness. Combining geographic and semantic features and processing can be used to build a view of area characteristics. Future work: automatic adaptation of the G/S ratio; evaluation on other content. [Figure caption: Hokkai Takashima (1850-1931).]
  • 26. Thank you for your attention!