Geo-tagging & Spatial Indexing of Text-Specified Data
Speaker: Shiv Shakti Ghosh 
(shivu@drtc.isibang.ac.in) 
MSLIS – II yr 
DRTC 
INDIAN STATISTICAL INSTITUTE 
BANGALORE 
Focus of a GSE 
• Geographic context embedded in natural-language descriptions
• Place names that are ambiguous and easily confused with names of organizations, people, buildings and streets
• Interpretation of spatial relationships (‘near’, ‘north’)
• Geo-relevance ranking
Where is ‘A’ located? 
What is happening at ‘B’? 
Shiv Shakti Ghosh 2
Overview 
1. Geo-tagging
  - Steps for Geo-tagging
  - Comma group Recognition
  - Comma group Resolution
  - Fuzzy Geo-tagging
  - Examples of Geo-Spatial Resources
2. Spatial Indexing
  - Types
    - Pure text index
    - Spatial primary index
    - Text primary spatio-textual index
    - Text index with spatial post-processing
  - R-tree
3. Related Work
4. Future Work
5. References
Geo-tagging 
• Enables the spatial indexing of unstructured or semi-structured text
• Recognizes textual references to locations, known as toponyms, and resolves them
# Systems using geo-tagging are being built to process text in a wide variety of domains: web pages, blogs, encyclopedia articles, news articles, tweets and spreadsheets.
Steps for geo-tagging 
• Comma group recognition: finding all textual references to geographic locations (toponyms)
• Comma group resolution: choosing the correct location interpretation for each toponym
Comma group Recognition 
• Toponym Recognition
  - Tokenization (Stanford NLP package)
  - POS tagging (TreeTagger package)
  - NER (Stanford NER package)
• Lat/Long Assignment
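The slide names the Stanford tokenizer, TreeTagger and Stanford NER for toponym recognition. The sketch below does not use those tools; as a stand-in it assumes a tiny hand-made gazetteer (`GAZETTEER`, hypothetical, with approximate coordinates) and treats a comma group as a run of two or more gazetteer toponyms joined only by "," or "and". Lat/long assignment here simply attaches every candidate coordinate to each toponym, leaving the choice to resolution.

```python
import re

# Hypothetical toy gazetteer: name -> candidate (lat, lon) pairs (approximate).
GAZETTEER = {
    "Paris": [(48.8566, 2.3522), (33.6609, -95.5555)],  # Paris, France / Paris, Texas
    "Dallas": [(32.7767, -96.7970)],
    "Houston": [(29.7604, -95.3698)],
}

TOKEN_RE = re.compile(r"\w+|[^\w\s]")
SEPARATORS = {",", "and"}

def tokenize(text):
    """Split text into word and punctuation tokens (stand-in for a real tokenizer)."""
    return TOKEN_RE.findall(text)

def find_comma_groups(tokens, gazetteer):
    """Return runs of two or more toponyms joined only by ',' or 'and'."""
    groups, current, expect_toponym = [], [], True
    for tok in tokens:
        if expect_toponym and tok in gazetteer:
            current.append(tok)          # toponym starts or continues the run
            expect_toponym = False
        elif not expect_toponym and tok in SEPARATORS:
            expect_toponym = True        # a separator must be followed by a toponym
        else:
            if len(current) >= 2:
                groups.append(current)   # close the run if it forms a group
            current = [tok] if tok in gazetteer else []
            expect_toponym = not current
    if len(current) >= 2:
        groups.append(current)
    return groups

def assign_lat_long(group, gazetteer):
    """Attach every candidate coordinate; resolution picks one later."""
    return {name: gazetteer[name] for name in group}

groups = find_comma_groups(tokenize("Storms hit Paris, Dallas and Houston on Monday."), GAZETTEER)
print(groups)  # [['Paris', 'Dallas', 'Houston']]
```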
Comma group Resolution 
1. Prominence location interpretation 
2. Proximity location interpretation 
3. Sibling location interpretation 
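A minimal sketch of how the three interpretations could be tested, assuming the usual reading of these terms: sibling (every toponym resolves to a place with the same parent region, e.g. the same state), proximity (all resolved places lie close together) and prominence (all are prominent places, approximated here by population). The `Place` record, the thresholds `max_km` and `min_population`, the order in which the interpretations are tried, and the coordinates and populations (rough, illustrative) are assumptions of this sketch, not the published algorithm. Enumerating every combination with `itertools.product` grows exponentially, so it only suits short groups.

```python
import math
from dataclasses import dataclass
from itertools import combinations, product

@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float
    parent: str       # containing region, e.g. a state or country
    population: int

def haversine_km(a, b):
    """Great-circle distance between two places in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def resolve_comma_group(candidates, max_km=300.0, min_population=1_000_000):
    """candidates: one list of Place readings per toponym in the comma group.
    Returns (interpretation, chosen places) or (None, None)."""
    combos = list(product(*candidates))  # exhaustive: fine for short groups only
    # Sibling: all chosen places share one parent region.
    for combo in combos:
        if len({p.parent for p in combo}) == 1:
            return "sibling", combo
    # Proximity: every pair of chosen places is within max_km.
    for combo in combos:
        if all(haversine_km(a, b) <= max_km for a, b in combinations(combo, 2)):
            return "proximity", combo
    # Prominence: each toponym's most populous reading, all above the threshold.
    combo = tuple(max(c, key=lambda p: p.population) for c in candidates)
    if all(p.population >= min_population for p in combo):
        return "prominence", combo
    return None, None

group = [
    [Place("Paris", 48.8566, 2.3522, "France", 2_100_000),
     Place("Paris", 33.6609, -95.5555, "Texas", 25_000)],
    [Place("Dallas", 32.7767, -96.7970, "Texas", 1_300_000)],
    [Place("Houston", 29.7604, -95.3698, "Texas", 2_300_000)],
]
kind, chosen = resolve_comma_group(group)
print(kind, chosen[0].parent)  # sibling Texas
```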
Fuzzy Geo-tagging 
• The final default-sense assignment is removed
• A weight is assigned to each member of the set of possible interpretations
• The weights are summed across all the articles
• A single toponym is formed with the summed weights of its interpretations
• This toponym is added to the previously resolved set of toponyms
• The coverage of the new set is calculated, and the toponym is accepted or rejected accordingly
# The process yields satisfactory results only when tagging with local lexicons.
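The steps above can be sketched as follows. Each article is assumed to supply a weight per candidate interpretation of a toponym; the weights are summed across articles into one fuzzy toponym, which is kept only if the resolved set's coverage stays at or above a threshold. The slide does not define coverage, so `coverage()` here is a hypothetical measure (the share of total weight that falls on each toponym's best interpretation), and the 0.8 threshold, function names and weights are illustrative.

```python
from collections import defaultdict

def sum_weights(articles):
    """Sum each toponym's interpretation weights across all articles.
    articles: list of {toponym: {interpretation: weight}} dicts, one per article."""
    totals = defaultdict(lambda: defaultdict(float))
    for article in articles:
        for toponym, weights in article.items():
            for interpretation, w in weights.items():
                totals[toponym][interpretation] += w
    return {t: dict(ws) for t, ws in totals.items()}

def coverage(lexicon):
    """Hypothetical coverage: share of all weight on each toponym's best reading
    (1.0 means every toponym in the set is unambiguous)."""
    best = sum(max(ws.values()) for ws in lexicon.values())
    total = sum(sum(ws.values()) for ws in lexicon.values())
    return best / total if total else 0.0

def try_add(resolved, toponym, weights, threshold=0.8):
    """Add the fuzzy toponym if the new set's coverage stays >= threshold."""
    trial = {**resolved, toponym: weights}
    if coverage(trial) >= threshold:
        return trial, True
    return resolved, False

articles = [
    {"Paris": {"Paris, France": 0.5, "Paris, Texas": 0.5}},
    {"Paris": {"Paris, France": 0.25, "Paris, Texas": 0.75}},
    {"Paris": {"Paris, Texas": 1.0}},
]
summed = sum_weights(articles)            # Paris: France 0.75, Texas 2.25
resolved = {"Dallas": {"Dallas, Texas": 3.0}}
lexicon, accepted = try_add(resolved, "Paris", summed["Paris"])
print(accepted, coverage(lexicon))        # True 0.875
```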
Examples of Geo-Spatial Resources 
• GeoNames: includes geographical data such as place names in various languages, latitude, longitude, altitude and population, collected from several data sources. It currently contains over 8 million geographical names for around 7 million unique places.
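To use GeoNames as the gazetteer behind lat/long assignment, its tab-separated dump files (such as `allCountries.txt`) can be loaded into a name-to-candidates index. This sketch assumes the dump's 19-column layout (geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, cc2, admin1 code, ..., population at index 14); `build_gazetteer` is a hypothetical helper, and the sample row is made up for illustration, not real GeoNames data.

```python
import csv
from collections import defaultdict

# Column positions in the tab-separated GeoNames dump files.
NAME, ASCIINAME, ALTERNATENAMES, LATITUDE, LONGITUDE = 1, 2, 3, 4, 5
FEATURE_CLASS, COUNTRY_CODE, ADMIN1_CODE, POPULATION = 6, 8, 10, 14

def build_gazetteer(lines):
    """Map every lower-cased name variant to its candidate places."""
    index = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 19:  # skip malformed or truncated rows
            continue
        place = {
            "geonameid": int(row[0]),
            "name": row[NAME],
            "lat": float(row[LATITUDE]),
            "lon": float(row[LONGITUDE]),
            "feature_class": row[FEATURE_CLASS],
            "country": row[COUNTRY_CODE],
            "admin1": row[ADMIN1_CODE],
            "population": int(row[POPULATION] or 0),
        }
        variants = {row[NAME], row[ASCIINAME]}
        variants.update(n for n in row[ALTERNATENAMES].split(",") if n)
        for variant in variants:
            index[variant.lower()].append(place)
    return index

# Made-up sample row in the dump's layout (values illustrative only).
sample = "\t".join(["1000001", "Bengaluru", "Bengaluru", "Bangalore,Bengalooru",
                    "12.97194", "77.59369", "P", "PPLA", "IN", "", "19", "",
                    "", "", "8400000", "", "920", "Asia/Kolkata", "2014-01-01"])
gaz = build_gazetteer([sample])
print(gaz["bangalore"][0]["name"])  # Bengaluru
```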
Examples of Geo-Spatial Resources (contd.)
# Food and Agriculture Organization of the United Nations (FAO)
Resource: Geopolitical ontology: Czech_Republic_the
Examples of Geo-Spatial Resources (contd.)
Spatial Indexing 
• Multi-dimensional indices have been proposed for managing spatial data, including grid indices, quad-trees, R-trees and k-d trees
• Hybrid indexing schemes combine a spatial index (for example, a grid built over the spatial attribute) with a text index
• The geographic context of a document is encoded as a document footprint Fd
• A query footprint Qd is formed from the query
• The search engine finds the documents whose footprints intersect the query footprint
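A minimal sketch of footprint matching, assuming footprints are axis-aligned bounding boxes in degrees and that a document may carry several footprints. Boxes that cross the antimeridian are not handled. The class and function names and the coordinates (rough boxes around Bangalore and Kolkata) are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Footprint:
    """Axis-aligned bounding box in (lon, lat) degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other):
        return (self.min_lon <= other.max_lon and other.min_lon <= self.max_lon
                and self.min_lat <= other.max_lat and other.min_lat <= self.max_lat)

def geo_filter(doc_footprints, query_fp):
    """Ids of documents with at least one footprint Fd intersecting the query footprint."""
    return {doc_id for doc_id, fps in doc_footprints.items()
            if any(fp.intersects(query_fp) for fp in fps)}

docs = {
    "D1": [Footprint(77.4, 12.8, 77.8, 13.1)],   # roughly Bangalore
    "D2": [Footprint(88.2, 22.4, 88.5, 22.7)],   # roughly Kolkata
}
query = Footprint(77.0, 12.5, 78.0, 13.5)
print(geo_filter(docs, query))  # {'D1'}
```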
Types 
• Pure text index
• Spatial primary index
• Text primary spatio-textual index
• Text index with spatial post-processing
• R-tree, quadtree, k-d tree
Pure Text Index 
Spatial primary index (ST)
• Each spatial cell has its own text index.
• Retrieve the ids of documents that contain the query terms and lie in the cells intersected by the query footprint.
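A sketch of a spatial primary index, assuming a uniform grid of `cell_size` degrees and boxes given as `(xmin, ymin, xmax, ymax)`; the class name and sample data are hypothetical. A document is registered in every cell its footprint overlaps, and each cell holds its own inverted index.

```python
import math
from collections import defaultdict

class SpatialPrimaryIndex:
    """Uniform grid; each cell keeps its own inverted index term -> doc ids."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(lambda: defaultdict(set))

    def _cells_for(self, box):
        """Grid cells (ix, iy) overlapped by box = (xmin, ymin, xmax, ymax)."""
        xmin, ymin, xmax, ymax = box
        s = self.cell_size
        for ix in range(math.floor(xmin / s), math.floor(xmax / s) + 1):
            for iy in range(math.floor(ymin / s), math.floor(ymax / s) + 1):
                yield ix, iy

    def add(self, doc_id, terms, footprint):
        for cell in self._cells_for(footprint):
            for term in terms:
                self.cells[cell][term].add(doc_id)

    def query(self, terms, query_footprint):
        """Docs containing all terms, looked up only in cells the query footprint touches."""
        hits = set()
        for cell in self._cells_for(query_footprint):
            if cell not in self.cells:
                continue
            text_index = self.cells[cell]
            postings = [text_index.get(t, set()) for t in terms]
            if postings:
                hits |= set.intersection(*postings)
        return hits

idx = SpatialPrimaryIndex(cell_size=1.0)
idx.add("D1", {"hotel", "bangalore"}, (77.4, 12.8, 77.8, 13.1))
idx.add("D2", {"hotel", "kolkata"}, (88.2, 22.4, 88.5, 22.7))
print(idx.query({"hotel"}, (77.0, 12.5, 78.0, 13.5)))  # {'D1'}
```

Filtering happens only at cell granularity: a document that shares a cell with the query footprint but does not actually overlap it is still returned, so an exact footprint test could follow.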
Text primary spatio-textual index 
Text index: bengal {D1, D2, D3, D7, D8, D9, D11, D13}
Text primary index: bengal {R1(D1, D7); R2(D3, D11, D13); R3(D2); R4(D8, D9, D11)}
• For each term, store a spatial index of the documents containing the term.
• For each query term, retrieve the ids of documents lying in spatial cells that intersect the query footprint.
• Posting list layout: Cell1(DocList1); Cell2(DocList2); ……..
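The `bengal` example above can be loaded into a minimal text primary index, where each term's posting list is grouped by spatial cell. This sketch assumes the set of cells intersecting the query footprint is already known (R2 and R4 below, chosen only for illustration) and that a multi-term query requires every term; the class name is hypothetical.

```python
from collections import defaultdict

class TextPrimaryIndex:
    """term -> {cell id -> set of doc ids}: each term's postings grouped by cell."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, terms, cells):
        """Register doc_id under every term, in each cell its footprint falls in."""
        for term in terms:
            for cell in cells:
                self.postings[term][cell].add(doc_id)

    def query(self, terms, query_cells):
        """Docs containing every term and lying in a cell that meets the query footprint."""
        result = None
        for term in terms:
            by_cell = self.postings.get(term, {})
            docs = set()
            for cell in query_cells:
                docs |= by_cell.get(cell, set())
            result = docs if result is None else result & docs
        return result or set()

# The 'bengal' posting list from the slide, grouped by cell.
ts = TextPrimaryIndex()
for cell, doc_ids in {"R1": ["D1", "D7"], "R2": ["D3", "D11", "D13"],
                      "R3": ["D2"], "R4": ["D8", "D9", "D11"]}.items():
    for doc_id in doc_ids:
        ts.add(doc_id, ["bengal"], [cell])

# Hypothetical: suppose the query footprint intersects cells R2 and R4.
print(sorted(ts.query(["bengal"], {"R2", "R4"})))  # ['D11', 'D13', 'D3', 'D8', 'D9']
```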
Text index with spatial post-processing
• Access the text index with the concept terms.
• Access the spatial index with the query footprint.
• Merge the results by taking their intersection.
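The three steps can be sketched with an in-memory inverted index and a linear scan standing in for the spatial index (a real system would use a grid or an R-tree). Function names, documents and bounding boxes are illustrative assumptions; at least one concept term is required.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def intersects(a, b):
    """Boxes are (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(concept_terms, query_footprint, inverted_index, footprints):
    # 1. Access the text index with the concept terms (all terms required).
    text_hits = set.intersection(*(inverted_index.get(t, set()) for t in concept_terms))
    # 2. Access the "spatial index" with the query footprint (linear scan here).
    spatial_hits = {d for d, box in footprints.items() if intersects(box, query_footprint)}
    # 3. Merge the two result sets by intersection.
    return text_hits & spatial_hits

docs = {"D1": "flood relief in Bengal", "D2": "flood warning in Assam",
        "D3": "cricket in Bengal"}
footprints = {"D1": (86, 21, 90, 27), "D2": (89.7, 24, 96, 28), "D3": (86, 21, 90, 27)}
index = build_inverted_index(docs)
print(search(["flood"], (87, 22, 89, 24), index, footprints))  # {'D1'}
```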
R-tree 
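The slide shows the R-tree only as a figure. As a compact illustration, the sketch below builds a static R-tree by Sort-Tile-Recursive (STR) bulk loading, a swapped-in technique rather than one-at-a-time insertion as in Guttman's dynamic R-tree. Each node stores the minimum bounding rectangle (MBR) of its children, and a search descends only into nodes whose MBR intersects the query footprint. Names, the `capacity` parameter and the sample rectangles are illustrative.

```python
import math
from dataclasses import dataclass

# Rectangles are (xmin, ymin, xmax, ymax) tuples.

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def mbr(rects):
    """Minimum bounding rectangle of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

@dataclass
class Node:
    rect: tuple      # MBR of everything below this node
    children: list   # child Nodes, or (rect, doc_id) entries in a leaf
    leaf: bool

def _str_groups(items, rect_of, capacity):
    """STR packing: sort by x into vertical slices, then by y within each slice."""
    n = len(items)
    slices = math.ceil(math.sqrt(math.ceil(n / capacity)))
    slice_size = slices * capacity

    def center(item, lo, hi):
        r = rect_of(item)
        return (r[lo] + r[hi]) / 2

    by_x = sorted(items, key=lambda it: center(it, 0, 2))
    groups = []
    for i in range(0, n, slice_size):
        strip = sorted(by_x[i:i + slice_size], key=lambda it: center(it, 1, 3))
        groups.extend(strip[j:j + capacity] for j in range(0, len(strip), capacity))
    return groups

def build_rtree(entries, capacity=4):
    """entries: list of (rect, doc_id). Returns the root Node."""
    if not entries or capacity < 2:
        raise ValueError("need at least one entry and capacity >= 2")
    level = [Node(mbr([r for r, _ in g]), g, True)
             for g in _str_groups(entries, lambda e: e[0], capacity)]
    while len(level) > 1:  # pack nodes level by level until one root remains
        level = [Node(mbr([c.rect for c in g]), g, False)
                 for g in _str_groups(level, lambda nd: nd.rect, capacity)]
    return level[0]

def search(node, query):
    """Doc ids whose rectangles intersect the query, pruning subtrees by MBR."""
    if not intersects(node.rect, query):
        return []
    if node.leaf:
        return [doc for rect, doc in node.children if intersects(rect, query)]
    return [doc for child in node.children for doc in search(child, query)]

entries = [((0, 0, 1, 1), "D1"), ((2, 2, 3, 3), "D2"), ((5, 5, 6, 6), "D3"),
           ((0, 5, 1, 6), "D4"), ((8, 8, 9, 9), "D5")]
root = build_rtree(entries, capacity=2)
print(sorted(search(root, (0, 0, 2.5, 2.5))))  # ['D1', 'D2']
```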
Related Work 
• A research project to design a search engine that finds documents and datasets relating to the places or regions referred to in a query
  - Funded through the EC Fifth Framework Programme
  - A collaborative effort of six European partners
• NewsStand monitors RSS feeds from thousands of online news sources and retrieves articles within minutes of publication
  - Extracts geographic locations from articles using a custom-built geo-tagger, and groups articles into story clusters using a fast online clustering algorithm
  - Retrieves stories based on both topical and geographic significance
• STEWARD: a system for extracting, querying, and visualizing textual references to geographic locations in unstructured text documents
  - Uses a geo-tagger, GNIS (Geographic Names Information System, for US-based entities), GNS (GEOnet Names Server, for non-US-based entities) and NLP tools
Future Work 
• Applying the idea of fuzzy geo-tagging to geo-tagging with comma groups
• Allowing looser interpretations of prominence and proximity for comma group interpretation
• Applying distance functions to calculate relevance (e.g. the greater the overlap, the greater the relevance)
• Reducing the storage cost of spatial indexes
• Improving response time
• Reducing the computational cost of associating each document with several different geographic scopes
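One possible overlap-based relevance function for the future-work item above, assuming box footprints `(xmin, ymin, xmax, ymax)`: the fraction of the query footprint covered by the document footprint. This is an illustrative choice, not a method from the talk, and the function names are hypothetical.

```python
def area(box):
    """Area of (xmin, ymin, xmax, ymax); empty or inverted boxes give 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def overlap_relevance(doc_box, query_box):
    """Share of the query footprint covered by the document footprint (0..1)."""
    inter = (max(doc_box[0], query_box[0]), max(doc_box[1], query_box[1]),
             min(doc_box[2], query_box[2]), min(doc_box[3], query_box[3]))
    q = area(query_box)
    return area(inter) / q if q else 0.0

query = (1, 1, 3, 3)
docs = {"D1": (0, 0, 2, 2), "D2": (0, 0, 4, 4), "D3": (5, 5, 6, 6)}
ranking = sorted(docs, key=lambda d: overlap_relevance(docs[d], query), reverse=True)
print(ranking)  # ['D2', 'D1', 'D3']
```

Because this score ignores the size of the document footprint, a document whose footprint covers the whole world would score 1.0; an intersection-over-union ratio is one way to penalize such overly broad footprints.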
References 
1. Lieberman, Michael D. and Samet, Hanan and Sankaranarayanan, Jagan. (2010). Geotagging: Using Proximity, Sibling, and Prominence Clues to Understand Comma Groups.
2. Lieberman, Michael D. and Samet, Hanan and Sankaranarayanan, Jagan. (2010). Geotagging with Local Lexicons to Build Indexes for Textually-Specified Spatial Data.
3. Vaid, Subodh and Jones, Christopher B. and Joho, Hideo and Sanderson, Mark. (2011). Spatio-Textual Indexing for Geographical Search on the Web.
4. Martins, Bruno and Silva, Mário J. and Andrade, Leonardo. (2011). Indexing and Ranking in GeoIR Systems.
5. Lieberman, Michael D. and Samet, Hanan and Sankaranarayanan, Jagan and Sperling, Jon. (2010). STEWARD: Architecture of a Spatio-Textual Search Engine.
6. Teitler, Benjamin E. and Panozzo, Daniele and Lieberman, Michael D. and Samet, Hanan and Sankaranarayanan, Jagan and Sperling, Jon. (2008). NewsStand: A New View on News.
7. GeoNames. http://geonames.org/ Accessed 24 September 2014.
8. NewsStand. http://newsstand.umiacs.umd.edu/ Accessed 24 September 2014.
9. STEWARD: Spatio-Textual Search Engine. http://steward.umiacs.umd.edu/ Accessed 24 September 2014.
10. Czech_Republic_the. http://www.fao.org/countryprofiles/geoinfo/geopolitical/resource/Czech_Republic_the Accessed 24 September 2014.
11. Giunchiglia, Fausto and Maltese, Vincenzo and Farazi, Feroz and Dutta, Biswanath. GeoWordNet: A Resource for Geo-spatial Applications.
Thank you
Any Questions?

Editor's Notes

  • #4 To find other documents that are related to the query string by spatial proximity.