This document summarizes a research paper on spatial queries, entity recognition, and disambiguation. It begins with an introduction on the history of search engines and the rise of spatial search engines. It then discusses related work on query processing, entity recognition, and disambiguation. The document outlines the author's proposed approach, which uses natural language processing to recognize locations, types of locations, and spatial relationships in queries. It evaluates the approach based on disambiguation accuracy and a comparison with Google Maps results. In conclusion, it discusses how the approach changes the perspective on spatial queries and enables disambiguation, with potential future work combining geocoding APIs and handling multiple relationships.
2. Table of contents
1- Introduction
2- Query Processing (Related Works)
3- State of the Art
4- Our approach
5- Conclusion
3. Introduction
December 1990 >> First Search engine (W3Catalog) >> Entirely indexed by hand
September 1993 >> WebCrawler >> Automatic crawling
…
January 1994 >> Yahoo!
September 1997 >> Google
4. Introduction (Spatial Search Engine)
New Sources on the web:
◦ New Search Engines for Images, Videos
◦ New Search engine for geospatial data (Google Maps, Bing Maps)
February 2005 >> Google Maps
December 2010 >> Bing Maps
5. Query Processing
(What is Query Processing?)
A search engine has two major processes:
◦ 1- Offline (crawling and collecting data)
◦ 2- Online (starts from the user’s query and ends with returning the results)
Where is Query Processing?
What does Query Processing bring us?
6. Query Processing and Related Works
NLP >> Natural Language Processing
ER >> Entity Recognition
Related Works:
◦ Guo et al. (2009) address the problem of Named Entity Recognition in Query (NERQ)
◦ …
◦ Dalvi et al. (2014) developed a four-step algorithm, the Topic-specific Language Model (TLM),
for entity recognition and disambiguation in search queries.
7. Query Processing (State Of The Art)
An example of two queries with the same structure in Google Maps:
1- Intersection of shariati and resalat
2- Intersection of valiasr and enqelab
8. Proposed Approach (Definition)
Spatial Query = Combination of:
◦ 1- Location Name
◦ 2- Location Type
◦ 3- Spatial Relationship
◦ Example : Hospitals around Resalat Square
Based on NLP (ER), we can recognize and tag these parts for further processing
9. Proposed Approach (Algorithm)
1- Input Query > Segmentation (top-down)
2- Candidate
◦ 2-1 Location Name
◦ 2-2 Location Type
◦ 2-3 Spatial Relationship
3- Validate The Result
◦ 3-1 Check that the query is fully processed
◦ 3-2 Check the conceptual criteria
◦ 3-3 Check the logical criteria
4- Return the result
10. Proposed Approach (Evaluation)
Two kinds of evaluation are possible:
1- Disambiguation:
◦ The average disambiguation for 100 spatial queries: 89.45%
2- Correct answers out of 100 spatial queries, compared to Google Maps
◦ Google Maps : 72
◦ Our Approach : 91
11. Conclusion
Changing the perspective from textual to spatial
Take the spatial relationship into account
◦ Make them answerable in general
◦ Using them for disambiguation
Future Work:
◦ Using the combination of Geocode APIs
◦ Develop a more sophisticated algorithm (2 or more spatial relationships)
In this presentation, a new approach to spatial query processing is introduced; as the title also mentions, this technique can be used for disambiguating the results.
As you see in the table of contents, we first give a brief introduction, then discuss query processing and some related work.
The next topic is the state of the art, which includes some examples from Google Maps.
Finally, we discuss the proposed method and the conclusion.
In modern life, no one can deny the great importance of the Internet in our daily lives.
The Internet and the Web play a major role in many aspects of our lives.
Because of the huge number of websites and the amount of data on the Internet, a critical need arose for finding data and resources on the World Wide Web.
The result was the first generation of search engines, which were indexed entirely by hand.
Then IT researchers invented mechanisms for finding and indexing data automatically. As you see in this slide, the first search engine was born in December 1990, and the field has developed dramatically since, as we can see in the high performance of modern search engines like Google.
As the Web grew at a rapid pace, new sources of data appeared in this space: images, videos, and even spatial data and maps. Because of the great demand for spatial data, and the need for a place to answer users’ related queries, new search engines evolved, such as Google Maps and Bing Maps.
These search engines are responsible for many spatial tasks, such as finding places and facilities, and even route finding.
As you see in this slide, Google Maps launched in February 2005 and Bing Maps started in December 2010.
If we consider a simplified search engine, two significant processes must exist in it: first the offline process, and second the online process.
The offline process is responsible for finding new data and websites on the web and for keeping the indexed data up to date. It always runs in the background of the search engine, transparently to the user.
The online process starts from the user’s query and ends with returning the results.
Query processing is the first step in the online process. To clarify the importance of this step, it is worth mentioning that if we consider the online process as solving a question, then query processing is understanding the question.
From this simple analogy we can see how important this step is: if it is not done efficiently, the results will not match the user’s demand.
Natural language processing is one of the challenging fields of computer science; it covers all tasks that can be performed automatically on textual information.
For example, finding the related topics in textual data, summarizing text automatically, part-of-speech tagging, and similar tasks belong to this field of study. One of the main parts of NLP is ER, or entity recognition, which is responsible for tagging and classifying textual data into predefined categories. This mechanism, ER, was first used for query processing by Guo et al. in 2009, and their work was then extended by other researchers to obtain better query processing. In 2014, Dalvi et al. developed a new algorithm for entity recognition in search-engine queries and used it for disambiguation
and for limiting the results. But all these efforts target general search engines; in our approach we try to propose a method for spatial search engines.
In this slide we have two queries with a similar structure: intersection of ….
But the results are very different! If we search for these places separately, Resalat or Shariati, Google can easily find them, but Google does not process the query from a spatial perspective.
A question arises: why does Google answer the second query? This is because of a tag that Google has on that intersection; in the attached information, it already stores the intersection of Enqelab and Valiasr. So Google does not understand the spatial relationship; it depends only on its textual data. We call this approach the textual perspective: depending on textual information without any further processing.
The first question that must be answered for query processing is: what is a spatial query? A spatial query is a combination of three types of information:
location name, location type, and spatial relationship.
As an example …
So if we can find and tag the sub-queries into these three types, and understand the relations between them, we can execute the query in an appropriate way.
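The three-part tagging just described can be sketched as a toy example, using the query from the slide. This is an illustration, not the paper's implementation: the dictionaries below are tiny made-up stand-ins for the real gazetteer lists, and a single-token lookup replaces the similarity matching the approach actually uses.

```python
# Toy sketch of tagging a spatial query into the three predefined categories.
# LOCATION_TYPES and SPATIAL_RELATIONS are invented mini-dictionaries; a real
# system would use gazetteer lists and a geocoding check for location names.
LOCATION_TYPES = {"hospital", "hospitals", "school", "restaurant"}
SPATIAL_RELATIONS = {"around", "near", "in", "intersection"}

def tag_query(query):
    tags = {"location_type": [], "spatial_relationship": [], "location_name": []}
    for token in query.lower().split():
        if token in LOCATION_TYPES:
            tags["location_type"].append(token)
        elif token in SPATIAL_RELATIONS:
            tags["spatial_relationship"].append(token)
        else:
            # Anything unrecognized is treated as part of a location name here;
            # the real approach confirms names via a geocoding service instead.
            tags["location_name"].append(token)
    return tags

print(tag_query("Hospitals around Resalat Square"))
```

With the slide's example, "hospitals" lands in location type, "around" in spatial relationship, and "resalat square" in location name, which is exactly the three-way split the approach needs before validation.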
As you see in the algorithm, first the input query is segmented top-down: in the first iteration we have one sub-query (the whole query), and segmentation continues toward word-by-word segments in later iterations unless the process is stopped by meeting the requirements. In the next phase, we nominate candidate sub-queries for the three predefined categories.
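The top-down segmentation order can be sketched as enumerating contiguous sub-queries from the longest (the whole query) down to single words. This is one plausible reading of the description above, not the paper's exact code:

```python
# Sketch of top-down segmentation: yield every contiguous sub-query,
# longest first, ending with single words.
def segment_top_down(query):
    words = query.split()
    n = len(words)
    for length in range(n, 0, -1):           # longest sub-queries first
        for start in range(n - length + 1):  # every window of this length
            yield " ".join(words[start:start + length])

for sub in segment_top_down("intersection of shariati and resalat"):
    print(sub)
```

In a real run, later iterations would skip any sub-query that was already tagged (e.g. a confirmed location name is not broken into smaller parts), so far fewer candidates are actually checked.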
First, we hypothesize that a sub-query is a location name and check it with the Google Geocoding API: if the API returns results, we tag that sub-query as a location name and do not break it into smaller parts in later iterations. The Google Geocoding API is available as a web service that takes a string input and returns an array of information if it exists, in JSON or XML format, depending on the request. Each member of this array contains geospatial information (longitude and latitude, a bounding box) as well as a hierarchical address.

Location type and spatial relationship are checked against a gazetteer list or predefined dictionary; the location-type dictionary is stored in a hierarchical structure because of its intrinsic characteristics. If the sub-query is sufficiently similar to one of these predefined elements, it is tagged.

Finally, after each iteration we have a validation step to check whether our tagging is meaningful. First we check whether the query is fully processed: if the whole query is processed and tagged, or each unprocessed sub-query contains only one word, it is fully processed; otherwise it must be returned for further segmentation.
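The location-name check can be sketched without any network call. `sample_response` below is a hypothetical payload shaped like the Google Geocoding API's JSON output (a `status` field plus a `results` array whose members carry coordinates and a formatted hierarchical address); the place, coordinates, and helper function are all illustrative.

```python
import json

# Hypothetical geocoder payload, shaped like a Google Geocoding API response.
sample_response = json.loads("""
{
  "status": "OK",
  "results": [
    {
      "formatted_address": "Resalat Square, Tehran, Iran",
      "geometry": {"location": {"lat": 35.746, "lng": 51.486}}
    }
  ]
}
""")

def extract_location_candidates(response):
    """Return (address, lat, lng) tuples; an empty list means the
    sub-query could not be confirmed as a location name."""
    if response.get("status") != "OK":
        return []
    return [(r["formatted_address"],
             r["geometry"]["location"]["lat"],
             r["geometry"]["location"]["lng"])
            for r in response.get("results", [])]

print(extract_location_candidates(sample_response))
```

A non-empty candidate list is what triggers tagging the sub-query as a location name; note the API can return several candidates for one name, which is exactly what the disambiguation step later exploits.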
If the query is fully processed, we check the conceptual and logical criteria where a spatial relationship exists in the query. For the conceptual criterion, we check whether the spatial relationship is meaningful with respect to the other parts: for example, if we have an intersection relation in our tag list, we must have at least two location names; if we do not, the tagging is not conceptually applicable, so we send the query back for another iteration.

Finally, if that criterion is passed, we apply the spatial relationship to our list of data and return the logically consistent results. For example, suppose we have the intersection of two location names, and the Google API gives us four different locations for each name; then we have 16 possible answers, and by applying intersection analysis to these 16 possibilities we can easily find the result, because most of the time the relation does not hold among most of these possible answers.

We call this disambiguation, because we eliminate the undesirable results. Finally, we return the result if the validation phase passes overall.
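The cross-product disambiguation just described can be sketched as follows. Everything here is invented for illustration: the candidate coordinates, the plain lat/lng distance, and the threshold are stand-ins, and a real system would use a proper geodesic distance on actual geocoder output.

```python
from itertools import product
from math import hypot

# Sketch of disambiguating an "intersection of A and B" query: test every
# pair of geocoding candidates (e.g. 4 x 4 = 16 pairs) and keep only pairs
# close enough for the intersection relation to plausibly hold.
def disambiguate_intersection(cands_a, cands_b, max_deg=0.01):
    plausible = []
    for a, b in product(cands_a, cands_b):
        # Naive distance in degrees; illustration only.
        if hypot(a[0] - b[0], a[1] - b[1]) <= max_deg:
            plausible.append((a, b))
    return plausible

# Invented (lat, lng) candidates: four per street name, only one pair close.
shariati = [(35.745, 51.449), (35.20, 51.00), (34.10, 50.00), (36.30, 52.10)]
resalat  = [(35.745, 51.450), (35.90, 52.90), (33.90, 48.70), (37.00, 54.00)]

print(disambiguate_intersection(shariati, resalat))
```

Of the 16 possible pairs, only one survives the proximity test, which mirrors the point above: the spatial relation rarely holds among the spurious candidates, so applying it eliminates the undesirable results.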
For evaluation we consider two measures: first, the average disambiguation, which reflects how effective this approach is at obtaining the desirable result;
and second, a comparison of our approach with a modern, state-of-the-art search engine, namely Google Maps.
As you see, about 89 percent disambiguation (89.45%) was achieved with this method; and out of 100 spatial queries, this approach answered 91, while Google answered 72.
In conclusion, for any domain-specific search engine we can model the query in order to obtain better results. We call this changing the perspective.
With this approach we also see that more sophisticated queries become answerable, and many undesirable results are eliminated.
For future work, we suggest using a combination of geocoding APIs for better performance, and building a more sophisticated algorithm that supports
more complex queries (two or more spatial relationships).
Thank you for listening – and now if there are any questions, I would be pleased to answer them