1. Problem Format
SN.: #2
Title: Natural Language Search Tool for Web GIS
Problem Statement: To design and develop a natural-language search system for querying
attribute information about datasets/features in a web GIS, using open source tools and
libraries.
Desired Outcome: A search box for the Web GIS portal through which a user can search any
information available in the datasets using natural language.
Dataset: A group of sample datasets will be provided.
Domain: NLP, Search engine, Indexing
Challenge: Converting natural language into a searchable data model, and developing the
search functionality using natural language inputs.
Usage: It will enhance the user experience, since users who are new to GIS can get the
desired information simply by asking in natural language.
User: Citizens and other users of Bhuvan
Expected Number of Users: Millions
Role of User: To visit the Bhuvan applications to get useful information from
satellite-derived data.
Technicality: NLP models, Indexing methods, Search mechanisms
Available Solutions (if yes, reasons for not using them): No
Domain Expert(s): Anju Bajpai, T. P. Girish Kumar
Elaborated Problem Statement
In this problem, the student is expected to use open source tools and libraries to develop a
web-based GIS application that provides a natural-language search system for the attribute
information contained in datasets/features. For this, the student has to:-
Design a search box where a user can type a question in natural language (English, in this
case), for example: “Which is the longest river in Maharashtra?” As soon as the user presses
Enter, the natural language phrase should be converted into a meaningful query by removing
stop words, stemming, and building a bag-of-words model. The relevant results, here a list of
rivers with their names in descending order of length, should then be displayed. The
top-ranked river feature should also be zoomed to fit the extent of a map shown on the right
side of the page (as shown in Figure 1). When the user clicks on the map, a full-view GIS page
should open in a new window with the same zoom boundary extent. Students can use the free OSM
services as a WMS layer for map visualisation.
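The processing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the stop-word list is a small hand-picked set, the `stem` function is a naive suffix stripper standing in for a real stemmer, and the `RIVERS` records (names, lengths, coordinates) are made-up placeholders rather than values from the provided dataset.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real system would use a fuller one.
STOP_WORDS = {"which", "is", "the", "in", "a", "an", "of", "to", "me", "my", "what"}

# Naive suffix stripping, a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("est", "ing", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(question):
    """Lower-case, tokenise, drop stop words, stem, and count the terms."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return Counter(stem(t) for t in tokens if t not in STOP_WORDS)


def rank_by_length(features):
    """Return features sorted by length, longest first."""
    return sorted(features, key=lambda f: f["length_km"], reverse=True)


def extent(coords):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of (lon, lat) pairs."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    return (min(lons), min(lats), max(lons), max(lats))


# Placeholder records: names, lengths and coordinates are made up.
RIVERS = [
    {"name": "River A", "length_km": 120.0, "coords": [(73.5, 18.0), (74.5, 18.5)]},
    {"name": "River B", "length_km": 310.0,
     "coords": [(74.0, 19.0), (76.0, 20.5), (75.0, 19.5)]},
]

terms = bag_of_words("Which is the longest river in Maharashtra?")
ranked = rank_by_length(RIVERS)
zoom_box = extent(ranked[0]["coords"])
```

Here `terms` holds the stemmed query terms, `ranked` is the result list in descending order of length, and `zoom_box` is the extent to which the map would be fitted for the top-ranked feature.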
The following are some of the elements of any question asked by a user:-
1. Location
2. Time
3. Thing
4. People
5. Event
6. Phenomenon
For Example:-
1. Which is the most populated(4) city(1)?
2. Which is the nearest temple(3) to my location(1)?
3. Show me the shortest route to the nearest parking to my location(1).
4. When(2) did COVID-19(5) reach the highest number of cases(4) in
Maharashtra(1)?
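One simple way to attach these element numbers to the words of a question is a lexicon lookup. The sketch below is only an illustration: the `LEXICON` entries are hypothetical and hand-written from the examples above, whereas a real system would more likely derive them from the dataset attributes or from a named-entity recogniser.

```python
import re

# The six question elements listed above, keyed by their number.
ELEMENTS = {1: "Location", 2: "Time", 3: "Thing", 4: "People", 5: "Event", 6: "Phenomenon"}

# Hypothetical hand-written lexicon built from the example questions.
LEXICON = {
    "city": 1, "location": 1, "maharashtra": 1,
    "when": 2,
    "temple": 3,
    "populated": 4, "cases": 4,
    "covid-19": 5,
}


def tag_elements(question):
    """Return (token, element name) pairs for every token found in the lexicon."""
    tokens = re.findall(r"[a-z0-9-]+", question.lower())
    return [(t, ELEMENTS[LEXICON[t]]) for t in tokens if t in LEXICON]


tags = tag_elements("Which is the most populated city?")
```

For the first example question, `tags` contains the pairs for "populated" (People) and "city" (Location); words that are not in the lexicon are simply skipped.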
Additionally, a speech-to-text converter (optional feature) can be added using tools like the
Web Speech API, so that speech is recognised directly, converted to natural language text,
and then processed further to search for a specific feature.
Recommendations of similar results can also be provided to users (desirable feature).
For example, if a user is interested in the Taj Mahal, you can also help them discover other
monuments of historical importance, other monuments located in Agra, or monuments built
during the same period or by the same king.
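A recommendation of this kind can be approximated by counting shared attribute values between features. The following is a rough sketch under assumptions: the attribute names `city` and `builder` are hypothetical (the provided shapefiles may not carry them), and the four records are a tiny illustrative sample rather than the actual dataset.

```python
# Illustrative records; the attribute names are assumptions, not dataset fields.
MONUMENTS = [
    {"name": "Taj Mahal", "city": "Agra", "builder": "Shah Jahan"},
    {"name": "Agra Fort", "city": "Agra", "builder": "Akbar"},
    {"name": "Red Fort", "city": "Delhi", "builder": "Shah Jahan"},
    {"name": "Qutb Minar", "city": "Delhi", "builder": "Qutb ud-Din Aibak"},
]

COMPARED_ATTRIBUTES = ("city", "builder")


def recommend(name, features):
    """Return names of features sharing at least one attribute value with `name`."""
    target = next(f for f in features if f["name"] == name)
    scored = []
    for feature in features:
        if feature is target:
            continue
        shared = sum(feature[a] == target[a] for a in COMPARED_ATTRIBUTES)
        if shared:
            scored.append((shared, feature["name"]))
    # Most shared attributes first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [feature_name for _, feature_name in scored]


suggestions = recommend("Taj Mahal", MONUMENTS)
```

With these sample records, a search for the Taj Mahal suggests one monument in the same city and one built by the same ruler, while the unrelated monument is left out.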
Also, a chart can be drawn based on the results selected in the result list (optional feature).
Figure 1
Data Available:- The dataset is vector data, i.e., shapefiles of OSM data for the Western India
region, consisting of various layers such as:
1. Roads
2. Railways
3. POI
4. Buildings
5. Water
6. Waterways
7. Transport
8. Places of Worship
9. Natural
10. Landuse
11. Places
The data is further classified into the following categories:-
1. Physical
1.1 Highway
1.2 Cycleway
1.3 Tracktype
1.4 Waterway
1.5 Railway
1.6 Aeroway
1.7 Aerialway
1.8 Power
1.9 Man Made
1.10 Leisure
1.11 Amenity
1.12 Shop
1.13 Tourism
1.14 Historic
1.15 Landuse
1.16 Military
1.17 Natural
2. Non Physical
2.1 Route
2.2 Boundary
2.3 Sport
2.4 Abutters
2.5 Accessories
2.6 Properties
2.7 Restrictions
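These categories correspond to OSM tag keys, so a search term can be resolved to a key/value pair before the data is queried. The mapping below is a small hand-written sketch: the dictionary name and the choice of terms are assumptions for illustration, and the attribute layout of the provided shapefiles may differ from raw OSM tags.

```python
# Hypothetical lookup from a query term to an OSM (key, value) tag pair.
CATEGORY_TAGS = {
    "river": ("waterway", "river"),
    "parking": ("amenity", "parking"),
    "temple": ("amenity", "place_of_worship"),
    "motorway": ("highway", "motorway"),
}


def lookup_tag(term):
    """Return the (key, value) pair for a query term, or None if it is unknown."""
    return CATEGORY_TAGS.get(term.lower())


river_tag = lookup_tag("River")
```

A term that resolves to a pair, such as `river_tag` here, narrows the search to one category; an unknown term returns `None` and can fall back to a plain text search over feature names.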
Data can be downloaded from the following URL:-
https://download.geofabrik.de/asia/india.html
More details on this can be found in the following hyperlink:-
https://query2map.toolforge.org/osmhack.php?lat=23&lon=78&name=India
Also, COVID-19 data with date/time and location information has been provided in CSV format.
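A question such as "When did COVID-19 reach the highest number of cases in Maharashtra?" reduces to a filter and a maximum over this CSV. The sketch below assumes column names `date`, `state` and `cases`, which may not match the provided file, and uses made-up sample rows in place of the real data.

```python
import csv
import io

# Made-up sample rows; the column names are assumptions about the provided CSV.
SAMPLE_CSV = """date,state,cases
2021-01-01,Maharashtra,100
2021-01-02,Maharashtra,250
2021-01-02,Gujarat,400
2021-01-03,Maharashtra,180
"""


def peak_date(csv_text, state):
    """Return (date, cases) for the row with the most cases in `state`, or None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if row["state"].lower() == state.lower()]
    if not rows:
        return None
    best = max(rows, key=lambda row: int(row["cases"]))
    return best["date"], int(best["cases"])


peak = peak_date(SAMPLE_CSV, "maharashtra")
```

The state name would come from the Location element of the question, and the returned date answers its Time element.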
Evaluation Criteria
Evaluation will be done based on the following criteria
1. Performance of Search Results
2. Details provided about the feature
3. Accuracy of Search
4. Accuracy of Natural Language understanding
5. Interoperability with other sources of Data
6. Additional features will earn extra marks
Students should submit their complete source code along with the installation procedure so
that it can be replicated for evaluation. A link to the deployed application must also be
provided. After the applications are evaluated against the above criteria, a virtual meeting
will be held through Google Meet with the top 10 candidates/teams for a final assessment based
on a presentation of their work.
Incremental Problem
1. Integrate the census data along with district
https://download.geofabrik.de/asia/india.html
2. Landuse Integration for NLP query enhancements