The document discusses geographic information retrieval (GIR). It introduces GIR as a specialized branch of information retrieval that deals with georeferenced information. It describes some general problems in GIR, such as ambiguity of place names and fuzzy geographic boundaries. It also discusses how cognitive models of human understanding of geography can impact GIR. The document then covers techniques for geo-referencing documents using gazetteers and ontologies. It concludes by discussing related projects, evaluation of GIR systems, and a gazetteer server and service developed for UK academia.
4. Introduction
● Geographic Information Retrieval can be seen as a specialized branch of traditional
Information Retrieval.
● Information that has a relationship to geographic space is called georeferenced
information, a term used frequently in Geographic Information Retrieval.
● Georeferenced information appears in all kinds of media, e.g. structured data such as
maps, land surveys, airborne and satellite images, and tabulated observations.
● It can also be used by researchers looking for a certain area, for example an area
inhabited by certain animals or affected by an epidemic.
5. Properties of Georeferenced Information:
● Much of the information available in digital libraries and on the Internet is
georeferenced, although mostly it is not denoted in terms of geographic coordinates.
● The geographic location and extent of a place name is often called its geographic
footprint, and it is given by coordinates (longitude, latitude).
● Geographic Information Retrieval requires that place names and phrases that include
direct or indirect references to place names be resolved and translated into footprints
that can be indexed.
6. General Problems in GIR:
Ambiguity/Lack of precision in Place Names:
● Firstly, several places can share the same name, making the place names unique
only within a limited geographic area.
● Secondly, some place names occurring in texts are temporal or cultural conventions
rather than official names, so the user needs to understand the time, context or
cultural environment in which the names are used to be able to link them to a
geographic location.
● Thirdly, some place names change over time, e.g. Bangalore to Bengaluru, Calcutta to
Kolkata.
● Fourthly, the geographic extent that the place name denotes can be extended,
reduced or changed over time.
7. General Problems in GIR: (contd.)
○ Fifthly, the borders of a location can be fuzzy. (Kashmir?)
○ Sixthly, the same place name can be written differently in different texts, either
because the author has misspelled the name or because there are several accepted
spellings of the same place name.
Information being fuzzy:
○ “About 200 kilometers south of the capital of Russia”: both the direction and the
distance may vary. South Africa has three capitals, which adds further ambiguity when
a place is described relative to "the capital".
○ Often, people are imprecise in giving geographic direction, using one of the four
general directions north, south, east or west, when the actual direction might be
somewhere in between.
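A phrase such as “about 200 kilometers south of the capital of Russia” can be turned into a rough set of candidate points by offsetting a known coordinate along several compass bearings. The sketch below is only an illustration: it assumes a spherical Earth with a mean radius of 6371 km, uses approximate coordinates for Moscow, and the function name `offset_point` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def offset_point(lat_deg, lon_deg, bearing_deg, distance_km):
    """Return the (lat, lon) reached by travelling distance_km from the start
    point along an initial compass bearing (0 = north, 180 = south)."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    angular = distance_km / EARTH_RADIUS_KM  # distance as an angle on the sphere

    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular)
        + math.cos(lat1) * math.sin(angular) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# Approximate coordinates of Moscow (an assumption for illustration).
moscow = (55.7558, 37.6173)

# "South" is fuzzy, so sample a range of bearings around due south.
candidates = [offset_point(moscow[0], moscow[1], b, 200.0) for b in (160, 180, 200)]
```

Sampling several bearings (and, if needed, several distances) gives a small region rather than a single point, which matches the imprecision of the original phrase better.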
8. Impact of cognitive model on Geographic
Information Retrieval
● Human understanding of geographic location comes in two forms: procedural and
survey based.
● Survey: involves looking at maps to find geographic locations.
● Procedural: involves exploring and navigating through a place to get the 'feel' of
it.
● Using procedural knowledge to locate or gain information is particularly difficult
because such descriptions contain many ambiguous phrases.
9. Cognitive model (continued)
● 'People link geographic distance with time': when talking about going from 'A' to
'B', people tend to use time as a measure of distance, e.g. "It takes two hours to get
from 'A' to 'B' by car."
● 'Topology and metric distances': people are very good at describing topological
aspects of a place, such as inclusion (e.g. naming the places that lie within an area)
or coincidence (e.g. "this place is at the same location as ...").
● 'People have biases towards east-west or north-south direction': people have a
biased view of geographic areas, and their sense of direction is vague when they give
specifics. E.g. when asked where South America is with respect to North America, the
answer is generally "south", while in reality it lies to the south-east.
10. Geo referencing using the Gazetteers
Gazetteers: A form of index that relates place names to co-ordinates of locations and
extents.
Here we focus on automatic geo-referencing based on the contents of the document text
alone.
Most automated approaches to geo-referencing combine place name identification with
natural language processing. The latter identifies phrases that modify the location
pointed to by an occurrence of a place name (“200 km south of Moscow”) or that indicate
a location without actually mentioning a specific place name (“Rosenborg's home field”).
11. Geo- referencing (continued)
Gazetteers have three basic components:
The name is the textual designator of a geographic location, the location is the
coordinates of a point, line or area on the earth’s surface pointed to by a name, and the
feature type is the type of location that a name points to (forest, agricultural area,
river, inhabited location etc.).
The location that a place name refers to (the place name's footprint) can be given as a
point, a bounding box or a polygon, all represented by coordinates.
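The three components can be sketched as a small data structure. This is a toy illustration rather than the format of any real gazetteer: the class and field names are hypothetical, the coordinates are approximate, and the two entries are chosen to show that one name can point to several places (Hyderabad in India and Hyderabad in Pakistan).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude)


@dataclass(frozen=True)
class GazetteerEntry:
    name: str                            # textual designator of the place
    feature_type: str                    # e.g. "river", "inhabited location"
    footprint_kind: str                  # "point", "bbox" or "polygon"
    coordinates: Tuple[Coordinate, ...]  # one or more (lon, lat) pairs


class Gazetteer:
    """Toy in-memory index from place names to gazetteer entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[GazetteerEntry]] = {}

    def add(self, entry: GazetteerEntry) -> None:
        # Names are not unique, so each name maps to a list of entries.
        self._entries.setdefault(entry.name.lower(), []).append(entry)

    def lookup(self, name: str) -> List[GazetteerEntry]:
        return list(self._entries.get(name.lower(), []))


gazetteer = Gazetteer()
# Approximate coordinates, for illustration only.
gazetteer.add(GazetteerEntry("Hyderabad", "inhabited location", "point",
                             ((78.4867, 17.3850),)))  # India
gazetteer.add(GazetteerEntry("Hyderabad", "inhabited location", "point",
                             ((68.3578, 25.3960),)))  # Pakistan

matches = gazetteer.lookup("hyderabad")  # two candidates: the name is ambiguous
```

A lookup that returns more than one entry is exactly the place name ambiguity described earlier; resolving it needs context beyond the gazetteer itself.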
13. Geo- referencing (continued)
Bounding Box:
Gives a better idea of the entire referenced area.
Does not require a lot of data storage.
However, it also covers surrounding areas that are not part of the place, so it is inaccurate.
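The trade-off can be seen in a few lines of code. The sketch below uses made-up planar coordinates and hypothetical helper names; it derives a bounding box from a polygon and shows two regions that do not touch but whose boxes still overlap.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Two triangular regions that do not intersect (illustrative coordinates):
# every point of region_a has x + y <= 4, every point of region_b has x + y >= 6.
region_a = [(0, 0), (4, 0), (0, 4)]
region_b = [(3, 3), (5, 3), (5, 5)]

box_a = bounding_box(region_a)  # four numbers instead of a full vertex list
box_b = bounding_box(region_b)
false_overlap = boxes_overlap(box_a, box_b)  # boxes overlap, regions do not
```

Each box needs only four numbers regardless of how complex the outline is, which is the storage advantage; the false overlap is the price paid in accuracy.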
14. Geo-referencing (continued)
Approximated Polygon approach:
Most accurate in terms of referencing.
However, it takes a lot of data storage space.
The best approach would be a compromise between the polygon and bounding box
approaches, such as a polygon with a fixed number of points.
15. Searching for Georeferenced Information
One option is to let the user specify one or more place names as keywords in a
traditional keyword-based query. When parsing the query, the GIR/IR system treats the
place names it finds as special keywords that indicate the geographical scope of the
user's information need.
e.g. Googling for restaurants around you.
Another option is to let users specify the geographic constraint of a query by drawing on
one or more maps.
e.g. Google Maps
And what about GPS apps like "Here and Now" and "Google Latitude"?
16. Searching for Georeferenced Information
Typical Queries:
○ Point in Polygon - asking for georeferenced information that contains,
surrounds or refers to a particular geographic point location
○ Region Queries - asking for anything contained in, adjacent to, or overlapping
the region.
○ Distance and Buffer Zone Queries - asking for information within some fixed
distance of a geographic object (point, line, polygon)
○ Path Queries - requiring a network structure that can be queried for network
traversal information
○ Multimedia Queries - combining multiple geo-referenced information sources
in resolving a query.
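Two of these query types can be sketched directly. The example below is a minimal illustration, not a production spatial index: it uses the standard ray-casting test for point in polygon and the haversine formula (spherical Earth, radius 6371 km assumed) for a distance query over point footprints. The function names and the sample documents are hypothetical.

```python
import math


def point_in_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at the point crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1 = math.radians(a[0]), math.radians(a[1])
    lon2, lat2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def within_distance(center, footprints, radius_km):
    """Distance query: ids of documents whose point footprint lies within
    radius_km of the center."""
    return [doc_id for doc_id, fp in footprints.items()
            if haversine_km(center, fp) <= radius_km]


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
documents = {"a": (0.0, 0.5), "b": (0.0, 2.0)}  # hypothetical point footprints
nearby = within_distance((0.0, 0.0), documents, 100.0)
```

A real system would put a spatial index (for example an R-tree) in front of these tests so that only a few candidate footprints need the exact check.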
17. Related Projects:
SPIRIT (Spatially-Aware Information Retrieval on the Internet): funded by the EC
Fifth Framework Programme. It aims to improve search capabilities on the internet by using
geographical and conceptual ontologies to model both the vocabulary and the spatial
structure of places for purposes of IR. This ontology is envisioned as an extension to
traditional gazetteers that also covers related locations and helps rank hits based on
geographic properties. The project covers:
∙ ontologies that model geographical terminology;
∙ query expansion and relevance ranking procedures based on
the geographical ontologies;
∙ machine learning techniques for the extraction of
geographical context from web documents and for generating
metadata providing spatial context;
∙ a multi-modal user interface providing textual input and
interactive map feedback of the context of retrieved
documents;
∙ spatial indices for web collections
18. Geo-Ontologies
Ontologies relating Geographical Terminology and Spatial Relationships
● Reference to a geographic place: <PL-Name,PL-Type,{(x,y)}>
○ eg: <Charminar, Monument,{(x,y)}>
● Relative Place Reference : <Spatial Relationship,PL-Name, Type,PL-FP>
○ eg: <In, Hyderabad, City, {(x,y)}>
A query to SPIRIT contains one or more place references (PL-REF).
Geographic content is a set of <Place reference> expressions, and the geometric footprint
is a function of this set.
Geo-ontologies can be applied in:
1) User's query interpretation: (+ domain specific ontologies) for disambiguation of place
names
2) System query formulation: to generate alternative names and spatially associated names
3) Metadata extraction: to extract information from free-text documents to generate
footprint(s)
4) Relevance ranking: potential for geographical relevance ranking (Dominos Pizza? :) )
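The place reference tuples above map naturally onto a small record type. The sketch below is an illustration only: the class and function names are hypothetical, the coordinates are approximate, and taking the bounding box of all referenced points is just one possible choice for the "function of this set" that yields the geometric footprint.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), here (longitude, latitude)


@dataclass(frozen=True)
class PlaceReference:
    name: str                       # PL-Name
    place_type: str                 # PL-Type
    footprint: Tuple[Point, ...]    # {(x, y)}
    relation: Optional[str] = None  # spatial relationship, None if absolute


def combined_footprint(refs: List[PlaceReference]):
    """One possible footprint function (an assumption): the bounding box
    (min_x, min_y, max_x, max_y) of every point in every reference."""
    points = [p for ref in refs for p in ref.footprint]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Approximate coordinates, for illustration only.
charminar = PlaceReference("Charminar", "Monument", ((78.4747, 17.3616),))
in_hyderabad = PlaceReference("Hyderabad", "City", ((78.4867, 17.3850),),
                              relation="In")

content_footprint = combined_footprint([charminar, in_hyderabad])
```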
19. Geo-Ontologies
Ontology: a "formal, explicit specification of a shared conceptualisation"
20. Geo-Ontologies
● Types of Atomic Queries:
○ A place name
○ An aspatial entity with relation to a place name
○ An aspatial entity with a spatial relation to a place name
○ A place name with spatial relation to a place name
○ A place type with spatial relation to a place name
○ A place type with spatial relation to a place type
● Geo Ontology = Geographic Feature Ontology + Geographic Type Ontology + Spatial
Relation Ontology
21. User evaluation of the SPIRIT prototype gave results consistent with SPIRIT's
priorities on innovative features. Yet users reported a feeling of frustration, which
shows that their requirements go beyond what SPIRIT achieved and that there is still more
work to be done in this area.
The last publication on the website dates back to 2005.
22. Relevance
In Information Retrieval, relevance denotes how well a retrieved document or set of
documents meets the information need of the user.
Geographic Information Retrieval is concerned with retrieving documents in response to a
spatially related query. Thus, the ranking of documents has to consider both textual and
spatial relevance.
The most common way to return a set of documents obtained from a Web query is by
a ranked list. The search engine attempts to determine which document seems to be the
most relevant to the user and will put it first in the list. In short, every document receives
a score, or distance to the query, and the returned documents are sorted by this score or
distance.
There are situations where sorting by a single score may not be the most useful approach.
When a more complex query is issued, composed of more than one query term or aspect,
documents can also be returned with two or more scores instead of one.
23. For example, the Web search could be for campsites in the neighborhood of
Neuschwanstein, and the documents returned ideally have a score for the query
term “camping” and a score for the proximity to Neuschwanstein. This implies that a Web
document resulting from this query can be mapped to a point in the 2-dimensional plane,
where both axes represent a score. The map indicates campsites near Neuschwanstein
castle, which is situated close to Schwangau, with the distance to the castle
on the x-axis and the rating on the y-axis.
24.
Another weakness of our methods lies in the way we treat multiple-footprint documents.
While we assume that a query can have only one footprint (a user is interested in only one
location), documents may have multiple footprints (refer to more than one location).
The method we followed so far in order to calculate the spatial score considers only the
best-matching document footprint. For example, if a user is looking for “airports near
London”, a document that refers to both “Gatwick” and “Stansted” is scored as referring
only to “Gatwick” since it’s the nearest airport of the two. Such a document, however,
should be scored higher than another that refers only to “Gatwick” since it provides more
relevant information. Another factor is the number of footprints occurring in a document:
Gatwick’s official web pages should be more important than a web list of all airports in
the UK.
25.
For high-quality ranking two things are required. Firstly, we need a good spatial score
between query and document footprints. Secondly, we need a good combination of the
spatial and textual (BM25) scores.
For finding spatial scores, the spatial relationships (distance, containment, and direction)
were converted into numeric values that indicate how close, how much inside, or how
much North-of the relationship between two objects is. Those numeric values were first
attempts at obtaining a score to quantify spatial relationships.
However, certain issues do come up with this method. For example, let us assume three
cities, A, B, and C, where A lies at equal distance (in a Euclidean sense) from B and C. If
C is bigger than B, then the score of B being close to A should be lower than that of C
being close to A. In other words, the distance scores of cities around A may depend on the
context, i.e. which other cities are around A. Also, natural barriers can influence the
concept of proximity. It matters a lot whether a distance of 10 km (as the crow flies) can be
covered by a direct road, or requires a large detour around a mountain range (or a small
road over a mountain pass).
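The text does not say which numeric mappings were used, so the following is only a sketch of what a first attempt could look like. It assumes an exponential decay for "how close" and an area ratio of bounding boxes for "how much inside"; both function names and the 50 km scale are hypothetical choices, and neither addresses the context and barrier issues just described.

```python
import math


def distance_score(distance_km, scale_km=50.0):
    """Map a distance to a score in (0, 1]: 1.0 at distance 0, decaying
    exponentially as the distance grows (scale_km is an arbitrary choice)."""
    return math.exp(-distance_km / scale_km)


def containment_score(doc_box, query_box):
    """Fraction of the document bounding box that lies inside the query
    bounding box; boxes are (min_x, min_y, max_x, max_y)."""
    width = max(0.0, min(doc_box[2], query_box[2]) - max(doc_box[0], query_box[0]))
    height = max(0.0, min(doc_box[3], query_box[3]) - max(doc_box[1], query_box[1]))
    doc_area = (doc_box[2] - doc_box[0]) * (doc_box[3] - doc_box[1])
    if doc_area == 0:
        # Degenerate (point) footprint: either inside the query box or not.
        inside = (query_box[0] <= doc_box[0] <= query_box[2]
                  and query_box[1] <= doc_box[1] <= query_box[3])
        return 1.0 if inside else 0.0
    return width * height / doc_area


half_inside = containment_score((0, 0, 2, 2), (1, 0, 3, 2))
```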
26.
In traditional information retrieval, the separate scores of each document would be
combined into a single score (e.g., by a weighted sum or product) which produces the
ranked list by sorting.
Now, we are going to incorporate two pieces of information into the way that a spatial
document score is calculated:
• The number n of unique footprints in a document.
• The frequencies f_1,…, f_n, of occurrence of the footprints in the document.
Moreover, the total spatial score of a document will be derived from fractional score
contributions of all occurring document footprints.
27.
A simple way of taking into account all document footprints is to define the total spatial
score as a linear combination (e.g. the simple average) of the individual scores of the
footprints:
S = 1/n * (s_1+…+s_n)
where s_i is the score of the i-th document footprint with respect to the query
footprint. To also incorporate the frequencies of occurrence f_i, let us define the weight
of a footprint:
tf_i = 1 + log (f_i).
A footprint that occurs in the document only once will get a weight of one, while any extra
occurrences will increase the weight logarithmically. The total score may be calculated as
S = 1/(tf_1+…+tf_n) * (tf_1*s_1+…+tf_n*s_n),
that is the weighted average of the individual scores.
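Both formulas are short enough to write out as code. The sketch below assumes the natural logarithm (the text does not state the base) and uses invented footprint scores and frequencies for the “airports near London” example; the function names are hypothetical.

```python
import math


def average_spatial_score(scores):
    """S = 1/n * (s_1 + ... + s_n): simple average of the footprint scores."""
    return sum(scores) / len(scores)


def weighted_spatial_score(footprints):
    """Weighted average of footprint scores with tf_i = 1 + log(f_i).

    footprints is a list of (s_i, f_i) pairs: the score of a footprint
    against the query footprint and how often it occurs in the document."""
    weights = [1 + math.log(f) for _, f in footprints]
    weighted_sum = sum(w * s for w, (s, _) in zip(weights, footprints))
    return weighted_sum / sum(weights)


# Invented numbers for illustration: a page that mentions Gatwick five times,
# and a list page that mentions four airports once each.
gatwick_page = [(0.9, 5)]
airport_list = [(0.9, 1), (0.8, 1), (0.1, 1), (0.05, 1)]

gatwick_score = weighted_spatial_score(gatwick_page)
list_score = weighted_spatial_score(airport_list)
```

With these numbers the single-footprint page keeps its full score, while the list page is pulled down by its many distant footprints.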
28.
Considering again the example about “airports near London”, a scoring function like
the last one would score Gatwick’s official web page higher than a web list of all UK
airports. Moreover, it takes into account more than the best-matched document footprint.
The last formula may serve as a starting point for improving the spatial scoring function.
29. Evaluation:
Two indicators:
1) Recall = No. of relevant docs returned / Total no. of relevant docs
2) Precision = No. of relevant docs returned / Total no. of docs returned
TREC has been evaluated using the ISO 9241 standard, based on effectiveness (can users
find relevant docs?), efficiency (resources consumed per result) and satisfaction (user
feedback).
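As a quick worked example, the two indicators can be computed from a result list and a set of relevance judgements. The sketch uses the standard definitions, in which recall is divided by the number of relevant documents and precision by the number of returned documents; the document ids are made up.

```python
def recall(returned, relevant):
    """Share of all relevant documents that were actually returned."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(returned) & relevant) / len(relevant)


def precision(returned, relevant):
    """Share of the returned documents that are relevant."""
    returned = set(returned)
    if not returned:
        return 0.0
    return len(returned & set(relevant)) / len(returned)


returned_docs = ["d1", "d2", "d3", "d4"]  # what the system returned
relevant_docs = {"d1", "d3", "d5"}        # what the judges marked relevant

r = recall(returned_docs, relevant_docs)     # 2 of 3 relevant docs found
p = precision(returned_docs, relevant_docs)  # 2 of 4 returned docs relevant
```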
30. Gazetteer Server and Service for UK
Academia - James Reid
Gazetteer: a geographical dictionary or directory that serves as a reference for
information about places.
● Geographic searching is a powerful information retrieval tool because the results
obtained are more specific.
● Geographic searching is restricted because geographic metadata creation is very
resource intensive, and where resources do have geographic metadata it often extends
only to names.
● The geographic footprint is usually not stated directly; there may only be a direct or
indirect reference to the place.
Constant change in geographic metadata:
● Names of places may vary.
● Names may have changed over time.
● Boundaries can be fuzzy.
● Names are used in a particular context.
31. GeoXwalk is a comprehensive gazetteer that links a vocabulary of
current and historical geographical names to a standard spatial
coding scheme (longitude, latitude).
Technically, GeoXwalk has three components:
● A gazetteer database to support spatial searches.
● Middleware components to issue spatial and aspatial queries.
● A geo-parser to parse documents that are not geographically indexed
for place names that can serve as geographic references.
32. Gazetteer database
Each geographical feature must include :-
● Feature name.
● Feature type.
● Geometry ( spatial footprints ).
Places are marked out better with polygons than with points.
Explicit relationships between features can also be defined, which is particularly useful
when the gazetteer holds a significant amount of historical data for which geometries do not exist.
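A gazetteer entry of this kind can be sketched as a small record with a name, a type, an optional polygon footprint and an optional explicit relationship. This is not GeoXwalk's actual schema: the class, the field names (including part_of) and the sample entries with their coordinates are hypothetical. The containment test uses the standard ray-casting method, which shows one thing a polygon footprint allows that a single point does not.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


@dataclass
class GazetteerEntry:
    name: str
    feature_type: str
    footprint: Optional[List[Point]] = None  # polygon vertices, if known
    part_of: Optional[str] = None            # explicit relationship


def point_in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def features_containing(entries, point):
    """Names of all entries whose polygon footprint contains the point."""
    return [e.name for e in entries
            if e.footprint and point_in_polygon(point, e.footprint)]


# Made-up entries: the historical parish has no geometry of its own and is
# located only through its relationship to the county.
ENTRIES = [
    GazetteerEntry("Example County", "county",
                   [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]),
    GazetteerEntry("Old Example Parish", "historical parish",
                   None, part_of="Example County"),
]
```

For example, features_containing(ENTRIES, (1.0, 1.0)) returns only the county, while the parish can still be resolved through its part_of link.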
Middleware components:
Protocols supported by GeoXwalk are:
● ADL Gazetteer protocol.
● OGC Filter Encoding implementation.
The middleware translates XML queries into database-specific SQL queries.
33. GeoParser
Most existing data and metadata contain some sort of geo-reference, but not in a format
that allows them to be easily searched spatially.
One associated task is to spatially index documents that are not spatially referenced.
This can be done using a gazetteer as the reference.
A prototype rule-based geo-parser has been implemented that semi-automatically identifies
place names in a document and extracts a suitable spatial footprint.
The rule-based approach takes into account the structure and context in which words occur.
One issue faced by GeoXwalk is map conflation, i.e. detecting duplicate entries, such as
a place that is named differently in different languages but has the same geographic footprint.
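The basic idea of gazetteer-based geo-parsing can be illustrated with a toy example. This is not the GeoXwalk geo-parser: the only rule used here is "a capitalised word that appears in the gazetteer", whereas a real rule-based parser would also use structure and context. The gazetteer contents are hypothetical and the coordinates are approximate.

```python
import re

# Tiny hypothetical gazetteer: place name -> (longitude, latitude), approximate.
GAZETTEER = {
    "Edinburgh": (-3.19, 55.95),
    "London": (-0.13, 51.51),
    "Gatwick": (-0.18, 51.15),
}


def geoparse(text, gazetteer):
    """Return (name, character offset, footprint) for each recognised place.

    Single rule: a capitalised word is a candidate place name and is kept
    only if the gazetteer knows it.
    """
    matches = []
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        name = m.group(0)
        if name in gazetteer:
            matches.append((name, m.start(), gazetteer[name]))
    return matches


text = "Flights from Edinburgh land at Gatwick, south of London."
places = geoparse(text, GAZETTEER)
```

The word "Flights" is capitalised but is dropped because it is not in the gazetteer. Ambiguous names and multi-word names are exactly the cases where this single rule fails and contextual rules are needed.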
34. Related Projects: GeoVSM
Geographic Vector Space Model: The project integrates coordinate-based
geographic indexing with the keyword-based vector space model in representing the
information space. Relevance measures are based on both geographic measures and
thematic measures, which can be combined into a single measure.
Vector Space Model: One of the most popular models of document space developed
in textual-based information retrieval research. It is an algebraic model for representing
text or graphical documents (and any objects, in general) as vectors of identifiers.
Using a vector space model, the content of each geographic document can be
approximately described by a vector of (content-bearing) terms, which are a combination of
thematic subjects and place names.
● Documents and queries are represented as vectors. Each dimension corresponds to a
separate term.
An information retrieval system stores a representation of a
document collection using a document-by-term matrix, where the element at position (i, j)
corresponds to the frequency of occurrence of term j in the ith document. In the vector
space model, all the objects (terms, documents, queries, concepts, etc) can be similarly
represented as vectors.
● The vector space model is well accepted as an effective approach for modelling the thematic subspace.
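A document-by-term matrix of the kind described above can be built as follows. This is a minimal sketch: the documents are made up, tokenisation is plain whitespace splitting, and the cosine measure is added as an assumption, being a common way to compare vectors in this model rather than something stated in these slides.

```python
import math
from collections import Counter


def build_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] is the frequency
    of term j in document i."""
    vocabulary = sorted({t for doc in documents for t in doc.lower().split()})
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        matrix.append([counts[t] for t in vocabulary])
    return vocabulary, matrix


def cosine(u, v):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical three-document collection.
docs = ["airports near london", "london hotels", "airports airports uk"]
vocabulary, matrix = build_matrix(docs)
```

Note that "london" is just another dimension here, no closer to "uk" than to "hotels".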
35. However, the vector space model has some serious problems when used for
modeling the geographic subspace.
Geographic space is inherently continuous and cannot be
adequately approximated using a set of place names (which are discrete in nature). For
example, if a document mentions four place names (Pittsburgh, Philadelphia, Harrisburg, and
Hagerstown), they will be treated as four independent dimensions in a
vector space model, whereas in fact they are points (or regions) in a two-dimensional
geographic space.
Additional concerns about using locational terms as geographic indexes include ambiguity in
meaning, non-unique place names, place names changing over time, and spelling
variations.
36. Geographical Model
● The geographical model of document space is capable of processing arbitrarily complex
spatial queries.
● The most common spatial queries are believed to be of three types:
1. Point query: return the geometric object that contains a given query point.
2. Region query: given a region R, find all objects in the collection that intersect R.
3. Buffer zone query: involves two spatial data sets and a distance d. The answer
to this query is the set of pairs of objects, one from each input set, that are within
distance d of each other, e.g. “find house-power line pairs that are within 50 meters of
each other.”
● Spatial indexing based on coordinates generates persistent indexes for documents,
since it is well defined and is immune to changes in place names, political
boundaries, and linguistic variations.
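The three query types can be sketched over axis-aligned bounding boxes. This is a simplification under stated assumptions: footprints are reduced to boxes, coordinates are planar and in metres, and each query is a linear scan with no spatial index. All names and sample data are hypothetical.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def contains_point(box, x, y):
    return box.min_x <= x <= box.max_x and box.min_y <= y <= box.max_y


def intersects(a, b):
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y)


def box_distance(a, b):
    """Shortest distance between two boxes (0.0 if they touch or overlap)."""
    dx = max(a.min_x - b.max_x, b.min_x - a.max_x, 0.0)
    dy = max(a.min_y - b.max_y, b.min_y - a.max_y, 0.0)
    return math.hypot(dx, dy)


def point_query(objects, x, y):
    return [name for name, box in objects.items() if contains_point(box, x, y)]


def region_query(objects, region):
    return [name for name, box in objects.items() if intersects(box, region)]


def buffer_query(set_a, set_b, d):
    return [(na, nb) for na, a in set_a.items() for nb, b in set_b.items()
            if box_distance(a, b) <= d]


# Hypothetical data for the "house-power line pairs within 50 meters" example.
houses = {"house_1": Box(0, 0, 10, 10), "house_2": Box(200, 0, 210, 10)}
power_lines = {"line_a": Box(40, 0, 45, 100)}
pairs = buffer_query(houses, power_lines, 50.0)
```

Here house_1 is 30 metres from line_a and is returned, while house_2 is 155 metres away and is not.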
37. VSM / Geographical model (contd..)
● Disadvantage of using the geographical model for retrieving geographical
information:
- A considerable amount of geographical information exists in textual form
that is not easily integrated into the geographical model for mapping and spatial
analysis, because of the difficulty of natural language understanding when geo-referencing
text.
39. GeoVSM
● A model obtained by combining the advantages of both the geographical model and
the vector space model.
● Each document is indexed both by a footprint (in geographical coordinate space)
and by a term vector (in vector space).
● Geographical indexes represent only the geographical scope of the document,
and term vectors represent only its thematic scope.
41.
Assume that any document has a limited geographic scope, GS_d, and
a thematic scope, TS_d. Similarly, a query on a document collection has a geographic
scope, GS_q, and a thematic scope, TS_q. The degree of relevance of a document
to a query can be determined by the following measure:
Rel(d, q) = ƒ(SimG(GS_d, GS_q), SimT(TS_d, TS_q)) (1)
where SimG(•) measures the similarity (i.e., the degree of overlap) between the
geographic scopes of the document and the query; SimT(•) measures the degree of
overlap between the thematic scopes of the document and the query; and ƒ(•) is a
function for combining the relevance measures of the geographic and thematic
dimensions.
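One possible instantiation of measure (1) is sketched below. The slides leave SimG, SimT and ƒ open, so every concrete choice here is an assumption made for illustration: geographic scopes are bounding boxes compared by overlap area over union area, thematic scopes are term sets compared by the Jaccard coefficient, and ƒ is a weighted sum with a hypothetical weight alpha.

```python
def box_area(box):
    """Area of a box (min_x, min_y, max_x, max_y); 0.0 if degenerate."""
    min_x, min_y, max_x, max_y = box
    return max(max_x - min_x, 0.0) * max(max_y - min_y, 0.0)


def sim_g(doc_box, query_box):
    """Assumed SimG: overlap area divided by the area of the union."""
    overlap = box_area((max(doc_box[0], query_box[0]),
                        max(doc_box[1], query_box[1]),
                        min(doc_box[2], query_box[2]),
                        min(doc_box[3], query_box[3])))
    union = box_area(doc_box) + box_area(query_box) - overlap
    return overlap / union if union > 0 else 0.0


def sim_t(doc_terms, query_terms):
    """Assumed SimT: Jaccard coefficient of the two term sets."""
    union = doc_terms | query_terms
    return len(doc_terms & query_terms) / len(union) if union else 0.0


def rel(doc_box, doc_terms, query_box, query_terms, alpha=0.5):
    """Assumed f: weighted sum of geographic and thematic similarity."""
    return (alpha * sim_g(doc_box, query_box)
            + (1.0 - alpha) * sim_t(doc_terms, query_terms))


# Hypothetical document and query.
doc_box, doc_terms = (0.0, 0.0, 4.0, 4.0), {"airports", "london", "parking"}
query_box, query_terms = (2.0, 2.0, 6.0, 6.0), {"airports", "london"}
score = rel(doc_box, doc_terms, query_box, query_terms)
```

Setting alpha to 1.0 or 0.0 reduces the measure to a purely geographic or purely thematic ranking, which makes the two contributions easy to inspect separately.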
42. References
* Guoray Cai. GeoVSM: An Integrated Retrieval Model for Geographic Information. School of Information Sciences and Technology, The Pennsylvania State University, 002K Thomas Building, University Park, PA 16802
* http://www.geo-spirit.org/public_deliverables.html
* http://www.geo-spirit.org/publications/SPIRIT_WP5_D17_5201_final.pdf
* http://www.geo-spirit.org/publications/SPIRIT_DeliverableD18_5302_final.pdf
* http://www.geo-spirit.org/publications/GIR_distrib_ranking.pdf
* Marc van Kreveld, Iris Reinbacher, Avi Arampatzis, Roelof van Zwol. Distributed Ranking Methods for Geographic Information Retrieval