This document summarizes Darío Garigliotti's work on constructing a knowledge base of entity-oriented search intents. It introduces key concepts such as entities, entity types, RDF triples, and knowledge bases. It then describes a pipeline for building the knowledge base: acquiring refiners from queries, categorizing refiners, discovering intents, and constructing the knowledge base from quadruples that link each intent to an entity type, a category, and the refiners expressing it, each with a confidence score. The pipeline is evaluated per component and end to end, the latter through human judgments of the extracted facts. The full knowledge base contains 155K quadruples describing 31K intent profiles across 581 entity types. Potential applications include identifying intents in unseen queries and improving entity cards.
7. Entities
- An entity is a uniquely identified individual or thing
- For example:
Henrik Ibsen
Stavanger
- Also:
Pythagorean Theorem
UEFA Champions League
9. Entity types
- A typical property of an entity is its type(s)
- An entity type is a semantic class grouping multiple entities
(Henrik Ibsen, is a, writer)
(Henrik Ibsen, is a, Norwegian writer)
(Henrik Ibsen, is a, person)
11. Tuples
- We describe entity properties using triples
- Attributes
(Henrik Ibsen, birthdate, 20 March 1828)
- Types
(Henrik Ibsen, is a, writer)
- Relations
(Henrik Ibsen, work, A Doll’s House)
- RDF (Resource Description Framework)
- A way to represent structured knowledge
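As a minimal sketch of the three kinds of triples above, the snippet below stores them as plain Python tuples and looks them up by predicate. The `Triple` type and the `objects_of` helper are hypothetical names chosen for illustration; they are not part of RDF or of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, predicate, object) statement about an entity."""
    subject: str
    predicate: str
    obj: str


# The example triples from the slide, one per kind of property.
triples = [
    Triple("Henrik Ibsen", "birthdate", "20 March 1828"),  # attribute
    Triple("Henrik Ibsen", "is a", "writer"),               # type
    Triple("Henrik Ibsen", "work", "A Doll’s House"),       # relation
]


def objects_of(kb, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in kb if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, "Henrik Ibsen", "is a"))
```

A knowledge base in this sense is simply a (usually very large) collection of such statements.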
12. Knowledge bases
- A knowledge base (KB) is a set of tuples
- There are many knowledge bases
- Domain-specific, e.g. GeoNames, DOI, BBCMusic
- Cross-domain, e.g. DBpedia, YAGO, Freebase,
Google Knowledge Graph
- Yes, there are many
14. Search intents and refiners
- Intent: the underlying user need in a search
query
- For example, the intent of booking a hotel room
- Entity-oriented queries
- Refiner: a way to express an intent in an entity-oriented query
- For example, for booking a hotel room:
"booking", "book", "reservation", "rooms"
15. Towards an understanding of search intents
- A large proportion of search queries are entity-oriented
- Interest in understanding what those queries ask for, and how they can be fulfilled
19. A KB of entity-oriented search intents
1. Intents searched for a type of entities
paris map, sydney map => [city] map
- (intent ID, searchedForType, entity type, conf.)
2. Categories assigned to refiners
messi instagram => Website
scandic rooms => Service
henrik ibsen child => Property
- (intent ID, ofCategory, intent category, conf.)
3. Multiple refiners expressing an intent
"booking", "book", "make a reservation", "rooms"
- (intent ID, expressedBy, refiner, conf.)
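The three quadruple patterns above could be held in a small data structure like the one sketched here. This is an illustration under assumptions, not the author's implementation: `Quad`, `make_quad`, and `refiners` are hypothetical names, the intent ID `Hotel_Booking` is borrowed from the pipeline example later in the deck, confidences are assumed to lie in [0, 1], and the numeric confidence values are placeholders.

```python
from typing import NamedTuple

# The three predicates named on the slide.
PREDICATES = {"searchedForType", "ofCategory", "expressedBy"}


class Quad(NamedTuple):
    intent_id: str
    predicate: str
    obj: str
    conf: float


def make_quad(intent_id, predicate, obj, conf):
    """Build a quadruple, rejecting unknown predicates and out-of-range confidences."""
    if predicate not in PREDICATES:
        raise ValueError(f"unknown predicate: {predicate}")
    if not 0.0 <= conf <= 1.0:
        raise ValueError(f"confidence must be in [0, 1]: {conf}")
    return Quad(intent_id, predicate, obj, conf)


# Confidence values below are placeholders for illustration only.
kb = [
    make_quad("Hotel_Booking", "searchedForType", "[hotel]", 0.9),
    make_quad("Hotel_Booking", "ofCategory", "Service", 0.8),
    make_quad("Hotel_Booking", "expressedBy", "booking", 0.95),
    make_quad("Hotel_Booking", "expressedBy", "make a reservation", 0.7),
]


def refiners(kb, intent_id, min_conf=0.0):
    """Refiners that express `intent_id` with confidence >= min_conf."""
    return [q.obj for q in kb
            if q.intent_id == intent_id
            and q.predicate == "expressedBy"
            and q.conf >= min_conf]


print(refiners(kb, "Hotel_Booking", min_conf=0.75))
```

Keeping the confidence on every fact lets a consumer of the KB trade coverage for precision by raising `min_conf`.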
26. Our pipeline approach
- Refiners acquisition
clarion hotel
clarion hotel airport
clarion hotel spa
clarion hotel booking
casa 400
casa 400 rooms
casa 400 address
casa 400 deals
...
clarion hotel airport, casa 400 airport, scandic airport, ... => [hotel] airport
[hotel] spa
[hotel] booking
...
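The acquisition step shown on this slide can be sketched as follows: strip the entity from each query, treat the remainder as a refiner, and keep refiners that recur across several entities of the same type. This is a simplified illustration, not the method used in the work itself. It assumes the entity is already known and appears as a prefix of the query; `type_level_refiners` and the `min_entities` threshold are hypothetical.

```python
from collections import defaultdict


def type_level_refiners(queries, entity_types, min_entities=2):
    """Aggregate '<entity> <refiner>' queries into '[type] refiner' patterns.

    A pattern is kept when at least `min_entities` distinct entities of the
    type occur with the refiner (an assumed threshold, for illustration).
    """
    support = defaultdict(set)  # (type, refiner) -> entities seen with it
    # Try longer entity names first so the longest matching prefix wins.
    entities = sorted(entity_types, key=len, reverse=True)
    for query in queries:
        for entity in entities:
            if query == entity:
                break  # bare entity query: nothing left to act as a refiner
            if query.startswith(entity + " "):
                refiner = query[len(entity) + 1:].strip()
                if refiner:
                    support[(entity_types[entity], refiner)].add(entity)
                break
    return sorted(f"[{t}] {r}" for (t, r), ents in support.items()
                  if len(ents) >= min_entities)


# Example queries and entities taken from the slide.
entity_types = {"clarion hotel": "hotel", "casa 400": "hotel", "scandic": "hotel"}
queries = [
    "clarion hotel", "clarion hotel airport", "clarion hotel spa",
    "clarion hotel booking", "casa 400", "casa 400 rooms", "casa 400 address",
    "casa 400 deals", "casa 400 airport", "scandic airport",
]

print(type_level_refiners(queries, entity_types))
```

With this small sample only "airport" is shared by more than one hotel; lowering `min_entities` to 1 keeps every refiner seen for the type.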
34. Our pipeline approach
- Refiners acquisition
[hotel] airport
[hotel] spa
[hotel] booking
...
- Refiners categorization
[hotel] airport: Service
[hotel] address: Property
[hotel] expedia: Website
...
- Intents discovery
taxi, arrive => Hotel_Arriving
booking, make a reservation => Hotel_Booking
address => Hotel_Address
- KB construction (the rows sharing an intent ID form its intent profile)
Intent ID      Predicate        Object                Confidence
Hotel_Booking  searchedForType  [hotel]               c1
Hotel_Booking  ofCategory       Service               c2
Hotel_Booking  expressedBy      "booking"             c3
Hotel_Booking  expressedBy      "make a reservation"  c4
Hotel_Booking  expressedBy      "rooms"               c5
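To make the notion of an intent profile concrete, the sketch below groups the rows of the example table by intent ID. The function name `intent_profiles` and the nested-dictionary layout are assumptions made for illustration; the confidences stay symbolic (c1 to c5), as on the slide.

```python
from collections import defaultdict

# Rows of the example table; confidences are kept symbolic, as on the slide.
rows = [
    ("Hotel_Booking", "searchedForType", "[hotel]", "c1"),
    ("Hotel_Booking", "ofCategory", "Service", "c2"),
    ("Hotel_Booking", "expressedBy", "booking", "c3"),
    ("Hotel_Booking", "expressedBy", "make a reservation", "c4"),
    ("Hotel_Booking", "expressedBy", "rooms", "c5"),
]


def intent_profiles(rows):
    """Group quadruples by intent ID: profiles[intent][predicate] -> [(object, conf)]."""
    profiles = defaultdict(lambda: defaultdict(list))
    for intent_id, predicate, obj, conf in rows:
        profiles[intent_id][predicate].append((obj, conf))
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {intent: dict(preds) for intent, preds in profiles.items()}


profile = intent_profiles(rows)["Hotel_Booking"]
print(profile["expressedBy"])
```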
35. Evaluation
- Component-level evaluation
- Cross-validation using the human annotations of intent
categories and refiner clusters, for a representative
sample of 50 types
- End-to-end evaluation
- Human judgments about KB facts, for a sample of
additional types, defined w.r.t. confidence intervals
36. Knowledge base construction
- Application of the pipeline to extract all
quadruples from 581 unseen types
- 155K quadruples, 31K intent profiles
Excerpt of the KB, for intent ID
<aviation.airline-65-customer_service>
37. Results
[Figure: proportion of triples (0% to 100%) judged Correct, Incorrect due to OFCATEGORY, or Incorrect due to EXPRESSEDBY, per confidence interval. Confidence intervals according to the splitting percentiles: [0, 0.8652), [0.8652, 0.8837), [0.8837, 0.9043), [0.9043, 0.9319), [0.9319, 1].]
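One way to derive confidence intervals from splitting percentiles, and to report the proportion of correct facts in each, is sketched below. It is only an illustration of the binning idea: `accuracy_by_confidence_bin` is a hypothetical helper, the judged facts are made-up placeholder data, confidences are assumed to lie in [0, 1], and the sketch does not break incorrect facts down by cause as the figure does.

```python
from bisect import bisect_right
from statistics import quantiles


def accuracy_by_confidence_bin(judged, n_bins=5):
    """Split judged facts into `n_bins` by confidence percentiles.

    `judged` holds (confidence, is_correct) pairs. Returns one
    (low, high, accuracy) tuple per bin; accuracy is None for an empty bin.
    """
    confs = [conf for conf, _ in judged]
    cuts = quantiles(confs, n=n_bins)  # n_bins - 1 cut points
    edges = [0.0] + cuts + [1.0]
    totals = [0] * n_bins
    correct = [0] * n_bins
    for conf, ok in judged:
        b = bisect_right(cuts, conf)  # a value equal to a cut goes to the upper bin
        totals[b] += 1
        correct[b] += int(ok)
    return [(edges[i], edges[i + 1],
             correct[i] / totals[i] if totals[i] else None)
            for i in range(n_bins)]


# Placeholder judgments for illustration only; not results from the study.
judged = [(0.1, False), (0.2, True), (0.3, False), (0.4, True), (0.5, True),
          (0.6, True), (0.7, True), (0.8, True), (0.9, True), (1.0, True)]

for low, high, acc in accuracy_by_confidence_bin(judged):
    print(f"[{low:.2f}, {high:.2f}): {acc}")
```

Splitting at percentiles rather than at fixed thresholds gives each interval roughly the same number of facts, which is consistent with the uneven interval widths listed for the figure.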
38. Application scenarios
- Leveraging knowledge
with levels of confidence
- Identification of search
intents in unseen queries
- Design and functionality
of entity cards
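The second scenario, identifying the intent behind an unseen query, could be approached with a simple lookup against the KB, as sketched below. This assumes the entity in the query has already been recognized and typed; `identify_intents` is a hypothetical helper, and the confidence values are placeholders rather than values from the actual KB.

```python
def identify_intents(query, entity, entity_type, kb, min_conf=0.0):
    """Return (intent_id, conf) candidates for an '<entity> <refiner>' query, best first."""
    if not query.startswith(entity + " "):
        return []
    refiner = query[len(entity) + 1:].strip()
    # Intents that the KB records as searched for this entity type.
    typed = {i for i, p, o, _ in kb if p == "searchedForType" and o == entity_type}
    hits = [(i, c) for i, p, o, c in kb
            if p == "expressedBy" and o == refiner and i in typed and c >= min_conf]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Quadruples in the style of the deck's example; confidences are placeholders.
kb = [
    ("Hotel_Booking", "searchedForType", "[hotel]", 0.9),
    ("Hotel_Booking", "ofCategory", "Service", 0.8),
    ("Hotel_Booking", "expressedBy", "booking", 0.95),
    ("Hotel_Booking", "expressedBy", "rooms", 0.7),
]

print(identify_intents("scandic rooms", "scandic", "[hotel]", kb))
```

The `min_conf` parameter reflects the first scenario on the slide: a consumer can decide how much confidence a fact needs before acting on it.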