Ontology Learning from Text
Ontology construction ‘Layer Cake’
Knowledge representation and knowledge management systems
Subtasks in ontology learning
Most Popular Ontology Learning Tools
2. Definition of Ontology
‘A formal, explicit specification of a shared conceptualization’
• formal: must be machine understandable
• explicit: types of concepts and constraints must be clearly defined
• shared: not private to some individual, but accepted by a group
• conceptualization: an abstract model of some phenomenon in the world formed by identifying the relevant concepts of that phenomenon
Or simply, a data model describing a domain.
3. Main elements of an ontology
Hierarchy of concepts (is-a relations)
Object property (relation): links a domain concept to a range concept, e.g. wasWrittenBy
Datatype property (attribute): links a domain concept to a datatype range such as xsd:string, e.g. hasTitle
5. Applications of Ontologies
Knowledge representation and knowledge
management systems
Intelligent query-answering systems
Information retrieval and extraction
Semantic Web
• Web pages annotated with ontologies
• User queries for Web pages analysed at
knowledge level and answered by inferencing on
ontological knowledge
7. Definition of Ontology Learning
The application of a set of methods and
techniques used for building an ontology from
scratch
Uses distributed and heterogeneous
knowledge and information sources
Allows a reduction in the time and effort
needed in the ontology development process
9. Ontology Learning (Construction)
Manual construction
• Corpus is not necessary
• Small scale
Automatic or semiautomatic construction
• Domain specific corpus
• Good domain knowledge coverage
10. Ontology Learning methods from…
Unstructured sources
• Involves NLP techniques, morphological and syntactic
analysis, etc.
Semi-structured sources
• Elicit an ontology from sources that have some predefined
structure, such as XML Schema
Structured data
• Extracting concepts and relations from knowledge contained
in structured data, such as databases
12. An overview of the outputs, tasks, and
common techniques for ontology learning
13. Subtasks in ontology learning
Extract the relevant domain terminology and synonyms from a
text collection
Discover concepts which can be regarded as abstractions of
human thought
Derive a concept hierarchy organizing these concepts
Extend an existing concept hierarchy with new concepts
Learn non-taxonomic relations between concepts
Populate the ontology with instances of relations and concepts
Discover other axiomatic relationships or rules involving
concepts and relations
15. Sample (partial) Ontology –
Electronic Voting Domain
Concepts: person, voter, worker, poll watcher,
location, county, precinct, vote, ballot, machine,
voting machine, manufacturer, etc.
Attributes: name of person, model of machine, etc.
Taxonomical relations:
• Voter is a person; precinct is a location; voting
machine is a machine, etc.
Non-hierarchical relations:
• Voter cast ballot; voter trust machine; county
adopt machine; equipment miscount ballot, etc.
17. ConceptNet — a practical commonsense reasoning
Open Mind Common Sense (OMCS) is an artificial intelligence
project based at the Massachusetts Institute of Technology (MIT)
Media Lab whose goal is to build and utilize a large
commonsense knowledge base from the contributions of many
thousands of people across the Web.
ConceptNet is a multilingual knowledge base, representing
words and phrases that people use and the common-sense
relationships between them.
Since its founding in 1999, OMCS has
accumulated more than a million
English facts from over 15,000
contributors, in addition to knowledge
bases in other languages.
19. ConceptNet — a practical commonsense reasoning
The knowledge base is a semantic network presently consisting
of over 1.6 million assertions of commonsense knowledge
encompassing the spatial, physical, social, temporal, and
psychological aspects of everyday life.
It is built from nodes representing concepts, in the form of words
or short phrases of natural language, and labeled relationships
between them. These are the kinds of things computers need to
know to search for information better, answer questions, and
understand people's goals.
ConceptNet is generated automatically from the 700 000
sentences of the Open Mind Common Sense Project — a World
Wide Web based collaboration with over 14 000 authors.
21. Challenges in Text Processing
Unstructured texts
Ambiguity in English text
• Multiple senses of a word
• Multiple parts of speech – e.g., “like” can occur in 8 PoS:
• Verb: “Fruit flies like a banana”
• Noun: “We may not see its like again”
• Adjective: “People of like tastes agree”
• Adverb: “The rate is more like 12 percent”
• Preposition: “Time flies like an arrow”
• etc
Lack of closed domain of lexical categories
Noisy texts
Requirement of very large training text sets
Lack of standards in text processing
23. Terms
Linguistic realizations of domain-specific concepts
Are the basis of the ontology learning process
Term extraction implies:
• Linguistic processing: part-of-speech tagging,
morphological analysis, etc.
• Statistical processing: compares the distribution of
terms between corpora
24. Terms Extraction: Process
Run a Part-Of-Speech (POS) tagger over the domain
corpus
Identify possible terms by constructing patterns, such
as: Adj-Noun, Noun-noun, Adj-Noun-Noun,…
Ignore Names
Keep only the terms that are relevant to the text by applying
statistical metrics
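The pattern-matching step above can be sketched as follows. This is a minimal illustration, not a full term extractor: the input is a hypothetical hand-tagged token list (a real pipeline would obtain the tags from a POS tagger), the tags are assumed to follow the Penn Treebank convention, and the function names are invented for this example.

```python
# Hypothetical hand-tagged input: (token, Penn Treebank tag) pairs.
TAGGED = [
    ("the", "DT"), ("electronic", "JJ"), ("voting", "NN"), ("machine", "NN"),
    ("miscounted", "VBD"), ("the", "DT"), ("paper", "NN"), ("ballot", "NN"),
    ("in", "IN"), ("Ohio", "NNP"),
]

# Candidate patterns from the slide, written over coarse word classes.
PATTERNS = [("Adj", "Noun", "Noun"), ("Adj", "Noun"), ("Noun", "Noun")]


def coarse(tag):
    """Map a Penn Treebank tag to Adj / Noun / Name / Other."""
    if tag.startswith("JJ"):
        return "Adj"
    if tag in ("NNP", "NNPS"):
        return "Name"  # proper names are ignored by the patterns
    if tag.startswith("NN"):
        return "Noun"
    return "Other"


def candidate_terms(tagged):
    """Return every token sequence whose word classes match one of PATTERNS."""
    classes = [coarse(tag) for _, tag in tagged]
    found = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            window = classes[start:start + len(pattern)]
            if tuple(window) == pattern:
                tokens = [tok for tok, _ in tagged[start:start + len(pattern)]]
                found.append(" ".join(tokens))
    return found


print(candidate_terms(TAGGED))
```

The candidates still overlap (for instance "electronic voting" and "voting machine"); the statistical filtering step is what is expected to keep only the relevant ones.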
26. Statistical Analysis
Statistical metrics used in terms extraction:
Chi-square: $\chi^2 = \sum \frac{(obs - exp)^2}{exp}$
Term weighting (TFIDF): $tfidf(w) = tf(w) \cdot \log\left(\frac{N}{df(w)}\right)$
Mutual Information: $mi(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}$
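A small sketch of two of these metrics may help. It assumes the usual definitions: chi-square as the sum of (obs - exp)^2 / exp over all cells, and mutual information in its logarithmic (pointwise) form, log of P(x, y) / (P(x) P(y)). The function names and the numbers are illustrative only.

```python
import math


def chi_square(observed, expected):
    """Sum of (obs - exp)^2 / exp over paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def mutual_information(p_xy, p_x, p_y):
    """Pointwise mutual information: log of P(x, y) / (P(x) * P(y))."""
    return math.log(p_xy / (p_x * p_y))


# Illustrative counts: a term seen 10 and 20 times where 15 and 15 were expected.
print(chi_square([10, 20], [15, 15]))

# Independent events (P(x, y) equals P(x) * P(y)) give a score of zero.
print(mutual_information(0.25, 0.5, 0.5))
```

A positive mutual information score indicates that x and y co-occur more often than independence would predict.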
27. TFIDF
$tfidf(w) = tf(w) \cdot \log\left(\frac{N}{df(w)}\right)$
tf(w): term frequency (number of occurrences of the word in a document)
df(w): document frequency (number of documents containing the word)
N: number of all documents
tfidf(w): relative importance of the word in the document
Most popular weighting scheme
A word is more important when it appears several times in a document (high tf) and when it appears in fewer documents (low df)
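The weighting can be sketched in a few lines, computing tfidf(w) = tf(w) · log(N / df(w)) per document. The toy corpus and the function name are assumptions made for illustration; the natural logarithm is used here, although the base is a free choice.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append(
            {w: term_freq[w] * math.log(n_docs / doc_freq[w]) for w in term_freq}
        )
    return weights


# Toy corpus (illustrative only).
DOCS = [
    ["ballot", "vote", "ballot"],
    ["vote", "machine"],
    ["vote", "county"],
]

for doc_weights in tfidf(DOCS):
    print(doc_weights)
```

In this corpus "vote" occurs in every document, so its weight is zero everywhere, while "ballot" occurs twice in a single document and receives the highest weight.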
29. Synonyms
Identification of terms that share
semantics, i.e., potentially refer to the
same concept
Methods for extracting synonyms
• Based on WordNet
• Latent Semantic Indexing (LSI)
30. WordNet
A lexical database for the English language
Nouns, verbs, adjectives & adverbs are grouped into sets of
synonyms (synsets)
Synsets are interlinked by means of conceptual-semantic
and lexical relations
32. Adapting WordNet to specific domain
Partition the set of synonymy relations defined in WordNet into
three classes:
• Relations irrelevant in the specific domain
• Relations that are relevant but incorrect in the specific
domain
• Relations that are relevant and correct in the specific
domain
Remove relations from the first two classes and include
relations from the third class
Rank the remaining sets according to their frequency in the corpus
33. Latent Semantic Indexing (LSI)
LSI is an NLP technique for analyzing relationships
between a set of documents and the terms they contain
Uses a term-document matrix which describes the
occurrences of terms in documents – Vector Space Model
Example: doc1 doc2
database X
computer X X
access X
34. Part 3 Concepts
Ontology learning layer cake, from the top layer to the bottom layer:
• Axioms & Rules
• Relations
• Taxonomy (Concept hierarchies)
• Concepts, e.g. Disease := <I, E, L>
• Synonyms
• Terms
35. Concepts
Intension, Extension, Lexicon
A term may indicate a concept if we can define its:
• Intension: (in)formal definition of the set of objects that this concept describes
Example: a disease is an impairment of health or a condition of abnormal functioning
• Extension: a set of objects that the definition of this concept describes (the name of the nearest common ancestor)
Example: influenza, cancer, heart disease
• Lexical realizations: the term itself and its multilingual synonyms
Example: disease, illness, maladie
37. Concept Hierarchy Extraction
Basic methods used for taxonomy extraction:
• With the use of WordNet
• Lexico-syntactic patterns
• Machine Readable Dictionaries
• Co-occurrence Analysis
• Unsupervised hierarchical clustering techniques
• Linguistic approaches
38. Taxonomy Extraction with WordNet
Given two terms t1 and t2, check if they stand in a
hypernym relation with regard to WordNet
Normalize the number of hypernym paths by dividing
by the number of senses of t1
$isa(t_1, t_2) = \min\left(\frac{|paths(senses(t_1), senses(t_2))|}{|senses(t_1)|}, 1\right)$
path: a sequence of edges connecting the two synsets
Example:
• 4 different hypernym paths between synsets ‘country’ and ‘region’
• ‘country’ has 5 senses
• value of isa (country, region) = 4/5 = 0.8
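The normalization can be checked with a short sketch. It takes the two counts as plain numbers rather than querying WordNet, so the path and sense counts are supplied by the caller; the function name is an assumption for this example.

```python
def isa_score(num_hypernym_paths, num_senses_t1):
    """min(|paths| / |senses(t1)|, 1): hypernym paths normalized by the senses of t1."""
    if num_senses_t1 == 0:
        return 0.0  # a term without any sense cannot stand in a hypernym relation
    return min(num_hypernym_paths / num_senses_t1, 1.0)


# Values from the slide: 4 hypernym paths between 'country' and 'region',
# and 5 senses of 'country'.
print(isa_score(4, 5))  # 0.8
```

The cap at 1 matters when a term has more hypernym paths than senses, for example 7 paths over 5 senses still yields 1.0.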
39. Lexico-syntactic patterns - Hearst
Aim: the acquisition of hyponym lexical relations from text
Uses a set of predefined lexico-syntactic patterns which
• occur frequently and in many text genres
• indicate the relation of interest
• can be recognized with little or no pre-encoded knowledge
Principal idea: match these patterns in texts to retrieve
is_a relations
Precision with respect to WordNet: 55.45%
40. Lexico-syntactic patterns - Hearst
NP0 such as {NP1, NP2, …, (and | or)} NPn
‘Vehicles such as cars, trucks and bikes…’
is_a (car, vehicle), is_a (truck, vehicle), is_a (bike, vehicle)
such NP as {NP,}* {(or | and)} NP
‘Such fruits as oranges, nectarines or apples…’
is_a (orange, fruit), is_a (nectarine, fruit), is_a (apple, fruit)
NP {, NP}* {,} {or | and} other NP
‘Swimming, running, or/and other activities…’
is_a (swimming, activity), is_a (running, activity)
41. Lexico-syntactic patterns - Hearst
NP {,} including {NP,}* {or | and} NP
‘Injuries, including broken bones, wounds and bruises…’
is_a (broken bone, injury), is_a (wound, injury), is_a (bruise, injury)
NP {,} especially {NP,}* {or | and} NP
‘Publications, especially papers and books…’
is_a (paper, publication), is_a (book, publication)
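A rough sketch of how three of these patterns ("such as", "including", "especially") might be matched with regular expressions is given below. It makes strong simplifying assumptions that are not part of Hearst's method: no noun-phrase chunking is done, the hypernym is taken to be the single word before the cue phrase, and surface forms are returned without lemmatization (so plurals are kept). All names are invented for this example.

```python
import re

# Hypernym word, optional comma, cue phrase, then the hyponym list up to the
# next sentence-level punctuation mark.
CUE = re.compile(
    r"(\w+),? (?:such as|including|especially) ([^.;:]+)", re.IGNORECASE
)

# List separators: a comma (optionally followed by and/or) or a bare and/or.
SEPARATOR = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def hearst_pairs(text):
    """Return (hyponym, hypernym) pairs found by the simplified patterns."""
    pairs = []
    for match in CUE.finditer(text):
        hypernym = match.group(1).lower()
        for item in SEPARATOR.split(match.group(2)):
            hyponym = item.strip().lower()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


print(hearst_pairs("Vehicles such as cars, trucks and bikes."))
print(hearst_pairs("Injuries, including broken bones, wounds and bruises."))
```

A real implementation would match over POS-tagged or chunked text so that multi-word hypernyms are captured and unrelated uses of "including" are filtered out.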
42. Machine Readable Dictionaries
A method for extracting taxonomies which goes back to the 1980s
Main idea: exploit the regularity of dictionary entries to find a suitable hypernym for the defined word
Example:
spring: “the season between winter and summer and in which leaves and flowers appear”
is_a (spring, season)
43. MRDs: Exceptions
The hypernym can be preceded by an expression such as ‘a kind of’, ‘a sort of’, or ‘a type of’
The problem is solved by keeping an exception list with words such as ‘kind’, ‘sort’, ‘type’ and taking the head of the NP following the preposition ‘of’
Example:
hornbeam: “a type of tree with a hard wood, sometimes used in hedges”
is_a (hornbeam, tree)
The word can be defined in terms of a part-of or membership relation
Example:
republican: “a member of a political party advocating republicanism”
part_of (republican, political party), not is_a (republican, political party)
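The exception-list idea can be sketched as below. This is a deliberately naive heuristic, assuming the hypernym is the first word after an optional leading determiner (which only holds for unmodified noun phrases); a real system would parse the definition to find the head of the first NP. The names are invented for this example.

```python
EXCEPTIONS = {"kind", "sort", "type"}
DETERMINERS = {"a", "an", "the"}


def genus_term(definition):
    """Guess the hypernym of a defined word from its dictionary definition."""
    words = [w.strip(",.;:").lower() for w in definition.split()]
    pos = 0
    if pos < len(words) and words[pos] in DETERMINERS:
        pos += 1
    # 'a kind of X', 'a sort of X', 'a type of X': skip ahead to X.
    if pos + 1 < len(words) and words[pos] in EXCEPTIONS and words[pos + 1] == "of":
        pos += 2
        if pos < len(words) and words[pos] in DETERMINERS:
            pos += 1
    return words[pos] if pos < len(words) else None


print(genus_term("the season between winter and summer and in which leaves and flowers appear"))
print(genus_term("a type of tree with a hard wood, sometimes used in hedges"))
```

Definitions built on membership, such as "a member of a political party", are not handled here and would need their own list of part-of or membership cues.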
44. Co-occurrence analysis
Document-based subsumption
A term t1 is more specific than a term t2 if t2 also appears in all the documents in which t1 appears.
$P(x \mid y) = \frac{n(x, y)}{n(y)}$
Term x subsumes term y iff P(x | y) = 1, where
n(x,y): the number of documents in which x and y co-occur
n(y): the number of documents that contain y
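Document-based subsumption can be sketched as follows, reading the condition as P(x | y) = n(x, y) / n(y) = 1. The `threshold` parameter is an addition for this example (relaxing it below 1 tolerates noisy corpora), and the toy documents are illustrative.

```python
def subsumes(x, y, docs, threshold=1.0):
    """True if term x subsumes term y: P(x | y) = n(x, y) / n(y) >= threshold.

    docs is a list of sets, each holding the terms found in one document.
    """
    n_y = sum(1 for doc in docs if y in doc)
    if n_y == 0:
        return False  # y never occurs, so nothing can be concluded
    n_xy = sum(1 for doc in docs if x in doc and y in doc)
    return n_xy / n_y >= threshold


# Toy documents (illustrative only).
DOCS = [
    {"machine", "voting machine"},
    {"machine"},
    {"machine", "voting machine", "ballot"},
]

print(subsumes("machine", "voting machine", DOCS))  # True
print(subsumes("voting machine", "machine", DOCS))  # False
```

Here "machine" appears in every document that contains "voting machine", so it is proposed as the more general term.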
45. Unsupervised hierarchical
clustering techniques
Unsupervised hierarchical clustering techniques
known from machine learning research
• very noisy as they highly depend on the frequency and
behavior of the terms in the text collection under consideration
• learn concepts at the same time since they also group terms
(the most related to each other)
• can be regarded as abstractions over words and thus, to
some extent, as concepts
It is unclear which specific relation actually holds
between the involved words.
Example: Semantic_relatedness (cut, knife)
46. Linguistic Approaches
Syntactic structure analysis and dependency analysis
Words and modifiers in syntactic structures (noun/verb/prepositional/… phrases) are analyzed to discover potential terms and relations, e.g. with the head-modifier principle: the head of a term assumes the hypernym role
Modifiers typically restrict or narrow down the meaning of the modified noun.
Example: is_a (international credit card, credit card)
In dependency analysis, grammatical relations, such as subject, object, adjunct, and complement, are used for determining more complex relations
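The head-modifier principle can be illustrated with a tiny sketch. It assumes English, head-final noun phrases in which every leading word is a modifier, so dropping the leftmost modifier yields a hypernym candidate; the function name is invented for this example.

```python
def head_modifier_isa(term):
    """Derive is_a candidates by repeatedly dropping the leftmost modifier."""
    words = term.split()
    return [
        (" ".join(words[i:]), " ".join(words[i + 1:]))
        for i in range(len(words) - 1)
    ]


for hyponym, hypernym in head_modifier_isa("international credit card"):
    print(f"is_a ({hyponym}, {hypernym})")
```

For "international credit card" this proposes is_a (international credit card, credit card) and is_a (credit card, card); candidates such as the second would normally be kept only if the shorter term is itself attested in the corpus.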
47. Extending Concept Hierarchy
with new Concepts
…by adding a new concept at an appropriate position in the existing taxonomy
Supervised methods:
• classifiers need to be trained to predict membership for every
concept in the existing concept hierarchy
• a considerable amount of training data is needed for each concept
• such approaches typically do not scale to arbitrarily large ontologies
Unsupervised approaches:
• assume a similarity function which computes a measure of fit between
the new concept and the concepts existing in the ontology.
• rely on an appropriate contextual representation of the different
concepts on the basis of which similarity can be computed.
• the hierarchical structure of the ontology needs to be considered and
somehow integrated into the similarity measure
49. Extracting relations (the interactions
between concepts) & attributes
Specific relations
• Part-of
• Qualia (Formal, Constitutive, Telic, Agentive)
General relations
• Exploiting linguistic structure
Attributes
50. Learning attributes: Introduction
Attributes: relations with a datatype as range
Typically expressed in texts using the preposition ‘of’, the verb ‘have’ or
genitive constructs, e.g. ‘the color of the car’, ‘the car’s color’, ‘every
car has a color’
Values of attributes are expressed using copula constructs,
adjectives or expressions specific to the attribute in question, e.g.,
• ‘the car is red’ (copula + value)
• ‘the red car’ (adjective)
• ‘the baby weighs 3 kg’ (specific expressions)
52. An approach to learning attributes
Tokenize & part-of-speech tag the corpus
Apply the following patterns to extract adjective/noun pairs:
(\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}
(\w+{DET})? \w+{JJ} (\w+{NN})+
JJ: adjective, DET: determiner, NN: noun, VBZ: verb, 3rd person singular present
These pairs are weighted using conditional probability:
$P(a \mid n) = \frac{f(n, a)}{f(n)}$
f(n,a): joint frequency of adjective a and noun n
f(n): the frequency of noun n
For each of the adjectives we look up the corresponding attributes in WordNet
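The weighting step can be sketched as below, taking the conditional probability to be f(n, a) / f(n). The extracted pairs and the noun frequencies are hypothetical inputs (in practice they would come from the pattern matching and from corpus counts), and the function name is invented for this example.

```python
from collections import Counter


def weight_pairs(pairs, noun_freq):
    """Weight each (noun, adjective) pair by f(n, a) / f(n).

    pairs: list of (noun, adjective) tuples extracted by the patterns.
    noun_freq: {noun: frequency of the noun in the corpus}.
    """
    joint = Counter(pairs)
    return {
        (noun, adj): count / noun_freq[noun]
        for (noun, adj), count in joint.items()
    }


# Hypothetical extraction results and corpus counts.
PAIRS = [("car", "red"), ("car", "red"), ("car", "fast"), ("baby", "small")]
NOUN_FREQ = {"car": 4, "baby": 1}

for pair, weight in weight_pairs(PAIRS, NOUN_FREQ).items():
    print(pair, weight)
```

With these numbers ("car", "red") gets 2/4 = 0.5 and ("car", "fast") gets 1/4 = 0.25, so "red" is the stronger attribute-value candidate for "car".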
53. “meronymy” / “part-of” relations
Given a “seed” word, find parts of that word in a large corpus of text
Patterns (format: type_of_word TAG type_of_word TAG…):
• whole NN[-PL] ‘s POS part NN[-PL]
e.g. …building’s basement…
• part NN[-PL] of PREP {the|a} DET mods [JJ|NN]* whole NN
e.g. …basement of a building…
NN = Noun, NN-PL = Plural Noun, PREP = Preposition, POS = Possessive, JJ = Adjective
55% accuracy
54. Qualia structures
The meaning of a lexical element is described in terms of four roles:
• Constitutive: physical properties of an object (e.g., weight, material, parts)
• Agentive: typically a verb denoting an action which brings the object into existence
• Formal: normally consists of typing information about the object (e.g., hypernym)
• Telic: the purpose or function of an object, expressed either by a verb or by a nominal
Example: qualia structure for knife
• Formal: artifact_tool
• Constitutive: blade, handle,…
• Telic: cut_act
• Agentive: make_act
55. Qualia Structures: Learning Approach
aim: to automatically learn qualia
structures from the WWW
Based on the idea of matching certain
lexico-syntactic patterns conveying a
standard relation
56. Qualia Structures: Learning Process
Word → Generate Clues → Download Google Abstracts → POS-tagging → Matching regular expressions → Statistical Weighting → Weighted QS
Clues: search engine queries indicating the relation of interest
Calculate the weight of a candidate qualia element e for the term t using the Jaccard coefficient:
$Jaccard(e, t) = \frac{GoogleHits(e \wedge t)}{GoogleHits(e) + GoogleHits(t) - GoogleHits(e \wedge t)}$
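The Jaccard weighting can be sketched as a plain function over hit counts: hits for both words together, divided by hits for e plus hits for t minus hits for both. No search engine is queried here; the counts passed in are made-up numbers and the function name is an assumption for this example.

```python
def jaccard_weight(hits_e, hits_t, hits_both):
    """Jaccard coefficient from page counts: both / (e + t - both)."""
    union = hits_e + hits_t - hits_both
    if union <= 0:
        return 0.0  # no pages mention either word
    return hits_both / union


# Made-up hit counts for a candidate element e and a term t.
print(jaccard_weight(hits_e=1000, hits_t=500, hits_both=250))  # 0.2
```

The score ranges from 0 (the two words never appear on the same page) to 1 (they always appear together), which makes candidates for the same term directly comparable.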
59. Relations by syntactic analysis
OntoLT mapping rule: SubjToClass_PredToSlot_DObjToRange
Maps a subject to the domain, the predicate or verb to a slot or
relation and the object to its range.
Example: ‘The player kicked the ball to the net’
relation: kick (domain: player, range: ball)
60. Relations by linguistic theory
The subcategorization frame of a word is the number
and kinds of other words that it selects when appearing
in a sentence.
E.g. identify verbs in text as indicators of a relation
between their arguments (object properties)
Example: ‘Joe wrote a letter’
relation: write (subject: Joe, object: letter)
Selectional restrictions for the verb “write”: subject Person, object written-communication
62. DIRT
Discovery of Inference Rules from Text
An unsupervised method for discovering inference rules
from text, such as
• X is author of Y ≈ X wrote Y
• X caused Y ≈ Y is blamed on X
• X manufactures Y ≈ X’s Y factory
Is based on the Distributional Hypothesis:
words that occur in the same contexts tend to be similar
63. DIRT: Distributional Hypothesis
The Distributional Hypothesis is applied to
dependency trees
If two paths tend to link the same sets of
words, their meanings are hypothesized to be
similar
64. DIRT: Dependency trees
The inference rules discovered by DIRT are between paths in dependency trees
The dependency trees are generated by the Minipar parser
Minipar represents its grammar as a network where nodes represent grammatical categories and links represent syntactic relationships
[Table: a subset of the dependency relations in Minipar output]
65. DIRT: Dependency trees
“John found a solution to the problem”
Dependency tree (head → modifier):
• found → John (subj)
• found → solution (obj)
• solution → a (det)
• solution → to (mod)
• to → problem (pcomp)
• problem → the (det)
Links represent dependency relationships
Direction: from the head to the modifier
Labels represent types of dependency relations
Each link between two words represents a direct
semantic relationship
Path between “John” and “problem”:
N:subj:V ← find → V:obj:N → solution → N:to:N
meaning “X finds solution to Y”
66. DIRT: Paths in Dependency Trees
Transformation rule: connect the prepositional complement directly to the words
modified by the preposition
Each link between two words represents a direct semantic relationship
A path represents indirect semantic relationships between two content words
67. Evaluation of Ontology Learning Techniques
1) Task-based evaluation (improve quality): the first
approach evaluates the adequacy of ontologies in the
context of other applications.
2) Corpus-based evaluation: the second approach uses
domain-specific data sources to determine to what
extent the ontologies are able to cover the
corresponding domain.
3) Criteria-based evaluation: the third approach
assesses ontologies by determining how well they
adhere to a set of criteria.
68. Task-based evaluation
Measures how well an ontology meets the requirements of the
system that uses it.
Examples:
• An ontology designed to improve the performance of
document retrieval: the retrieved documents should be more
relevant when the ontology is used
• The use of ontological relations in the context of speech
recognition: the output is compared with a gold standard
generated by humans
69. Corpus-based evaluation
methods for evaluating the ‘fit’ between an ontology and
the domain knowledge in the form of text corpora.
In this approach, natural language processing (e.g.,
latent semantic analysis, clustering) or information
extraction (e.g., named-entity recognition) techniques
are used to analyze the content of the corpus and
identify terms.
70. Criteria-based evaluation
Example criterion: the average number of terms that were aggregated to
form a concept in an ontology. This criterion reflects the intuition
that the more variants of a term are used to form a concept, the more
fully encompassing or complete the concept is.
71. Other measurement
Evaluation approaches can also be distinguished by the
layers of an ontology :
• term,
• concept,
• relation
Evaluations can be performed to assess the :
• correctness at the terminology layer,
• coverage at the conceptual layer,
• wellness at the taxonomy layer,
• adequacy of the non-taxonomic relations.
73. Text2Onto: Main Features
Learn primitives independent of a specific KR
language (Probabilistic Ontology Model, POM)
System calculates a confidence for each learned
object for better user interaction
Updates the learned knowledge each time the
corpus is changed and avoids processing it from scratch
Allows for easy
• combination of algorithms,
• execution of algorithms,
• writing new algorithms
74. Text2Onto: Algorithms used
Concepts
• Statistical measures, e.g. TFIDF, C-value/NC-value,…
Subclass_of relations
• Exploits hypernym relations from WordNet
• Hearst patterns
Mereological relations (part-of)
General relations: extracts the following syntactic frames:
• Transitive, e.g., love(subj, obj)
• Intransitive + PP-complement, e.g., walk(subj, pp(to))
• Transitive + PP-complement, e.g., hit(subj, obj, pp(with))
Instance-of
Equivalence
77. OntoGen : Techniques used
Linear Dimensionality Reduction (a.k.a LSI)
• words related to the same topic co-occur together
more often than words related to different topics
• Result: clusters of words each describing one topic
K-means clustering algorithm
• Partitions the corpus into k clusters so that two
documents within the same cluster are more closely
related than two documents from different clusters
79. Onto-LT
A Protégé plug-in with which classes and
relations can be extracted from a linguistically
annotated text collection
Provides mapping rules that allow for a
mapping between linguistic entities and
class/slots candidates in Protégé
80. Onto-LT: Mapping rules
HeadNounToClass_ModToSubClass
Maps a head-noun to a class and in combination with its modifier(s)
to one or more sub-class(es)
SubjToClass_PredToSlot_DObjToRange
Maps a linguistic subject to a class, its predicate to a corresponding
slot for this class and the direct object to the “range” of the slot
86. A Summary of the Outputs Supported, Techniques Used, and
Evaluations Performed for the Seven Systems Included
87. Conclusions
A detailed methodology that guides the ontology
learning process does not exist
Only general guidelines are provided
No complete correspondence between the methods
and the tools
Methods are based mainly on NLP techniques
complemented with statistical measures
Tools give only support to perform some of the steps
proposed in different approaches (except Text2Onto)