Ontology learning

Ehsan Asgarian
Ontology Learning from Text

Definition of Ontology
‘A formal, explicit specification of a shared conceptualization’
 formal: must be machine understandable
 explicit: types of concepts and constraints must be clearly defined
 shared: not private to some individual, but accepted by a group
 conceptualization: an abstract model of some phenomenon in the world, formed by identifying the relevant concepts of that phenomenon
Or simply: a data model describing a domain.

Main elements of an ontology
 Hierarchy of concepts (is-a relations)
 Object property (relation): links a domain class to a range class, e.g. wasWrittenBy
 Datatype property (attribute): links a domain class to a datatype range such as xsd:string, e.g. hasTitle

The spectrum of ontology kinds.

Applications of Ontologies
 Knowledge representation and knowledge
management systems
 Intelligent query-answering systems
 Information retrieval and extraction
 Semantic Web
• Web pages annotated with ontologies
• User queries for Web pages analysed at
knowledge level and answered by inferencing on
ontological knowledge

Definition of Ontology Learning
 The application of a set of methods and
techniques used for building an ontology from
scratch
 Uses distributed and heterogeneous
knowledge and information sources
 Allows a reduction in the time and effort
needed in the ontology development process

Task: automatic ontology extraction from domain texts
[Diagram: texts → ontology extraction → ontology]

Ontology Learning (Construction)
 Manual construction
• Corpus is not necessary
• Small scale
 Automatic or semiautomatic construction
• Domain specific corpus
• Good domain knowledge coverage

Ontology Learning methods from…
 Unstructured sources
• Involves NLP techniques, morphological and syntactic
analysis, etc.
 Semi-structured source
• elicit an ontology from sources that have some predefined
structure, such as XML Schema
 Structured data
• Extracting concepts and relations from knowledge contained
in structured data, such as databases

Ontology Learning ‘Layer Cake’ (top layer first)
Axioms & Rules: ∀x, y (sufferFrom(x, y) → ill(x))
Relations: cure (domain: Doctor, range: Disease)
Taxonomy (Concept hierarchies): is_a (Doctor, Person)
Concepts: Disease := <I, E, L>
Synonyms: {disease, illness}
Terms: disease, illness, hospital

An overview of the outputs, tasks, and
common techniques for ontology learning

Subtasks in ontology learning
 Extract the relevant domain terminology and synonyms from a
text collection
 Discover concepts which can be regarded as abstractions of
human thought
 Derive a concept hierarchy organizing these concepts
 Extend an existing concept hierarchy with new concepts
 Learn non-taxonomic relations between concepts
 Populate the ontology with instances of relations and concepts
 Discover other axiomatic relationships or rules involving
concepts and relations

Sample (partial) Ontology –
Electronic Voting Domain
 Concepts: person, voter, worker, poll watcher,
location, county, precinct, vote, ballot, machine,
voting machine, manufacturer, etc.
 Attributes: name of person, model of machine, etc.
 Taxonomical relations:
• Voter is a person; precinct is a location; voting
machine is a machine, etc.
 Non-hierarchical relations:
• Voter cast ballot; voter trust machine; county
adopt machine; equipment miscount ballot, etc.

ConceptNet — a practical commonsense reasoning
 Open Mind Common Sense (OMCS) is an artificial intelligence
project based at the Massachusetts Institute of Technology (MIT)
Media Lab whose goal is to build and utilize a large
commonsense knowledge base from the contributions of many
thousands of people across the Web.
 ConceptNet is a multilingual knowledge base, representing
words and phrases that people use and the common-sense
relationships between them.
 Since its founding in 1999, OMCS has
accumulated more than a million
English facts from over 15,000
contributors, in addition to knowledge
bases in other languages.

 The knowledge base is a semantic network presently consisting
of over 1.6 million assertions of commonsense knowledge
encompassing the spatial, physical, social, temporal, and
psychological aspects of everyday life.
 It is built from nodes representing concepts, in the form of words
or short phrases of natural language, and labeled relationships
between them. These are the kinds of things computers need to
know to search for information better, answer questions, and
understand people's goals.
 ConceptNet is generated automatically from the 700 000
sentences of the Open Mind Common Sense Project — a World
Wide Web based collaboration with over 14 000 authors.

Challenges in Text Processing
 Unstructured texts
 Ambiguity in English text
• Multiple senses of a word
• Multiple parts of speech – e.g., “like” can occur in 8 PoS:
• Verb: “Fruit flies like banana”
• Noun: “We may not see its like again”
• Adjective: “People of like tastes agree”
• Adverb: “The rate is more like 12 percent”
• Preposition: “Time flies like an arrow”
• etc
 Lack of closed domain of lexical categories
 Noisy texts
 Requirement of very large training text sets
 Lack of standards in text processing

Part 1: Terms Extraction
Layer-cake level: Terms, e.g. disease, illness, hospital

Terms
 Linguistic realizations of domain-specific concepts
 Are the basis of the ontology learning process
 Term extraction involves:
• Linguistic processing: part-of-speech tagging,
morphological analysis, etc.
• Statistical processing: compares the distribution of
terms between corpora

Terms Extraction: Process
 Run a part-of-speech (POS) tagger over the domain
corpus
 Identify possible terms by constructing patterns, such
as Adj-Noun, Noun-Noun, Adj-Noun-Noun, …
 Ignore proper names
 Keep only the terms that are relevant to the text by applying
statistical metrics
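
The pattern-matching step can be sketched as follows. This is a minimal illustration, not a full extractor: it assumes the corpus has already been POS-tagged, and the tag names (ADJ, NOUN, PROPN), the hand-tagged sentence and the function name `candidate_terms` are hypothetical. A real pipeline would plug in an actual tagger and then rank the candidates with a statistical metric.

```python
from collections import Counter

# Tag sequences accepted as term candidates (Adj-Noun, Noun-Noun, Adj-Noun-Noun).
PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN"), ("ADJ", "NOUN", "NOUN")}

def candidate_terms(tagged_sentence):
    """Count every token window whose tag sequence matches a pattern."""
    counts = Counter()
    for size in (2, 3):
        for i in range(len(tagged_sentence) - size + 1):
            window = tagged_sentence[i:i + size]
            tags = tuple(tag for _, tag in window)
            if tags in PATTERNS:
                counts[" ".join(word.lower() for word, _ in window)] += 1
    return counts

# Hypothetical hand-tagged input. Proper names (PROPN) never match a pattern,
# which is how this sketch implements the "ignore names" step.
sentence = [("Smith", "PROPN"), ("tested", "VERB"), ("the", "DET"),
            ("electronic", "ADJ"), ("voting", "NOUN"), ("machine", "NOUN")]
terms = candidate_terms(sentence)
print(sorted(terms))
```

Nested candidates such as "voting machine" and "electronic voting machine" are both kept; deciding between them is left to the statistical filtering step.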

Linguistic Analysis: an example
Processing levels, from shallow to deep:
 Tokenization (incl. Named-Entity Rec.): [table] [2005-06-01] [John Smith]
 Part of Speech & Semantic Tagging: [table N:ARTIFACT] [table N:furniture]
 Morphological Analysis (stemming): [work~ing V]
 Phrase Recognition: [[the] [large] [table] NP] [[in] [the] [corner] PP]
 Dependency Structure (Phrases): [[the SPEC] [large MOD] [table HEAD] NP]
 Dependency Structure (S): [[He SUBJ] [booked PRED] [[this] [table HEAD] NP:DOBJ] S]
 Discourse Analysis: [[He SUBJ] [booked PRED] [[this] [table HEAD] NP:DOBJ:X1] …] … [[It SUBJ:X1] [was PRED] still available …]

Statistical Analysis
Statistical metrics used in terms extraction:
 Chi-square: $\chi^2 = \sum \frac{(\mathrm{obs} - \mathrm{exp})^2}{\mathrm{exp}}$
 Term weighting (TFIDF): $tfidf(w) = tf(w) \cdot \log\left(\frac{N}{df(w)}\right)$
 Mutual Information: $mi(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}$

TFIDF
$tfidf(w) = tf(w) \cdot \log\left(\frac{N}{df(w)}\right)$
tf(w): term frequency (number of occurrences of the word in a document)
df(w): document frequency (number of documents containing the word)
N: number of all documents
tfidf(w): relative importance of the word in the document
 The most popular weighting scheme
 tf(w): the word is more important if it appears several times in a document
 log(N / df(w)): the word is more important if it appears in fewer documents
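
The TFIDF weighting can be computed directly from token counts. The sketch below follows the definition tfidf(w) = tf(w) · log(N / df(w)) with raw counts and the natural logarithm; the toy corpus and the function name `tfidf` are assumptions made for illustration only.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {word: weight} dict per document, weight = tf(w) * log(N / df(w))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # each document counts once per word
    weights = []
    for doc in docs:
        tf = Counter(doc)            # raw occurrence counts in this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights

# Hypothetical toy corpus of three tokenized documents.
docs = [["disease", "hospital", "disease"],
        ["hospital", "doctor"],
        ["hospital", "vote"]]
scores = tfidf(docs)
print(scores[0])
```

In this toy corpus "hospital" occurs in every document, so its weight is zero, while "disease" is both frequent in the first document and rare across the corpus, so it receives the highest weight there.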

Part 2: Synonyms
Layer-cake level: Synonyms, e.g. {disease, illness}

Synonyms
 Identification of terms that share
semantics, i.e., potentially refer to the
same concept
 Methods for extracting synonyms
• Based on WordNet
• Latent Semantic Indexing (LSI)

WordNet
 A lexical database for the English language
 Nouns, verbs, adjectives & adverbs are grouped into sets of
synonyms (synsets)
 Synsets are interlinked by means of conceptual-semantic
and lexical relations

Adapting WordNet to a specific domain
 Partition the set of synonymy relations defined in WordNet into
three classes:
• Relations irrelevant in the specific domain
• Relations that are relevant but incorrect in the specific
domain
• Relations that are relevant and correct in the specific
domain
 Remove relations from the first two classes and keep
relations from the third class
 Rank the remaining synonym sets according to their frequency in the corpus

Latent Semantic Indexing (LSI)
 LSI is an NLP technique for analyzing relationships
between a set of documents and the terms they contain
 Uses a term-document matrix which describes the
occurrences of terms in documents (Vector Space Model)
Example: doc1 doc2
database X
computer X X
access X
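
A term-document matrix of this kind can be built as sketched below. The two documents and the function name `term_document_matrix` are hypothetical, and the sketch covers only the matrix construction: the dimensionality-reduction step that LSI applies to the matrix afterwards is not shown.

```python
def term_document_matrix(docs):
    """Map each term to a row of occurrence counts, one column per document."""
    vocabulary = sorted({term for doc in docs for term in doc})
    return {term: [doc.count(term) for doc in docs] for term in vocabulary}

# Hypothetical tokenized documents, used only for illustration.
docs = [["database", "computer"], ["computer", "access"]]
matrix = term_document_matrix(docs)
for term, row in matrix.items():
    print(f"{term:10}", row)
```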

Part 3: Concepts
Layer-cake level: Concepts, e.g. Disease := <I, E, L>

Concepts
Intension, Extension, Lexicon
A term may indicate a concept if we can define its:
 Intension: (in)formal definition of the set of objects that this concept describes
Example: a disease is an impairment of health or a condition of abnormal functioning
 Extension: a set of objects that the definition of this concept describes (the name of the nearest common ancestor)
Example: influenza, cancer, heart disease
 Lexical realizations: the term itself and its multilingual synonyms
Example: disease, illness, maladie

Part 4: Taxonomy Induction
Layer-cake level: Taxonomy, e.g. is_a (Doctor, Person)

Concept Hierarchy Extraction
Basic methods used for taxonomy extraction:
 With the use of WordNet
 Lexico-syntactic patterns
 Machine Readable Dictionaries
 Co-occurrence analysis
 Unsupervised hierarchical clustering techniques
 Linguistic approaches

Taxonomy Extraction with WordNet
 Given two terms t1 and t2, check if they stand in a
hypernym relation with regard to WordNet
 Normalize the number of hypernym paths by dividing
by the number of senses of t1
$isa(t_1, t_2) = \min\left(\frac{|paths(senses(t_1), senses(t_2))|}{|senses(t_1)|}, 1\right)$
path: a sequence of edges connecting the two synsets
Example:
- 4 different hypernym paths between the synsets ‘country’ and ‘region’
- ‘country’ has 5 senses
- value of isa (country, region) = min(4/5, 1) = 0.8
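
The normalization can be checked with a one-line function. The sketch takes the two counts as plain numbers (the helper name `isa` mirrors the formula, and looking the counts up in WordNet is deliberately left out), so it only illustrates the arithmetic of the country/region example.

```python
def isa(num_hypernym_paths, num_senses_t1):
    """Normalized hypernym evidence: min(|paths| / |senses(t1)|, 1)."""
    return min(num_hypernym_paths / num_senses_t1, 1)

# Counts taken from the country/region example; the WordNet lookup is not shown.
country_region = isa(4, 5)
print(country_region)
```

The cap at 1 matters when a term has more hypernym paths than senses, for instance `isa(7, 5)` also yields 1.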

Lexico-syntactic patterns - Hearst
 Aim: the acquisition of hyponym lexical relations from text
 Uses a set of predefined lexico-syntactic patterns which
• occur frequently and in many text genres
• indicate the relation of interest
• can be recognized with little or no pre-encoded knowledge
 Main idea: match these patterns in texts to retrieve
is_a relations
 Precision with respect to WordNet: 55.45%

NP0 such as {NP1, NP2, …, (and | or)} NPn
‘Vehicles such as cars, trucks and bikes…’
→ is_a (car, vehicle), is_a (truck, vehicle), is_a (bike, vehicle)
such NP as {NP,}* {(or | and)} NP
‘Such fruits as oranges, nectarines or apples…’
→ is_a (orange, fruit), is_a (nectarine, fruit), is_a (apple, fruit)
NP {, NP}* {,} {or | and} other NP
‘Swimming, running, or/and other activities…’
→ is_a (swimming, activity), is_a (running, activity)

NP {,} including {NP,}* {or | and} NP
‘Injuries, including broken bones, wounds and bruises…’
→ is_a (broken bone, injury), is_a (wound, injury), is_a (bruise, injury)
NP {,} especially {NP,}* {or | and} NP
‘Publications, especially papers and books…’
→ is_a (paper, publication), is_a (book, publication)
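
A Hearst pattern can be approximated with a regular expression. The sketch below handles only the ‘NP0 such as NP1, …, NPn’ pattern and makes two loud simplifications: every noun phrase is a single word, and plural forms are not reduced to their lemma (so it returns "cars", not "car"). The names `SUCH_AS`, `SEPARATOR` and `hearst_such_as` are hypothetical.

```python
import re

# List separators; the longer alternatives come first so ", and " wins over ", ".
SEPARATOR = r"(?:, and |, or |, | and | or )"
# Simplification: every NP is a single word.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+" + SEPARATOR + r")*\w+)")

def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs matched by 'NP0 such as NP1, ..., NPn'."""
    pairs = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym = match.group(1)
        hyponyms = re.split(SEPARATOR, match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms)
    return pairs

pairs = hearst_such_as("Vehicles such as cars, trucks and bikes are taxed.")
print(pairs)
```

A practical implementation would match over noun-phrase chunks produced by a parser rather than raw words, which also makes multi-word hyponyms such as "broken bones" possible.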

Machine Readable Dictionaries
 A method for extracting taxonomies which goes back
to the 1980s
 Main idea: exploit the regularity of dictionary entries to
find a suitable hypernym for the defined word
Example:
spring: “the season between winter and summer and in which
leaves and flowers appear”
is_a (spring, season)

MRDs: Exceptions
 The hypernym can be preceded by an expression such as ‘a kind of’,
‘a sort of’, or ‘a type of’
 The problem is solved by keeping an exception list with words such as
‘kind’, ‘sort’, ‘type’ and taking the head of the NP following the
preposition ‘of’
Example: hornbeam: “a type of tree with a hard wood, sometimes used in hedges”
is_a (hornbeam, tree)
 The word can be defined in terms of a part-of or membership relation
Example: republican: “a member of a political party advocating republicanism”
part_of (republican, political party), not is_a (republican, political party)

Co-occurrence analysis
Document-based subsumption
 A term t1 is more specific than a term t2 if
t2 also appears in all the documents in which t1
appears.
$P(x \mid y) = \frac{n(x, y)}{n(y)}$
Term x subsumes term y iff P(x | y) ≥ 1, i.e. x appears in every document that contains y, where
n(x, y) = the number of documents in which x and y co-occur
n(y) = the number of documents that contain y
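
Document-based subsumption can be tested with simple set membership. In the sketch below each document is represented as a set of terms; the documents and the function name `subsumes` are illustrative assumptions.

```python
def subsumes(x, y, docs):
    """True if x appears in every document that contains y, i.e. n(x, y) / n(y) = 1."""
    docs_with_y = [doc for doc in docs if y in doc]
    if not docs_with_y:
        return False                 # y never occurs, so nothing can be concluded
    n_xy = sum(1 for doc in docs_with_y if x in doc)
    return n_xy / len(docs_with_y) >= 1

# Hypothetical documents, each reduced to the set of terms it contains.
docs = [{"machine", "voting machine", "ballot"},
        {"machine", "voting machine"},
        {"machine", "manufacturer"}]
print(subsumes("machine", "voting machine", docs))
print(subsumes("voting machine", "machine", docs))
```

Here "machine" subsumes "voting machine", but not the other way round, because one document mentions "machine" without "voting machine".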

Unsupervised hierarchical clustering techniques
 Known from machine learning research
• very noisy, as they depend heavily on the frequency and
behavior of the terms in the text collection under consideration
• learn concepts at the same time, since they also group the terms
most related to each other
• the resulting clusters can be regarded as abstractions over words and thus, to
some extent, as concepts
 It is unclear which specific relation actually holds
between the involved words.
Example: Semantic_relatedness (cut, knife)

Linguistic Approaches
 Modifiers typically restrict or narrow down the meaning
of the modified noun.
 Syntactic structure analysis and dependency analysis:
words and modifiers in syntactic structures (noun, verb,
prepositional, … phrases) are analyzed to discover
potential terms and relations, e.g. the head-modifier principle,
in which the head of a term assumes the hypernym role
 In dependency analysis, grammatical relations such as
subject, object, adjunct, and complement are used for
determining more complex relations
Example: is_a (international credit card, credit card)
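
The head-modifier principle can be illustrated by stripping leading modifiers from a multi-word term. This sketch assumes English noun phrases whose head is the last word; the function name `head_modifier_isa` is hypothetical.

```python
def head_modifier_isa(term):
    """Return (hyponym, hypernym) pairs obtained by dropping leading modifiers one at a time."""
    words = term.split()
    return [(" ".join(words[i:]), " ".join(words[i + 1:]))
            for i in range(len(words) - 1)]

relations = head_modifier_isa("international credit card")
print(relations)
```

Each pair reads as is_a (hyponym, hypernym), so the first pair reproduces the credit card example and the second adds is_a (credit card, card).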

Extending Concept Hierarchy
with new Concepts
…by adding a new concept at an appropriate position in the existing taxonomy
 Supervised methods:
• classifiers need to be trained which predict membership for every
concept in the existing concept hierarchy,
• they need a considerable amount of training data for each concept,
• such approaches typically do not scale to arbitrarily large ontologies.
 Unsupervised approaches:
• assume a similarity function which computes a measure of fit between
the new concept and the concepts existing in the ontology.
• rely on an appropriate contextual representation of the different
concepts on the basis of which similarity can be computed.
• the hierarchical structure of the ontology needs to be considered and
somehow integrated into the similarity measure

Part 5: Relations (non-taxonomic)
Layer-cake level: Relations, e.g. cure (domain: Doctor, range: Disease)

Extracting relations (the interactions
between concepts) & attributes
 Specific relations
• Part-of
• Qualia (Formal, Constitutive, Telic, Agentive)
 General relations
• Exploiting linguistic structure
 Attributes

Learning attributes: Introduction
 Attributes: relations with a datatype as range
 Typically expressed in texts using the preposition ‘of’, the verb ‘have’ or
genitive constructs, e.g. ‘the color of the car’, ‘the car’s color’, ‘every
car has a color’
 Values of attributes are expressed using copula constructs,
adjectives or expressions specific to the attribute in question, e.g.,
• ‘the car is red’ (copula + value)
• ‘the red car’ (adjective)
• ‘the baby weighs 3 kg’ (specific expressions)

Classification of attributes
To systematize the learning process attributes are classified according to their range

An approach to learning attributes
 Tokenize and part-of-speech tag the corpus
 Apply the following patterns to extract adjective/noun pairs:
(\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}
(\w+{DET})? \w+{JJ} (\w+{NN})+
JJ: adjective, DET: determiner, NN: noun, VBZ: verb, 3rd person singular present
 These pairs are weighted using the conditional probability $P(a \mid n) = \frac{f(n, a)}{f(n)}$, where
f(n, a): joint frequency of adjective a and noun n
f(n): the frequency of noun n
 For each of the adjectives, look up the corresponding
attributes in WordNet
“meronymy” / “part-of” relations
Given a “seed” word, find parts of that word in a large corpus of text
Pattern format: type_of_word TAG type_of_word TAG …
 whole NN[-PL] ’s POS part NN[-PL]
e.g. …building’s basement…
 part NN[-PL] of PREP {the|a} DET mods [JJ|NN]* whole NN
e.g. …basement of a building…
NN = noun, NN-PL = plural noun, PREP = preposition, POS = possessive, JJ = adjective
55% accuracy

Qualia structures
The meaning of a lexical element is described in terms of four roles:
 Constitutive: physical properties of an object (e.g., weight, material, parts)
 Agentive: typically a verb denoting an action which brings the object into existence
 Formal: normally consists of typing information about the object (e.g., hypernym)
 Telic: the purpose or function of an object, expressed either by a verb or by a nominal
Example: qualia structure for knife
Formal: artifact_tool
Constitutive: blade, handle, …
Telic: cut_act
Agentive: make_act

Qualia Structures: Learning Approach
 Aim: to automatically learn qualia
structures from the WWW
 Based on the idea of matching certain
lexico-syntactic patterns conveying a
standard relation

Qualia Structures: Learning Process
Word → Generate Clues → Download Google Abstracts → POS-tagging → Matching regular expressions → Statistical Weighting → Weighted QS
 Clues: search engine queries
indicating the relation of
interest
 Calculate the weight of a
candidate qualia element e for
the term t using the Jaccard
coefficient:
$\frac{GoogleHits(e \wedge t)}{GoogleHits(e) + GoogleHits(t) - GoogleHits(e \wedge t)}$
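
The Jaccard weighting reduces to simple arithmetic on three hit counts. The sketch below uses made-up counts and a hypothetical function name `jaccard_weight`; it does not query any search engine.

```python
def jaccard_weight(hits_e, hits_t, hits_both):
    """Jaccard coefficient from hit counts for e, for t, and for e and t together."""
    return hits_both / (hits_e + hits_t - hits_both)

# Hypothetical hit counts, chosen only to make the arithmetic easy to follow.
weight = jaccard_weight(hits_e=2000, hits_t=1000, hits_both=500)
print(weight)
```

The denominator counts pages that mention e or t (or both) exactly once, so the weight is 1 only when the two always occur together.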

Qualia Structure: Patterns (1/2)
Formal Role
Telic Role

Qualia Structure: Patterns (2/2)
Constitutive Role

Relations by syntactic analysis
OntoLT mapping rule SubjToClass_PredToSlot_DObjToRange:
maps a subject to the domain, the predicate or verb to a slot or
relation, and the direct object to its range.
Example:
‘The player kicked the ball to the net’
relation: kick (domain: player, range: ball)

Relations by linguistic theory
 The subcategorization frame of a word is the number
and kinds of other words that it selects when appearing
in a sentence.
 E.g. identify verbs in text as indicators of a relation
between their arguments (object properties)
Example: ‘Joe wrote a letter’
relation: write (subject: Joe, object: letter)
Selectional restrictions for the subject and object of the verb “write”:
subject: Person, object: written-communication

Part 6: Axioms & Rules
Layer-cake level: Axioms & Rules, e.g. ∀x, y (sufferFrom(x, y) → ill(x))

DIRT
Discovery of Inference Rules from Text
 An unsupervised method for discovering inference rules
from text, such as
X is author of Y → X wrote Y,
X caused Y → Y is blamed on X,
X manufactures Y → X’s Y factory
 Is based on the Distributional Hypothesis:
words that occur in the same contexts tend to be similar

DIRT: Distributional Hypothesis
 The Distributional Hypothesis is applied to
dependency trees
 If two paths tend to link the same sets of
words, their meanings are hypothesized to be
similar

DIRT: Dependency trees
 The inference rules
discovered by DIRT are
between paths in
dependency trees
 The trees are generated by the Minipar
parser
 Minipar represents its
grammar as a network where
nodes represent grammatical
categories and links represent syntactic
relationships
[Table: a subset of the dependency relations in Minipar output]

DIRT: Dependency trees
“John found a solution to the problem”
[Dependency tree: found -subj-> John; found -obj-> solution; solution -det-> a; solution -mod-> to; to -pcomp-> problem; problem -det-> the]
Links represent dependency relationships
Direction: from the head to the modifier
Labels represent types of dependency relations
Each link between two words represents a direct
semantic relationship
Path between “John” and “problem”:
N:subj:V ← find → V:obj:N → solution → N:to:N
meaning “X finds solution to Y”

DIRT: Paths in Dependency Trees
Transformation rule: connect the prepositional complement directly to the words
modified by the preposition
Each link between two words represents a direct semantic relationship
A path represents indirect semantic relationships between two content words
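
Finding the path between two content words amounts to walking up the tree from both words to their lowest common head. The sketch below hard-codes the tree for “John found a solution to the problem” after the transformation rule has attached “problem” directly to “solution”; the data structure and the function names are assumptions for illustration, not Minipar output, and lemmatization (found → find) is not performed.

```python
# Dependency links as modifier -> (head, relation), after the transformation
# rule has attached "problem" directly to "solution" with the label "to".
HEADS = {
    "John": ("found", "subj"),
    "solution": ("found", "obj"),
    "a": ("solution", "det"),
    "problem": ("solution", "to"),
    "the": ("problem", "det"),
}

def chain_to_root(word):
    """Return the words from `word` up to the root of the tree."""
    chain = [word]
    while chain[-1] in HEADS:
        chain.append(HEADS[chain[-1]][0])
    return chain

def path_between(a, b):
    """Return the words linking a and b through their lowest common head."""
    up_a, up_b = chain_to_root(a), chain_to_root(b)
    common = next(word for word in up_a if word in up_b)
    left = up_a[:up_a.index(common)]
    right = up_b[:up_b.index(common)]
    return left + [common] + right[::-1]

path = path_between("John", "problem")
print(path)
```

Replacing the two end words with the slots X and Y gives the path “X finds solution to Y” from the example.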

Evaluation of Ontology Learning Techniques
1) Task-based evaluation (improve quality): evaluates the
adequacy of ontologies in the
context of other applications.
2) Corpus-based evaluation: uses
domain-specific data sources to determine to what
extent the ontologies are able to cover the
corresponding domain.
3) Criteria-based evaluation:
assesses ontologies by determining how well they
adhere to a set of criteria.

Task-based evaluation
 Measures how well an ontology meets the requirements of the
system that uses it.
 Example: for an ontology designed to improve the performance of
document retrieval, check whether the retrieved documents are more relevant when the ontology
is used.
 Example: the use of ontological relations in the context of speech
recognition, where the output is compared with a gold standard
generated by humans.

Corpus-based evaluation
 Methods for evaluating the ‘fit’ between an ontology and
the domain knowledge in the form of text corpora.
 In this approach, natural language processing (e.g.,
latent semantic analysis, clustering) or information
extraction (e.g., named-entity recognition) techniques
are used to analyze the content of the corpus and
identify terms.

Criteria-based evaluation
 Example criterion: the average number of terms that were aggregated to
form a concept in an ontology. The intuition is that the more variants
of a term are used to form a concept, the more fully encompassing or
complete the concept is.

Other measurement
 Evaluation approaches can also be distinguished by the
layers of an ontology :
• term,
• concept,
• relation
 Evaluations can be performed to assess the :
• correctness at the terminology layer,
• coverage at the conceptual layer,
• wellness at the taxonomy layer,
• adequacy of the non-taxonomic relations.

Ontology Learning Tools
 Text2Onto
• Open source (Java)
• http://ontoware.org/projects/text2onto
 OntoLT
• Open source (Protégé plug-in, Java)
• http://olp.dfki.de/OntoLT/OntoLT.htm
 OntoGen
• Open source (C++, .NET)
• http://www.textmining.net

Text2Onto: Main Features
 Learns primitives independent of a specific KR
language (Probabilistic Ontology Model, POM)
 Calculates a confidence for each learned
object for better user interaction
 Updates the learned knowledge each time the
corpus is changed and avoids reprocessing it from scratch
 Allows for easy
• combination of algorithms,
• execution of algorithms,
• writing new algorithms

Text2Onto: Algorithms used
 Concepts
• Statistical measures, e.g. TFIDF, C-value/NC-value,…
 Subclass_of relations
• Exploits hypernym relations from WordNet
• Hearst patterns
 Mereological relations (part-of)
 General relations: extracts the following syntactic frames:
• Transitive, e.g., love(subj, obj)
• Intransitive + PP-complement, e.g., walk(subj, pp(to))
• Transitive + PP-complement, e.g., hit(subj, obj, pp(with))
 Instance-of
 Equivalence

OntoGen : Techniques used
 Linear Dimensionality Reduction (a.k.a LSI)
• words related to the same topic co-occur together
more often than words related to different topics
• Result: clusters of words each describing one topic
 K-means clustering algorithm
• Partitions the corpus into k clusters so that two
documents within the same cluster are more closely
related than two documents from different clusters

Onto-LT
 A Protégé plug-in with which classes and
relations can be extracted from a linguistically
annotated text collection
 Provides mapping rules that allow for a
mapping between linguistic entities and
class/slot candidates in Protégé

Onto-LT: Mapping rules
 HeadNounToClass_ModToSubClass:
maps a head-noun to a class and, in combination with its modifier(s),
to one or more sub-class(es)
 SubjToClass_PredToSlot_DObjToRange:
maps a linguistic subject to a class, its predicate to a corresponding
slot for this class and the direct object to the “range” of the slot

Framework dimensions, sub-dimensions
and values (Onto. Tools Evaluation)

The Comparison of Ontology Tools

A Summary of the Outputs Supported, Techniques Used, and
Evaluations Performed for the Seven Systems Included

Conclusions
 A detailed methodology that guides the ontology
learning process does not exist
 Only general guidelines are provided
 No complete correspondence between the methods
and the tools
 Methods are based mainly on NLP techniques
complemented with statistical measures
 Tools give only support to perform some of the steps
proposed in different approaches (except Text2Onto)
