Ehsan Asgarian
Ontology Learning from Text
Definition of Ontology
‘A formal, explicit specification of a shared conceptualization’
• formal: must be machine-readable
• explicit: types of concepts and constraints must be clearly defined
• shared: not private to some individual, but accepted by a group
• conceptualization: an abstract model of some phenomenon in the world, formed by identifying the relevant concepts of that phenomenon
Or simply, a data model describing a domain.
Main elements of an ontology
 Hierarchy of concepts (is-a relations)
 Object property (with domain and range)
 Datatype property
The spectrum of ontology kinds.
Applications of Ontologies
 Knowledge representation and knowledge
management systems
 Intelligent query-answering systems
 Information retrieval and extraction
 Semantic Web
• Web pages annotated with ontologies
• User queries for Web pages analysed at
knowledge level and answered by inferencing on
ontological knowledge
Ontology Engineering
Definition of Ontology Learning
 The application of a set of methods and
techniques used for building an ontology from
 Uses distributed and heterogeneous
knowledge and information sources
 Allows a reduction in the time and effort
needed in the ontology development process
Task: automatic ontology
extraction from domain texts
Ontology Learning (Construction)
 Manual construction
• Corpus is not necessary
• Small scale
 Automatic or semiautomatic construction
• Domain specific corpus
• Good domain knowledge coverage
Ontology Learning methods from…
 Unstructured sources
• Involves NLP techniques, morphological and syntactic analysis, etc.
 Semi-structured sources
• Elicit an ontology from sources that have some predefined structure, such as XML Schema
 Structured data
• Extracting concepts and relations from knowledge contained
in structured data, such as databases
Ontology Learning ‘Layer Cake’ (top layer first)
Axioms & Rules: ∀x, y (sufferFrom(x, y) → ill(x))
Relations: cure (domain:Doctor, range:Disease)
Taxonomy (Concept hierarchies): is_a (Doctor, Person)
Concepts: Disease:=<I, E, L>
Synonyms: {disease, illness}
Terms: disease, illness, hospital
An overview of the outputs, tasks, and
common techniques for ontology learning
Subtasks in ontology learning
 Extract the relevant domain terminology and synonyms from a
text collection
 Discover concepts which can be regarded as abstractions of
human thought
 Derive a concept hierarchy organizing these concepts
 Extend an existing concept hierarchy with new concepts
 Learn non-taxonomic relations between concepts
 Populate the ontology with instances of relations and concepts
 Discover other axiomatic relationships or rules involving
concepts and relations
Sample (partial) Ontology –
Electronic Voting Domain
 Concepts: person, voter, worker, poll watcher,
location, county, precinct, vote, ballot, machine,
voting machine, manufacturer, etc.
 Attributes: name of person, model of machine, etc.
 Taxonomical relations:
• Voter is a person; precinct is a location; voting
machine is a machine, etc.
 Non-hierarchical relations:
• Voter cast ballot; voter trust machine; county
adopt machine; equipment miscount ballot, etc.
ConceptNet — a practical commonsense reasoning
 Open Mind Common Sense (OMCS) is an artificial intelligence
project based at the Massachusetts Institute of Technology (MIT)
Media Lab whose goal is to build and utilize a large
commonsense knowledge base from the contributions of many
thousands of people across the Web.
 ConceptNet is a multilingual knowledge base, representing
words and phrases that people use and the common-sense
relationships between them.
 Since its founding in 1999, it has
accumulated more than a million
English facts from over 15,000
contributors in addition to knowledge
bases in other languages.
 The knowledge base is a semantic network presently consisting
of over 1.6 million assertions of commonsense knowledge
encompassing the spatial, physical, social, temporal, and
psychological aspects of everyday life.
 It is built from nodes representing concepts, in the form of words
or short phrases of natural language, and labeled relationships
between them. These are the kinds of things computers need to
know to search for information better, answer questions, and
understand people's goals.
 ConceptNet is generated automatically from the 700 000
sentences of the Open Mind Common Sense Project — a World
Wide Web based collaboration with over 14 000 authors.
Challenges in Text Processing
 Unstructured texts
 Ambiguity in English text
• Multiple senses of a word
• Multiple parts of speech – e.g., “like” can occur in 8 PoS:
• Verb: “Fruit flies like a banana”
• Noun: “We may not see its like again”
• Adjective: “People of like tastes agree”
• Adverb: “The rate is more like 12 percent”
• Preposition: “Time flies like an arrow”
• etc
 Lack of closed domain of lexical categories
 Noisy texts
 Requirement of very large training text sets
 Lack of standards in text processing
Part 1 → Terms Extraction
Terms: disease, illness, hospital
 Linguistic realizations of domain-specific concepts
 Are the basis of the ontology learning process
 Term extraction implies:
• Linguistic processing → part-of-speech tagging, morphological analysis, etc.
• Statistical processing → compares the distribution of terms between corpora
Terms Extraction: Process
 Run a Part-Of-Speech (POS) tagger over the domain corpus
 Identify possible terms by constructing patterns, such as: Adj-Noun, Noun-Noun, Adj-Noun-Noun,…
 Ignore names
 Keep only the terms that are relevant to the text by applying statistical metrics
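The pattern-matching step can be sketched as follows. This is a minimal illustration, assuming the text has already been POS-tagged with Penn Treebank style tags; the helper name `candidate_terms` and the sample sentence are hypothetical, and the statistical filtering step is left out.

```python
# Tag patterns named on the slide: Adj-Noun-Noun, Adj-Noun, Noun-Noun.
PATTERNS = [("JJ", "NN", "NN"), ("JJ", "NN"), ("NN", "NN")]

def candidate_terms(tagged):
    """Return multi-word term candidates whose tag sequence matches a pattern."""
    found = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[start:start + len(pattern)]
            tags = tuple(tag for _, tag in window)
            if tags == pattern:
                found.append(" ".join(word.lower() for word, _ in window))
    return found

# Hand-tagged example sentence (invented for illustration).
tagged_sentence = [
    ("The", "DT"), ("chronic", "JJ"), ("heart", "NN"), ("disease", "NN"),
    ("requires", "VBZ"), ("hospital", "NN"), ("care", "NN"),
]
terms = candidate_terms(tagged_sentence)
print(terms)
```

Proper names (tagged NNP) never match these patterns, which mirrors the "ignore names" step.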
Linguistic Analysis: an example
Levels of analysis, from shallow to deep:
 Tokenization (incl. Named-Entity Rec.): [table] [2005-06-01] [John Smith]
 Part of Speech & Semantic Tagging: [table N:ARTIFACT] [table N:furniture]
 Morphological Analysis (stemming): [work~ing V]
 Phrase Recognition: [[the] [large] [table] NP] [[in] [the] [corner] PP]
 Dependency Structure:
• [[the SPEC] [large MOD] [table HEAD] NP]
• [[He SUBJ] [booked PRED] [[this] [table HEAD] NP:DOBJ]S]
• [[He SUBJ] [booked PRED] [[this] [table HEAD]NP:DOBJ:X1]…]… [[It SUBJ:X1] [was PRED] still available…]
Statistical Analysis
Statistical metrics used in terms extraction:
 Chi-square: $\chi^2 = \sum \frac{(obs - exp)^2}{exp}$
 Mutual Information: $mi(x, y) = \frac{P(x, y)}{P(x)\,P(y)}$, usually reported as its logarithm
 Term weighting (TFIDF): $tfidf(w) = tf(w) \cdot \log\left(\frac{N}{df(w)}\right)$
• tf(w): term frequency (number of occurrences of the word in a document)
• df(w): document frequency (number of documents containing the word)
• N: number of all documents
• tfidf(w): relative importance of the word in the document
TFIDF is the most popular weighting schema:
• The word is more popular when it appears several times in a document
• The word is more important if it appears in fewer documents
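As a small worked example of the TFIDF weighting, tf(w) · log(N / df(w)), here is a sketch over a toy corpus. The three documents are invented and the natural logarithm is an assumption, since the slides do not fix the base.

```python
import math

def tfidf(word, document, corpus):
    """tfidf(w) = tf(w) * log(N / df(w)) for one tokenized document."""
    tf = document.count(word)                      # occurrences of the word in this document
    df = sum(1 for doc in corpus if word in doc)   # number of documents containing the word
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

# Toy corpus of three tokenized documents (invented for illustration).
corpus = [
    ["disease", "hospital", "disease"],
    ["hospital", "doctor"],
    ["hospital", "ballot"],
]
print(tfidf("disease", corpus[0], corpus))   # 2 * log(3 / 1)
print(tfidf("hospital", corpus[0], corpus))  # 1 * log(3 / 3) = 0.0
```

A word that occurs in every document, such as "hospital" here, gets weight zero and is therefore a poor term candidate.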
Part 2 → Synonyms
Synonyms: {disease, illness}
 Identification of terms that share
semantics, i.e., potentially refer to the
same concept
 Methods for extracting synonyms
• Based on WordNet
• Latent Semantic Indexing (LSI)
WordNet
 A lexical database for the English language
 Nouns, verbs, adjectives & adverbs are grouped into sets of synonyms (synsets)
 Synsets are interlinked by means of conceptual-semantic and lexical relations
Adapting WordNet to specific domain
 Partition the set of synonymy relations defined in WordNet in
three classes:
• Relations irrelevant in the specific domain
• Relations that are relevant but incorrect in the specific domain
• Relations that are relevant and correct in the specific domain
 Remove relations from the first two classes and include relations from the third class
 Rank the remaining sets according to their frequency in the corpus
Latent Semantic Indexing (LSI)
 LSI is a technique in NLP of analyzing relationships
between a set of documents and the terms they contain
 Uses a term-document matrix which describes the
occurrences of terms in documents – Vector Space Model
Example: doc1 doc2
database X
computer X X
access X
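The term-document matrix can be sketched in a few lines. This is an illustration only: the two toy documents are invented, the helper name `term_document_matrix` is hypothetical, and the singular value decomposition that LSI applies to this matrix is not shown.

```python
def term_document_matrix(docs):
    """Map each term to a row of 0/1 flags, one flag per document (sorted by name)."""
    vocabulary = sorted({term for terms in docs.values() for term in terms})
    names = sorted(docs)
    return {t: [1 if t in docs[d] else 0 for d in names] for t in vocabulary}

# Toy documents invented for illustration.
docs = {
    "doc1": ["database", "computer"],
    "doc2": ["computer", "access"],
}
matrix = term_document_matrix(docs)
for term, row in matrix.items():
    print(f"{term:10s} {row}")
```

Each row is the vector of a term in the Vector Space Model; terms with similar rows tend to occur in the same documents.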
Part 3 → Concepts
Concepts: Disease:=<I, E, L> (Intension, Extension, Lexicon)
A term may indicate a concept if we can define its:
• Intension: (in)formal definition of the set of objects that this concept describes
Example: a disease is an impairment of health or a condition of abnormal functioning
• Extension: a set of objects that the definition of this concept describes (the name of the nearest common ancestor)
Example: influenza, cancer, heart disease
• Lexical realizations: the term itself and its multilingual synonyms
Example: disease, illness, maladie
Part 4 → Taxonomy Induction
Taxonomy (Concept hierarchies): is_a (Doctor, Person)
Concept Hierarchy Extraction
Basic methods used for taxonomy extraction:
 With the use of WordNet
 Lexico-syntactic patterns
 Machine Readable Dictionaries
 Co-occurrence Analysis
 Unsupervised hierarchical clustering techniques
 Linguistic approaches
Taxonomy Extraction with WordNet
 Given two terms t1 and t2, check if they stand in a
hypernym relation with regard to WordNet
 Normalize the number of hypernym paths by dividing
by the number of senses of t1
$isa(t_1, t_2) = \min\left(\frac{|paths(senses(t_1), senses(t_2))|}{|senses(t_1)|}, 1\right)$
path: a sequence of edges connecting the two synsets
Example:
- 4 different hypernym paths between synsets ‘country’ and ‘region’
- ‘country’ has 5 senses
- value of isa (country, region) = min(4/5, 1) = 0.8
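The normalization can be checked with a tiny function. This sketch takes the two counts as plain numbers rather than querying WordNet, and the name `isa_score` is hypothetical.

```python
def isa_score(num_hypernym_paths, num_senses_t1):
    """isa(t1, t2) = min(|paths| / |senses(t1)|, 1)."""
    if num_senses_t1 <= 0:
        raise ValueError("t1 must have at least one sense")
    return min(num_hypernym_paths / num_senses_t1, 1.0)

# The 'country' / 'region' example: 4 hypernym paths, 5 senses of 'country'.
print(isa_score(4, 5))
```

Capping at 1 keeps the score in [0, 1] even when there are more hypernym paths than senses.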
Lexico-syntactic patterns - Hearst
 Aim: the acquisition of hyponym lexical relations from text
 Uses a set of predefined lexico-syntactic patterns which
• occur frequently and in many text genres
• indicate the relation of interest
• can be recognized with little or no pre-encoded knowledge
 Principal idea: match these patterns in texts to retrieve is_a relations
 Precision with respect to WordNet: 55.45%
Lexico-syntactic patterns - Hearst
 NP0 such as {NP1, NP2,…, (and | or)} NPn
‘Vehicles such as cars, trucks and bikes…’
 such NP as {NP,} * { (or | and) } NP
‘Such fruits as oranges, nectarines or apples…’
 NP {, NP} * { , } { or | and } other NP
‘Swimming, running, or/and other activities…’
 NP { , } including {NP, } * { or | and } NP
‘Injuries, including broken bones, wounds and bruises…’ → e.g. is_a (broken bone, injury)
 NP { , } especially {NP, } * { or | and } NP
‘Publications, especially papers and books…’ → e.g. is_a (paper, publication)
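The "such as" pattern can be approximated with a regular expression. This is a rough sketch under a strong simplification: every noun phrase is assumed to be a single word, so no parser or POS tagger is involved. The function name `hearst_such_as` is hypothetical, and plural forms are not lemmatized.

```python
import re

# NP0 such as NP1, NP2, ... (and | or) NPn, with one-word noun phrases only.
SUCH_AS = re.compile(
    r"(\w+) such as (\w+(?:, (?!(?:and|or) )\w+)*(?:,? (?:and|or) \w+)?)"
)

def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1).lower()
        hyponyms = re.split(r",? (?:and|or) |, ", match.group(2))
        pairs.extend((h.lower(), hypernym) for h in hyponyms if h)
    return pairs

pairs = hearst_such_as("Vehicles such as cars, trucks and bikes are taxed.")
print(pairs)
```

Each returned pair reads as is_a (hyponym, hypernym); a real system would match over chunked noun phrases instead of single words.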
Machine Readable Dictionaries
 A method for extracting taxonomies which goes back
to the 80’s
 Main idea: exploit the regularity of dictionary entries to
find a suitable hypernym for the defined word
Example: spring: “the season between winter and summer and in which leaves and flowers appear”
→ is_a (spring, season)
MRDs: Exceptions
 The hypernym can be preceded by an expression such as ‘a kind of’, ‘a sort of’, or ‘a type of’
 The problem is solved by keeping an exception list with words such as ‘kind’, ‘sort’, ‘type’ and taking the head of the NP following the preposition ‘of’
Example: hornbeam: “a type of tree with a hard wood, sometimes used in hedges”
→ is_a (hornbeam, tree)
 The word can be defined in terms of a part-of or membership relation
Example: republican: “a member of a political party advocating republicanism”
→ not is_a (republican, political party) but part_of (republican, political party)
Co-occurrence analysis
Document-based subsumption
 A term t1 is more specific than a term t2 if t2 also appears in all the documents in which t1 appears
$P(x \mid y) = \frac{n(x, y)}{n(y)}$
Term x subsumes term y iff P(x | y) ≥ 1, where
• n(x,y): the number of documents in which x and y co-occur
• n(y): the number of documents that contain y
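Document-based subsumption, with P(x | y) = n(x, y) / n(y), can be tried on a toy collection. In this sketch each document is a set of terms; the `threshold` parameter is an added assumption that allows relaxing the strict value of 1.

```python
def subsumes(x, y, documents, threshold=1.0):
    """True if P(x | y) = n(x, y) / n(y) reaches the threshold."""
    n_y = sum(1 for doc in documents if y in doc)
    n_xy = sum(1 for doc in documents if x in doc and y in doc)
    return n_y > 0 and n_xy / n_y >= threshold

# Toy documents, each reduced to its set of terms (invented for illustration).
documents = [
    {"animal", "dog"},
    {"animal", "cat"},
    {"animal", "dog", "leash"},
]
print(subsumes("animal", "dog", documents))  # every 'dog' document also contains 'animal'
print(subsumes("dog", "animal", documents))  # P(dog | animal) = 2/3
```

Here "animal" subsumes "dog" but not the other way round, which yields is_a (dog, animal).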
Unsupervised hierarchical
clustering techniques
 Unsupervised hierarchical clustering techniques
known from machine learning research
• very noisy as they highly depend on the frequency and
behavior of the terms in the text collection under consideration
• learn concepts at the same time since they also group terms
(the most related to each other)
• can be regarded as abstractions over words and thus, to
some extent, as concepts
 It is unclear which specific relation actually holds
between the involved words.
Semantic_relatedness (cut, knife)
Linguistic Approaches
 Modifiers typically restrict or narrow down the meaning of the modified noun.
 Syntactic structure analysis and dependency analysis: words and modifiers in syntactic structures (noun/verb/prepositional/… phrases) are analyzed to discover potential terms and relations
• e.g. the head-modifier principle: the head of a term assumes the hypernym role
• Example: is_a (international credit card, credit card)
 In dependency analysis, grammatical relations, such as subject, object, adjunct, and complement, are used for determining more complex relations
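The head-modifier principle can be illustrated by stripping leading modifiers from a multi-word term. This is a simplified sketch that assumes English noun phrases with the head at the right end; the function name is hypothetical.

```python
def head_modifier_is_a(term):
    """Return is_a pairs obtained by dropping the leftmost modifier one word at a time."""
    words = term.split()
    pairs = []
    for i in range(1, len(words)):
        pairs.append((" ".join(words[i - 1:]), " ".join(words[i:])))
    return pairs

pairs = head_modifier_is_a("international credit card")
print(pairs)
```

The first pair reproduces is_a (international credit card, credit card); the second generalizes further to the bare head noun.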
Extending Concept Hierarchy
with new Concepts
…by adding a new concept at an appropriate position in the existing taxonomy
 Supervised methods:
• classifiers must be trained to predict membership for every concept in the existing concept hierarchy,
• they need a considerable amount of training data for each concept,
• such approaches typically do not scale to arbitrarily large ontologies.
 Unsupervised approaches:
• assume a similarity function which computes a measure of fit between
the new concept and the concepts existing in the ontology.
• rely on an appropriate contextual representation of the different
concepts on the basis of which similarity can be computed.
• the hierarchical structure of the ontology needs to be considered and
somehow integrated into the similarity measure
Part 5 → Relations (non-taxonomic)
Relations: cure (domain:Doctor, range:Disease)
Extracting relations (the interactions
between concepts) & attributes
 Specific relations
• Part-of
• Qualia (Formal, Constitutive, Telic, Agentive)
 General relations
• Exploiting linguistic structure
 Attributes
Learning attributes: Introduction
 Attributes → relations with a datatype as range
 Typically expressed in texts using the preposition ‘of’, the verb ‘have’ or genitive constructs, e.g. ‘the color of the car’, ‘the car’s color’, ‘every car has a color’
 Values of attributes are expressed using copula constructs, adjectives or expressions specific to the attribute in question, e.g.,
• ‘the car is red’ (copula + value)
• ‘the red car’ (adjective)
• ‘the baby weighs 3 kg’ (specific expressions)
Classification of attributes
To systematize the learning process attributes are classified according to their range
An approach to learning attributes
 Tokenize & part-of-speech tag the corpus
 Apply the following patterns to extract adjective/noun pairs
(\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}
(\w+{DET})? \w+{JJ} (\w+{NN})+
JJ: adjective DET: determiner
NN: noun VBZ: verb, 3rd person singular present
 These pairs are weighted using conditional probability: $P(a \mid n) = \frac{f(n, a)}{f(n)}$
• f(n,a): joint frequency of adjective a and noun n
• f(n): the frequency of noun n
 For each of the adjectives we look up the corresponding attributes in WordNet
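The pair extraction and the weighting f(n, a) / f(n) can be sketched over pre-tagged tokens. Two simplifications are assumed here: nouns are single words (the patterns allow noun sequences), and the WordNet lookup is omitted. All function names and the sample tokens are hypothetical.

```python
from collections import Counter

def adjective_noun_pairs(tagged):
    """Find (noun, adjective) pairs for 'NN is JJ' and 'JJ NN' over (word, tag) tokens."""
    pairs = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        if t1 == "JJ" and t2 == "NN":                       # 'the red car'
            pairs.append((w2, w1))
        if (t1 == "NN" and w2 == "is" and t2 == "VBZ"
                and i + 2 < len(tagged) and tagged[i + 2][1] == "JJ"):  # 'the car is red'
            pairs.append((w1, tagged[i + 2][0]))
    return pairs

def conditional_weights(pairs, noun_counts):
    """Weight each (noun, adjective) pair by f(n, a) / f(n)."""
    joint = Counter(pairs)
    return {(n, a): count / noun_counts[n] for (n, a), count in joint.items()}

# Hand-tagged toy corpus (invented for illustration).
tagged = [("the", "DET"), ("car", "NN"), ("is", "VBZ"), ("red", "JJ"),
          ("the", "DET"), ("red", "JJ"), ("car", "NN"),
          ("the", "DET"), ("fast", "JJ"), ("car", "NN"),
          ("the", "DET"), ("car", "NN"), ("stopped", "VBD")]
pairs = adjective_noun_pairs(tagged)
noun_counts = Counter(word for word, tag in tagged if tag == "NN")
weights = conditional_weights(pairs, noun_counts)
print(weights)
```

With "car" occurring four times, "red" (seen twice with it) gets weight 0.5 and "fast" gets 0.25.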
“meronymy” / “part-of” relations
Given a “seed” word, find parts of that word in a large corpus of text
Format → type_of_word TAG type_of_word TAG…
• whole NN[-PL] ’s POS part NN[-PL]
e.g. …building’s basement…
• part NN[-PL] of PREP {the|a} DET mods [JJ|NN]* whole NN
e.g. …basement of a building…
NN = Noun, NN-PL = Plural Noun, PREP = Preposition, POS = Possessive, JJ = Adjective
55% accuracy
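A surface-level approximation of the two part-of patterns, for a given seed word, might look like this. The sketch works on raw text with regular expressions instead of POS tags, ignores the optional modifier slot, and uses a hypothetical function name and sample text.

```python
import re

def find_parts(whole, text):
    """Collect candidate parts of `whole` using two surface patterns."""
    w = re.escape(whole)
    genitive = re.compile(rf"\b{w}s?'s (\w+)")                 # building's basement
    of_phrase = re.compile(rf"\b(\w+) of (?:the|a) {w}s?\b")   # basement of a building
    text = text.lower()
    return sorted(set(genitive.findall(text)) | set(of_phrase.findall(text)))

text = "The building's basement flooded. The roof of the building leaked."
parts = find_parts("building", text)
print(parts)
```

In practice the candidates would still need statistical filtering, since patterns like "X of the Y" also match many non-part relations.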
Qualia structures
The meaning of a lexical element is described in terms of four roles:
• Formal: normally consists in typing information about the object (e.g., hypernym)
• Constitutive: physical properties of an object (e.g., weight, material, parts)
• Telic: the purpose or function of an object, expressed either by a verb or by a nominal
• Agentive: typically a verb denoting an action which brings the object into existence
Qualia structures for knife
• Formal: artifact_tool
• Constitutive: blade, handle,…
• Telic: cut_act
• Agentive: make_act
Qualia Structures: Learning Approach
 Aim: to automatically learn qualia structures from the WWW
 Based on the idea of matching certain lexico-syntactic patterns conveying a standard relation
 Clues: search engine queries indicating the relation of interest
 Calculate the weight of a candidate qualia element e for the term t using the Jaccard coefficient
Qualia Structures: Learning Process
Generate Clues → Download Google … → Matching regular expressions → Statistical Weighting → Weighted QS
$Jaccard(e, t) = \frac{GoogleHits(e \wedge t)}{GoogleHits(e) + GoogleHits(t) - GoogleHits(e \wedge t)}$
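The Jaccard weighting of a candidate qualia element e for a term t, computed from hit counts, can be sketched as below. No search engine is queried here; the three counts are made-up inputs and the function name is hypothetical.

```python
def jaccard_weight(hits_e, hits_t, hits_both):
    """Jaccard coefficient: hits(e and t) / (hits(e) + hits(t) - hits(e and t))."""
    union = hits_e + hits_t - hits_both
    return hits_both / union if union > 0 else 0.0

# Made-up counts for e = 'blade' and t = 'knife'.
weight = jaccard_weight(hits_e=400, hits_t=1000, hits_both=200)
print(weight)
```

The weight grows when the candidate and the term are found together more often than either is found alone.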
Qualia Structure: Patterns (1/2)
Formal Role
Telic Role
Qualia Structure: Patterns (2/2)
Constitutive Role
Relations by syntactic analysis
Maps a subject to the domain, the predicate or verb to a slot or
relation and the object to its range.
‘The player kicked the ball to the net’
relation: kick (domain: player, range: ball)
Relations by linguistic theory
Example: ‘Joe wrote a letter’
relation: write (subject: Joe, object: letter)
 The subcategorization frame of a word is the number
and kinds of other words that it selects when appearing
in a sentence.
 E.g. identify verbs in text as indicators of a relation
between their arguments (object properties)
Person restrictions of selection
(for the subject and object of the verb “write”)
Part 6 → Axioms & Rules
Axioms & Rules: ∀x, y (sufferFrom(x, y) → ill(x))
Discovery of Inference Rules from Text
 DIRT: an unsupervised method for discovering inference rules from text, such as
• X is author of Y ≈ X wrote Y
• X caused Y ≈ Y is blamed on X
• X manufactures Y ≈ X’s Y factory
 Is based on the Distributional Hypothesis: words that occur in the same contexts tend to be similar
DIRT: Distributional Hypothesis
 The Distributional Hypothesis is applied to dependency trees
 If two paths tend to link the same sets of words, their meanings are hypothesized to be similar
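The idea that two paths are similar when they link the same sets of words can be illustrated with a toy similarity. Note the substitution: the published DIRT measure weights slot fillers by mutual information, whereas this sketch swaps in plain set overlap (Jaccard) per slot and combines the two slots by a geometric mean. The slot fillers are invented.

```python
def slot_overlap(fillers_a, fillers_b):
    """Jaccard overlap of two filler sets (a stand-in for DIRT's own slot measure)."""
    union = fillers_a | fillers_b
    return len(fillers_a & fillers_b) / len(union) if union else 0.0

def path_similarity(path_a, path_b):
    """Geometric mean of the X-slot and Y-slot overlaps of two paths."""
    x = slot_overlap(path_a["X"], path_b["X"])
    y = slot_overlap(path_a["Y"], path_b["Y"])
    return (x * y) ** 0.5

# Invented slot fillers for the paths 'X wrote Y' and 'X is author of Y'.
wrote = {"X": {"Tolstoy", "Austen", "Orwell"}, "Y": {"novel", "letter", "essay"}}
author_of = {"X": {"Tolstoy", "Austen"}, "Y": {"novel", "essay", "poem"}}
similarity = path_similarity(wrote, author_of)
print(similarity)
```

Paths whose slots share many fillers score close to 1 and become candidates for an inference rule.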
DIRT: Dependency trees
 The inference rules discovered by DIRT are between paths in dependency trees
 The dependency trees are generated by Minipar
 Minipar represents its grammar as a network where nodes represent grammatical categories and links represent syntactic relationships
Figure: A subset of the dependency relations in Minipar output
DIRT: Dependency trees
“John found a solution to the problem”
 Links represent dependency relationships (here: subj, obj)
 Direction: from the head to the modifier
 Labels represent types of dependency relations
 Each link between two words represents a direct semantic relationship
Path between “John” and “problem”:
N:subj:V ← find → V:obj:N → solution → N:to:N
meaning “X finds solution to Y”
DIRT: Paths in Dependency Trees
Transformation rule: connect the prepositional complement directly to the words modified by the preposition
A path represents indirect semantic relationships between two content words
Evaluation of Ontology Learning Techniques
1) Task-based evaluation (improve quality): evaluates the adequacy of ontologies in the context of other applications.
2) Corpus-based evaluation: uses domain-specific data sources to determine to what extent the ontologies are able to cover the corresponding domain.
3) Criteria-based evaluation: assesses ontologies by determining how well they adhere to a set of criteria.
Task-based evaluation
 How well an ontology meets their systems’
 An ontology designed to improve the performance of document retrieval → retrieved documents should be more relevant when the ontology is used
 The use of ontological relations in the context of speech recognition → compared with a gold standard generated by humans
Corpus-based evaluation
 methods for evaluating the ‘fit’ between an ontology and
the domain knowledge in the form of text corpora.
 In this approach, natural language processing (e.g.,
latent semantic analysis, clustering) or information
extraction (e.g., named-entity recognition) techniques
are used to analyze the content of the corpus and
identify terms.
Criteria-based evaluation
 The average number of terms that were aggregated to form a concept in an ontology: this criterion reflects the view that the more variants of a term are used to form a concept, the more fully encompassing or complete the concept is.
Other measurement
 Evaluation approaches can also be distinguished by the
layers of an ontology :
• term,
• concept,
• relation
 Evaluations can be performed to assess the :
• correctness at the terminology layer,
• coverage at the conceptual layer,
• wellness at the taxonomy layer,
• adequacy of the non-taxonomic relations.
Ontology Learning Tools
 Text2Onto
• Open source (Java)
 OntoLT
• Open source (Protégé plug-in, Java)
 OntoGen
• Open source (C++, .NET)
Text2Onto: Main Features
 Learn primitives independent of a specific KR
language (Probabilistic Ontology Model, POM)
 System calculates a confidence for each learned
object for better user interaction
 Updates the learned knowledge each time the corpus is changed and avoids processing it from scratch
 Allows for easy
• combination of algorithms,
• execution of algorithms,
• writing new algorithms
Text2Onto: Algorithms used
 Concepts
• Statistical measures, e.g. TFIDF, C-value/NC-value,…
 Subclass_of relations
• Exploits hypernym relations from WordNet
• Hearst patterns
 Mereological relations (part-of)
 General relations: extracts the following syntactic frames:
• Transitive, e.g., love(subj, obj)
• Intransitive + PP-complement, e.g., walk(subj, pp(to))
• Transitive + PP-complement, e.g., hit(subj, obj, pp(with))
 Instance-of
 Equivalence
Text2Onto: screenshot
OntoGen : Techniques used
 Linear Dimensionality Reduction (a.k.a LSI)
• words related to the same topic co-occur together
more often than words related to different topics
• Result: clusters of words each describing one topic
 K-means clustering algorithm
• Partitions the corpus into k clusters so that two
documents within the same cluster are more closely
related than two documents from different clusters
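A bare-bones version of k-means, for intuition only. This sketch is not OntoGen's implementation: it clusters invented 2-D vectors, takes the starting centroids as an argument so the run is deterministic, and uses squared Euclidean distance.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of numbers, starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Toy 2-D document vectors (for example, two reduced dimensions); values are invented.
docs = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
clusters, centroids = kmeans(docs, centroids=[docs[0], docs[2]])
print(clusters)
```

Documents that end up in the same cluster are candidates for one topic, which a user can then name as a concept.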
OntoGen: screenshot
OntoLT
 A Protégé plug-in with which classes and relations can be extracted from a linguistically annotated text collection
 Provides mapping rules that allow for a
mapping between linguistic entities and
class/slots candidates in Protégé
Onto-LT: Mapping rules
Maps a head-noun to a class and in combination with its modifier(s)
to one or more sub-class(es)
Maps a linguistic subject to a class, its predicate to a corresponding
slot for this class and the direct object to the “range” of the slot
Onto-LT: System architecture
Onto-LT: screenshot
Framework dimensions, sub-dimensions
and values (Onto. Tools Evaluation)
The Comparison of Ontology Tools
A Summary of the Outputs Supported, Techniques Used, and
Evaluations Performed for the Seven Systems Included
 A detailed methodology that guides the ontology
learning process does not exist
 Only general guidelines are provided
 No complete correspondence between the methods
and the tools
 Methods are based mainly on NLP techniques
complemented with statistical measures
 Tools give only support to perform some of the steps
proposed in different approaches (except Text2Onto)
Thanks For Your Attention

SMalL - Semantic Malware Log Based Reporter
SMalL  - Semantic Malware Log Based ReporterSMalL  - Semantic Malware Log Based Reporter
SMalL - Semantic Malware Log Based Reporter
Enhancing Semantic Mining
Enhancing Semantic MiningEnhancing Semantic Mining
Enhancing Semantic Mining
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
M1. sem web & ontology introd
M1. sem web & ontology introdM1. sem web & ontology introd
M1. sem web & ontology introd
Introduction to development of lexical databases
Introduction to development of lexical databasesIntroduction to development of lexical databases
Introduction to development of lexical databases
Domain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachDomain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised Approach
The role of linguistic information for shallow language processing
The role of linguistic information for shallow language processingThe role of linguistic information for shallow language processing
The role of linguistic information for shallow language processing
Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04
Lexicon base approch
Lexicon base approchLexicon base approch
Lexicon base approch
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourse
Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic Similarity

Ontology learning

  • 2. Definition of Ontology: ‘A formal, explicit specification of a shared conceptualization’. Formal: must be machine understandable. Explicit: types of concepts and constraints must be clearly defined. Shared: not private to some individual, but accepted by a group. Conceptualization: an abstract model of some phenomenon in the world, formed by identifying the relevant concepts of that phenomenon. Or simply, a data model describing a domain.
  • 3. Main elements of an ontology: a hierarchy of concepts (is-a relations); object properties (relations), each with a domain and a range, e.g., wasWrittenBy; datatype properties (attributes), each with a domain and a datatype range such as xsd:string, e.g., hasTitle.
  • 4. The spectrum of ontology kinds.
  • 5. Applications of Ontologies ▪ Knowledge representation and knowledge management systems ▪ Intelligent query-answering systems ▪ Information retrieval and extraction ▪ Semantic Web • Web pages annotated with ontologies • User queries for Web pages analysed at knowledge level and answered by inferencing on ontological knowledge
  • 7. Definition of Ontology Learning ▪ The application of a set of methods and techniques used for building an ontology from scratch ▪ Uses distributed and heterogeneous knowledge and information sources ▪ Allows a reduction in the time and effort needed in the ontology development process
  • 8. Task: automatic ontology extraction from domain texts (texts → ontology extraction → ontology)
  • 9. Ontology Learning (Construction) ▪ Manual construction • Corpus is not necessary • Small scale ▪ Automatic or semiautomatic construction • Domain specific corpus • Good domain knowledge coverage
  • 10. Ontology Learning methods from… ▪ Unstructured sources • Involves NLP techniques, morphological and syntactic analysis, etc. ▪ Semi-structured sources • Elicit an ontology from sources that have some predefined structure, such as XML Schema ▪ Structured data • Extracting concepts and relations from knowledge contained in structured data, such as databases
  • 11. Ontology Learning ‘Layer Cake’ (top to bottom, with examples): Axioms & Rules: ∀x, y (sufferFrom(x, y) → ill(x)); Relations: cure (domain: Doctor, range: Disease); Taxonomy (Concept hierarchies): is_a (Doctor, Person); Concepts: Disease := <I, E, L>; Synonyms: {disease, illness}; Terms: disease, illness, hospital
  • 12. An overview of the outputs, tasks, and common techniques for ontology learning
  • 13. Subtasks in ontology learning ▪ Extract the relevant domain terminology and synonyms from a text collection ▪ Discover concepts which can be regarded as abstractions of human thought ▪ Derive a concept hierarchy organizing these concepts ▪ Extend an existing concept hierarchy with new concepts ▪ Learn non-taxonomic relations between concepts ▪ Populate the ontology with instances of relations and concepts ▪ Discover other axiomatic relationships or rules involving concepts and relations
  • 15. Sample (partial) Ontology – Electronic Voting Domain ▪ Concepts: person, voter, worker, poll watcher, location, county, precinct, vote, ballot, machine, voting machine, manufacturer, etc. ▪ Attributes: name of person, model of machine, etc. ▪ Taxonomical relations: • Voter is a person; precinct is a location; voting machine is a machine, etc. ▪ Non-hierarchical relations: • Voter cast ballot; voter trust machine; county adopt machine; equipment miscount ballot, etc.
  • 16. Sample (partial) Ontology – Electronic Voting Domain
  • 17. ConceptNet: a practical commonsense reasoning ▪ Open Mind Common Sense (OMCS) is an artificial intelligence project based at the Massachusetts Institute of Technology (MIT) Media Lab whose goal is to build and utilize a large commonsense knowledge base from the contributions of many thousands of people across the Web. ▪ ConceptNet is a multilingual knowledge base, representing words and phrases that people use and the common-sense relationships between them. ▪ Since its founding in 1999, it has accumulated more than a million English facts from over 15,000 contributors in addition to knowledge bases in other languages.
  • 18. ConceptNet: a practical commonsense reasoning
  • 19. ConceptNet: a practical commonsense reasoning ▪ The knowledge base is a semantic network presently consisting of over 1.6 million assertions of commonsense knowledge encompassing the spatial, physical, social, temporal, and psychological aspects of everyday life. ▪ It is built from nodes representing concepts, in the form of words or short phrases of natural language, and labeled relationships between them. These are the kinds of things computers need to know to search for information better, answer questions, and understand people's goals. ▪ ConceptNet is generated automatically from the 700,000 sentences of the Open Mind Common Sense Project, a World Wide Web based collaboration with over 14,000 authors.
  • 20. ConceptNet: a practical commonsense reasoning
  • 21. Challenges in Text Processing ▪ Unstructured texts ▪ Ambiguity in English text • Multiple senses of a word • Multiple parts of speech – e.g., “like” can occur in 8 PoS: • Verb: “Fruit flies like a banana” • Noun: “We may not see its like again” • Adjective: “People of like tastes agree” • Adverb: “The rate is more like 12 percent” • Preposition: “Time flies like an arrow” • etc. ▪ Lack of closed domain of lexical categories ▪ Noisy texts ▪ Requirement of very large training text sets ▪ Lack of standards in text processing
  • 22. Part 1 → Terms Extraction (layer cake: Axioms & Rules, Relations, Taxonomy (Concept hierarchies), Concepts, Synonyms, Terms). Example terms: disease, illness, hospital
  • 23. Terms ▪ Linguistic realizations of domain-specific concepts ▪ Are the basis of the ontology learning process ▪ Term extraction implies: • Linguistic processing: part-of-speech tagging, morphological analysis, etc. • Statistical processing: compares the distribution of terms between corpora
  • 24. Terms Extraction: Process ▪ Run a Part-Of-Speech (POS) tagger over the domain corpus ▪ Identify possible terms by constructing patterns, such as: Adj-Noun, Noun-Noun, Adj-Noun-Noun, … ▪ Ignore names ▪ Keep only the terms that are relevant to the text by applying statistical metrics
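The pattern-matching step of this process can be sketched as follows. This is a minimal illustration, not the deck's implementation: it assumes the corpus has already been POS-tagged, and the coarse tag names (`ADJ`, `NOUN`, `PROPN`) and the function name `candidate_terms` are hypothetical choices made for the example. Proper names carry the `PROPN` tag, so they never match a pattern and are ignored.

```python
from typing import List, Tuple

# Candidate-term patterns from the slide, written as tuples of coarse POS tags.
# The tag names are an assumption of this sketch, not a fixed tagset.
PATTERNS = [("ADJ", "NOUN", "NOUN"), ("ADJ", "NOUN"), ("NOUN", "NOUN")]


def candidate_terms(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return every token span whose tag sequence matches one of PATTERNS."""
    terms = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[start:start + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                terms.append(" ".join(word.lower() for word, _ in window))
    return terms


# A hand-tagged sentence standing in for real tagger output.
tagged_sentence = [
    ("Chronic", "ADJ"), ("heart", "NOUN"), ("disease", "NOUN"),
    ("affects", "VERB"), ("John", "PROPN"), ("Smith", "PROPN"),
]
print(candidate_terms(tagged_sentence))
```

The candidates produced here would still need the statistical filtering named in the last step of the process.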
  • 25. Linguistic Analysis: an example (levels from lowest to highest) ▪ Tokenization (incl. Named-Entity Rec.): [table] [2005-06-01] [John Smith] ▪ Part of Speech & Semantic Tagging: [table N:ARTIFACT] [table N:furniture] ▪ Morphological Analysis (stemming): [work~ing V] ▪ Phrase Recognition: [[the] [large] [table] NP] [[in] [the] [corner] PP] ▪ Dependency Structure (Phrases): [[the SPEC] [large MOD] [table HEAD] NP] ▪ Dependency Structure (S): [[He SUBJ] [booked PRED] [[this] [table HEAD] NP:DOBJ] S] ▪ Discourse Analysis: [[He SUBJ] [booked PRED] [[this] [table HEAD] NP:DOBJ:X1] …] … [[It SUBJ:X1] [was PRED] still available…]
  • 26. Statistical Analysis. Statistical metrics used in terms extraction: ▪ Chi-square: $\chi^2 = \sum \frac{(obs - exp)^2}{exp}$ ▪ Term weighting (TFIDF): $tfidf(w) = tf(w) \cdot \log\left(\frac{N}{df(w)}\right)$ ▪ Mutual Information: $mi(x, y) = \frac{P(x, y)}{P(x)\,P(y)}$ (often reported as the logarithm of this ratio)
  • 27. TFIDF: $tfidf(w) = tf(w) \cdot \log\left(\frac{N}{df(w)}\right)$ ▪ tf(w): term frequency (number of occurrences of the word in a document) ▪ df(w): document frequency (number of documents containing the word) ▪ N: number of all documents ▪ tfidf(w): relative importance of the word in the document ▪ Most popular weighting scheme ▪ The word is more important when it appears several times in a document ▪ The word is more important if it appears in fewer documents
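As a worked illustration of the formula above, the following sketch computes tfidf(w) = tf(w) · log(N / df(w)) for a toy corpus. The function name `tfidf` and the three example documents are assumptions made for the example; the natural logarithm is used, since the slide does not state a base.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight every word of every document with tf(w) * log(N / df(w))."""
    n_docs = len(docs)
    # df(w): number of documents that contain w at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf(w): occurrences of w in this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights


# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["disease", "hospital", "disease"],
    ["hospital", "doctor"],
    ["doctor", "hospital", "nurse"],
]
scores = tfidf(docs)
print(scores[0])
```

In this corpus "hospital" occurs in every document, so its weight is zero, while "disease" occurs twice in one document only and receives the highest weight there.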
  • 28. Part 2 → Synonyms (layer cake: Axioms & Rules, Relations, Taxonomy (Concept hierarchies), Concepts, Synonyms, Terms). Example synonyms: {disease, illness}
  • 29. Synonyms ▪ Identification of terms that share semantics, i.e., potentially refer to the same concept ▪ Methods for extracting synonyms • Based on WordNet • Latent Semantic Indexing (LSI)
  • 30. WordNet ▪ A lexical database for the English language ▪ Nouns, verbs, adjectives & adverbs are grouped into sets of synonyms (synsets) ▪ Synsets are interlinked by means of conceptual-semantic and lexical relations
  • 32. Adapting WordNet to a specific domain ▪ Partition the set of synonymy relations defined in WordNet into three classes: • Relations irrelevant in the specific domain • Relations that are relevant but incorrect in the specific domain • Relations that are relevant and correct in the specific domain ▪ Remove relations from the first two classes and include relations from the third class ▪ Rank the remaining sets according to their frequency in the corpus
  • 33. Latent Semantic Indexing (LSI) ▪ LSI is an NLP technique for analyzing relationships between a set of documents and the terms they contain ▪ Uses a term-document matrix which describes the occurrences of terms in documents – Vector Space Model ▪ Example term-document matrix (columns doc1, doc2): database: X; computer: X, X; access: X
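The term-document matrix that LSI starts from can be sketched as below. This covers only the vector space model input and a cosine comparison of term vectors; the dimensionality-reduction step of LSI is deliberately left out. The assignment of "database" to doc1 and "access" to doc2 is an assumption of the example (the slide's table does not survive clearly enough to confirm it), and the function names are hypothetical.

```python
import math
from typing import Dict, List


def term_document_matrix(docs: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Map each term to its row of occurrence counts, one column per document."""
    doc_ids = sorted(docs)
    vocabulary = sorted({word for words in docs.values() for word in words})
    return {term: [docs[d].count(term) for d in doc_ids] for term in vocabulary}


def cosine(u: List[int], v: List[int]) -> float:
    """Cosine similarity of two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical documents matching the shape of the slide's example.
docs = {"doc1": ["database", "computer"], "doc2": ["computer", "access"]}
matrix = term_document_matrix(docs)
print(matrix)
print(cosine(matrix["computer"], matrix["database"]))
```

In the raw matrix, "database" and "access" never co-occur and have similarity zero; relating such terms through shared neighbours like "computer" is what the reduced LSI space is meant to achieve.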
  • 34. Part 3 → Concepts (layer cake: Axioms & Rules, Relations, Taxonomy (Concept hierarchies), Concepts, Synonyms, Terms). Example concept: Disease := <I, E, L>
  • 35. Concepts: Intension, Extension, Lexicon. A term may indicate a concept if we can define its: ▪ Intension: (in)formal definition of the set of objects that this concept describes. Example: a disease is an impairment of health or a condition of abnormal functioning ▪ Extension: a set of objects that the definition of this concept describes (the name of the nearest common ancestor). Example: influenza, cancer, heart disease ▪ Lexical realizations: the term itself and its multilingual synonyms. Example: disease, illness, maladie
  • 36. Part 4 → Taxonomy Induction (layer cake: Axioms & Rules, Relations, Taxonomy (Concept hierarchies), Concepts, Synonyms, Terms). Example: is_a (Doctor, Person)
  • 37. Concept Hierarchy Extraction. Basic methods used for taxonomy extraction: ▪ With the use of WordNet ▪ Lexico-syntactic patterns ▪ Machine Readable Dictionaries ▪ Co-occurrence Analysis ▪ Unsupervised hierarchical clustering techniques ▪ Linguistic approaches
  • 38. Taxonomy Extraction with WordNet ▪ Given two terms t1 and t2, check if they stand in a hypernym relation with regard to WordNet ▪ Normalize the number of hypernym paths by dividing by the number of senses of t1: $isa(t_1, t_2) = \min\left(\frac{|paths(senses(t_1), senses(t_2))|}{|senses(t_1)|}, 1\right)$, where a path is a sequence of edges connecting the two synsets ▪ Example: there are 4 different hypernym paths between the synsets ‘country’ and ‘region’, and ‘country’ has 5 senses, so the value of isa (country, region) = 4/5 = 0.8
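The normalization above is plain arithmetic once the two counts are known. The sketch below takes the number of hypernym paths and the number of senses as inputs rather than querying WordNet, so it makes no claim about any particular WordNet API; the function name `isa_score` is hypothetical.

```python
def isa_score(n_hypernym_paths: int, n_senses_t1: int) -> float:
    """min(|paths(senses(t1), senses(t2))| / |senses(t1)|, 1)."""
    if n_senses_t1 <= 0:
        raise ValueError("t1 must have at least one sense")
    return min(n_hypernym_paths / n_senses_t1, 1.0)


# The slide's example: 4 hypernym paths from 'country' to 'region',
# and 'country' has 5 senses.
print(isa_score(4, 5))  # 0.8
```

The cap at 1 matters when a term has more hypernym paths to t2 than it has senses, which can happen when one sense reaches t2 by several routes.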
  • 39. Lexico-syntactic patterns - Hearst ▪ Aim: the acquisition of hyponym lexical relations from text ▪ Uses a set of predefined lexico-syntactic patterns which • occur frequently and in many text genres • indicate the relation of interest • can be recognized with little or no pre-encoded knowledge ▪ Principal idea: match these patterns in texts to retrieve is_a relations ▪ Precision with respect to WordNet: 55.45%
  • 40. Lexico-syntactic patterns - Hearst ▪ Pattern: NP0 such as {NP1, NP2, …, (and | or)} NPn. Example: ‘Vehicles such as cars, trucks and bikes…’ gives is-a (car, vehicle), is-a (truck, vehicle), is-a (bike, vehicle) ▪ Pattern: such NP as {NP,}* {(or | and)} NP. Example: ‘Such fruits as oranges, nectarines or apples…’ gives is-a (orange, fruit), is-a (nectarine, fruit), is-a (apple, fruit) ▪ Pattern: NP {, NP}* {,} {or | and} other NP. Example: ‘Swimming, running, or/and other activities…’ gives is-a (swimming, activity), is-a (running, activity)
  • 41. Lexico-syntactic patterns - Hearst ▪ Pattern: NP {,} including {NP,}* {or | and} NP. Example: ‘Injuries, including broken bones, wounds and bruises…’ gives is-a (broken bone, injury), is-a (wound, injury), is-a (bruise, injury) ▪ Pattern: NP {,} especially {NP,}* {or | and} NP. Example: ‘Publications, especially papers and books…’ gives is-a (paper, publication), is-a (book, publication)
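A rough sketch of how the first Hearst pattern ("NP0 such as NP1, NP2, … and NPn") might be matched is given below. It is a simplification under stated assumptions: each noun phrase is a single word, matching is done on raw text with a regular expression instead of on parsed noun phrases, and no lemmatization is applied, so the hypernym stays in its plural surface form. The names `ITEM`, `SUCH_AS`, `SEPARATOR` and `hearst_such_as` are hypothetical.

```python
import re
from typing import List, Tuple

# One list item: a single word that is not the conjunction itself.
ITEM = r"(?!(?:and|or)\b)\w+"
# "<hypernym> such as <item>, <item> (and|or) <item>"
SUCH_AS = re.compile(
    rf"(\w+) such as ((?:{ITEM}, )*{ITEM}(?:,? (?:and|or) \w+)?)",
    re.IGNORECASE,
)
# Separators inside the enumeration: ", and", " or ", or a plain comma.
SEPARATOR = re.compile(r",?\s+(?:and|or)\s+|,\s*", re.IGNORECASE)


def hearst_such_as(text: str) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for hyponym in SEPARATOR.split(match.group(2)):
            if hyponym:
                pairs.append((hyponym.lower(), hypernym))
    return pairs


print(hearst_such_as("Vehicles such as cars, trucks and bikes are common."))
```

A fuller version would run over POS-tagged or chunked text so that multi-word phrases such as "broken bones" are kept together, and would add the remaining patterns in the same style.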
  • 42. Machine Readable Dictionaries ▪ A method for extracting taxonomies which goes back to the 1980s ▪ Main idea: exploit the regularity of dictionary entries to find a suitable hypernym for the defined word ▪ Example: spring: “the season between winter and summer and in which leaves and flowers appear” gives is_a (spring, season)
  • 43. MRDs: Exceptions ▪ The hypernym can be preceded by an expression such as ‘a kind of’, ‘a sort of’, or ‘a type of’ ▪ The problem is solved by keeping an exception list with words such as ‘kind’, ‘sort’, ‘type’ and taking the head of the NP following the preposition ‘of’. Example: hornbeam: “a type of tree with a hard wood, sometimes used in hedges” gives is_a (hornbeam, tree) ▪ The word can be defined in terms of a part-of or membership relation. Example: republican: “a member of a political party advocating republicanism” gives part_of (republican, political party), not is_a (republican, political party)
  • 44. Co-occurrence analysis (document-based subsumption) ▪ A term t1 is more specific than a term t2 if t2 also appears in all the documents in which t1 appears. ▪ Term x subsumes term y iff $P(x \mid y) \geq 1$, with $P(x \mid y) = \frac{n(x, y)}{n(y)}$, where n(x, y) is the number of documents in which x and y co-occur and n(y) is the number of documents that contain y
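The subsumption test above reduces to counting documents. The sketch below is a minimal illustration that represents each document as a set of terms; the function name `subsumes`, the `threshold` parameter (defaulting to the slide's value of 1) and the toy documents are assumptions of the example.

```python
from typing import List, Set


def subsumes(x: str, y: str, docs: List[Set[str]], threshold: float = 1.0) -> bool:
    """True if P(x | y) = n(x, y) / n(y) reaches the threshold."""
    n_y = sum(1 for doc in docs if y in doc)
    if n_y == 0:
        return False  # y never occurs, so nothing can be concluded
    n_xy = sum(1 for doc in docs if x in doc and y in doc)
    return n_xy / n_y >= threshold


# Toy corpus (hypothetical): each document is the set of terms it contains.
docs = [{"disease", "influenza"}, {"disease", "cancer"}, {"disease"}]
print(subsumes("disease", "influenza", docs))
print(subsumes("influenza", "disease", docs))
```

Here "disease" appears in every document that mentions "influenza", so it subsumes it, while the reverse does not hold. Lowering `threshold` below 1 would tolerate a few documents where the more specific term appears alone.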
  • 45. Unsupervised hierarchical clustering techniques ▪ Unsupervised hierarchical clustering techniques known from machine learning research • are very noisy, as they depend heavily on the frequency and behavior of the terms in the text collection under consideration • learn concepts at the same time, since they also group terms (those most related to each other) • produce clusters that can be regarded as abstractions over words and thus, to some extent, as concepts ▪ It is unclear which specific relation actually holds between the involved words. Example: Semantic_relatedness (cut, knife)
  • 46. Linguistic Approaches ▪ Modifiers typically restrict or narrow down the meaning of the modified noun. ▪ Syntactic structure analysis and dependency analysis: words and modifiers in syntactic structures (noun, verb, prepositional, … phrases) are analyzed to discover potential terms and relations, e.g. the head-modifier principle, in which the head of a term assumes the hypernym role. Example: is_a (international credit card, credit card) ▪ In dependency analysis, grammatical relations, such as subject, object, adjunct, and complement, are used for determining more complex relations
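The head-modifier principle can be illustrated with a few lines of string handling. The sketch assumes English noun compounds whose head is the rightmost word, so dropping leading modifiers yields progressively more general terms; the function name `head_modifier_hypernyms` is hypothetical.

```python
from typing import List, Tuple


def head_modifier_hypernyms(term: str) -> List[Tuple[str, str]]:
    """Return (term, hypernym) pairs obtained by stripping leading modifiers."""
    words = term.split()
    return [(term, " ".join(words[i:])) for i in range(1, len(words))]


print(head_modifier_hypernyms("international credit card"))
```

For the slide's example this produces is_a (international credit card, credit card) and, one step further, is_a (international credit card, card). Whether each shortened form is a real domain term would still need to be checked against the extracted term list.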
  • 47. Extending Concept Hierarchy with new Concepts …by adding a new concept at an appropriate position in the existing taxonomy ▪ Supervised methods: • classifiers need to be trained which predict membership for every concept in the existing concept hierarchy • need a considerable amount of training data for each concept • such approaches typically do not scale to arbitrarily large ontologies ▪ Unsupervised approaches: • assume a similarity function which computes a measure of fit between the new concept and the concepts existing in the ontology • rely on an appropriate contextual representation of the different concepts on the basis of which similarity can be computed • the hierarchical structure of the ontology needs to be considered and somehow integrated into the similarity measure
  • 48. Part 5 → Relations (non-taxonomic) (layer cake: Axioms & Rules, Relations, Taxonomy (Concept hierarchies), Concepts, Synonyms, Terms). Example: cure (domain: Doctor, range: Disease)
  • 49. Extracting relations (the interactions between concepts) & attributes ▪ Specific relations • Part-of • Qualia (Formal, Constitutive, Telic, Agentive) ▪ General relations • Exploiting linguistic structure ▪ Attributes
  • 50. Learning attributes: Introduction ▪ Attributes are relations with a datatype as range ▪ Typically expressed in texts using the preposition of, the verb have or genitive constructs, e.g. ‘the color of the car’, ‘the car’s color’, ‘every car has a color’ ▪ Values of attributes are expressed using copula constructs, adjectives or expressions specific to the attribute in question, e.g., • ‘the car is red’ (copula + value) • ‘the red car’ (adjective) • ‘the baby weighs 3 kg’ (specific expressions)
  • 51. Classification of attributes: to systematize the learning process, attributes are classified according to their range
  • 52. An approach to learning attributes ▪ Tokenize & part-of-speech tag the corpus ▪ Apply the following patterns to extract adjective/noun pairs: (\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ} and (\w+{DET})? \w+{JJ} (\w+{NN})+ ▪ These pairs are weighted using conditional probability: $P(a \mid n) = \frac{f(n, a)}{f(n)}$, where f(n, a) is the joint frequency of adjective a and noun n, and f(n) is the frequency of noun n ▪ For each of the adjectives we look up the corresponding attributes in WordNet ▪ Tags: JJ: adjective; DET: determiner; NN: noun; VBZ: verb, 3rd person singular present
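The weighting step can be sketched as a ratio of counts, f(n, a) / f(n). The sketch assumes the adjective/noun pairs have already been extracted by the patterns and that noun frequencies are available separately; the function name `attribute_weights` and the example counts are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def attribute_weights(
    pairs: List[Tuple[str, str]], noun_freq: Dict[str, int]
) -> Dict[Tuple[str, str], float]:
    """Weight each (noun, adjective) pair by f(n, a) / f(n)."""
    joint = Counter(pairs)  # f(n, a): how often the pair was extracted
    return {(n, a): count / noun_freq[n] for (n, a), count in joint.items()}


# Hypothetical extraction results: 'car' occurs 4 times in the corpus,
# twice modified by 'red' and once by 'fast'.
pairs = [("car", "red"), ("car", "red"), ("car", "fast")]
weights = attribute_weights(pairs, {"car": 4})
print(weights)
```

The highest-weighted adjectives for a noun are the ones that would then be looked up to find the attribute they express, for example a color attribute for "red".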
  • 53. “meronymy” / “part-of” relations ▪ Given a “seed” word, find parts of that word in a large corpus of text ▪ Pattern format: type_of_word TAG type_of_word TAG … ▪ Pattern: whole NN[-PL] ’s POS part NN[-PL], e.g. …building’s basement… ▪ Pattern: part NN[-PL] of PREP {the|a} DET mods [JJ|NN]* whole NN, e.g. …basement of a building… ▪ Tags: NN = Noun; NN-PL = Plural Noun; PREP = Preposition; POS = Possessive; JJ = Adjective ▪ 55% accuracy
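The two part-of patterns can be approximated on raw text as follows. This is a rough sketch under stated assumptions: it works without POS tags, so it overgenerates compared with the tagged patterns on the slide, and it allows at most two modifier words before the whole. The function name `find_parts` and the example text are hypothetical.

```python
import re
from typing import List


def find_parts(whole: str, text: str) -> List[str]:
    """Return candidate parts of `whole` found by two surface patterns."""
    w = re.escape(whole)
    patterns = [
        # "<whole>'s <part>", e.g. "building's basement"
        rf"\b{w}['’]s (\w+)",
        # "<part> of the|a [up to two modifiers] <whole>"
        rf"\b(\w+) of (?:the|a) (?:\w+ ){{0,2}}{w}\b",
    ]
    parts = set()
    for pattern in patterns:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            parts.add(match.group(1).lower())
    return sorted(parts)


text = (
    "The building's basement flooded. The basement of a building is cold. "
    "The roof of the old building leaked."
)
print(find_parts("building", text))
```

On tagged text the same idea would additionally require the captured word to be a noun, which removes many of the false candidates a purely lexical match lets through.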
  • 54. Qualia structures. The meaning of a lexical element is described in terms of four roles: ▪ Constitutive: physical properties of an object (e.g., weight, material, parts) ▪ Agentive: typically a verb denoting an action which brings the object into existence ▪ Formal: normally consists of typing information about the object (e.g., hypernym) ▪ Telic: the purpose or function of an object, given either by a verb or by a nominal ▪ Example: qualia structure for knife: Formal: artifact_tool; Constitutive: blade, handle, …; Telic: cut_act; Agentive: make_act
  • 55. Qualia Structures: Learning Approach ▪ Aim: to automatically learn qualia structures from the WWW ▪ Based on the idea of matching certain lexico-syntactic patterns conveying a standard relation
  • 56. Qualia Structures: Learning Process. Pipeline: Word → Generate Clues → Download Google Abstracts → POS-tagging → Matching regular expressions → Statistical Weighting → Weighted QS ▪ Clues: search engine queries indicating the relation of interest ▪ Calculate the weight of a candidate qualia element e for the term t using the Jaccard coefficient: $\frac{GoogleHits(e \wedge t)}{GoogleHits(e) + GoogleHits(t) - GoogleHits(e \wedge t)}$
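The Jaccard weighting above only needs three hit counts. The sketch below takes those counts as plain integers and does not call any search engine; the function name `jaccard_weight` and the example numbers are hypothetical.

```python
def jaccard_weight(hits_e: int, hits_t: int, hits_both: int) -> float:
    """Hits(e AND t) / (Hits(e) + Hits(t) - Hits(e AND t))."""
    union = hits_e + hits_t - hits_both
    return hits_both / union if union else 0.0


# Hypothetical counts: 100 pages mention e, 50 mention t, 25 mention both.
print(jaccard_weight(100, 50, 25))  # 0.2
```

The denominator counts pages that mention either word once, so the weight stays between 0 (never together) and 1 (always together).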
  • 57. Qualia Structure: Patterns (1/2) Formal Role Telic Role
  • 58. Qualia Structure: Patterns (2/2) Constitutive Role
  • 59. Relations by syntactic analysis Example: OntoLT Rule SubjToClass_PredToSlot_DObjToRange: maps a subject to the domain, the predicate (verb) to a slot or relation, and the direct object to its range. ‘The player kicked the ball to the net’ yields relation: kick (domain: player, range: ball)
  • 60. Relations by linguistic theory  The subcategorization frame of a word is the number and kinds of other words that it selects when appearing in a sentence.  E.g. identify verbs in text as indicators of a relation between their arguments (object properties) Example: ‘Joe wrote a letter’ yields relation: write (subject: Joe, object: letter) Selectional restrictions for the subject and object of the verb “write”: Person, written-communication
  • 61. Part 6: Axioms & Rules Layer cake: Axioms & Rules, Relations, Taxonomy (Concept hierarchies), Concepts, Synonyms, Terms Example axiom: ∀x, y (sufferFrom(x, y) → ill(x))
  • 62. DIRT: Discovery of Inference Rules from Text  An unsupervised method for discovering inference rules from text, such as: X is author of Y ≈ X wrote Y; X caused Y ≈ Y is blamed on X; X manufactures Y ≈ X’s Y factory  Based on the Distributional Hypothesis: words that occur in the same contexts tend to be similar
  • 63. DIRT: Distributional Hypothesis  The Distributional Hypothesis is applied to dependency trees  If two paths tend to link the same sets of words, their meanings are hypothesized to be similar
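The idea that two paths are similar when they link the same sets of words can be illustrated with a small sketch. It compares the X-slot and Y-slot fillers of two paths using a plain Jaccard overlap, which is a simplification chosen for illustration and not necessarily the weighting DIRT itself uses. The triples and helper names are hypothetical.

```python
from math import sqrt

# Hypothetical (X filler, path, Y filler) observations.
triples = [
    ("John", "X finds solution to Y", "problem"),
    ("Mary", "X finds solution to Y", "puzzle"),
    ("John", "X solves Y", "problem"),
    ("Mary", "X solves Y", "puzzle"),
    ("Mary", "X solves Y", "crisis"),
    ("John", "X eats Y", "apple"),
]

def slot_fillers(triples):
    """Collect the set of words seen in each slot of each path."""
    fillers = {}
    for x, path, y in triples:
        slots = fillers.setdefault(path, {"X": set(), "Y": set()})
        slots["X"].add(x)
        slots["Y"].add(y)
    return fillers

def overlap(a, b):
    """Jaccard overlap of two filler sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def path_similarity(fillers, p1, p2):
    """Geometric mean of the X-slot and Y-slot overlaps."""
    sim_x = overlap(fillers[p1]["X"], fillers[p2]["X"])
    sim_y = overlap(fillers[p1]["Y"], fillers[p2]["Y"])
    return sqrt(sim_x * sim_y)

fillers = slot_fillers(triples)
sim_solve = path_similarity(fillers, "X finds solution to Y", "X solves Y")
sim_eat = path_similarity(fillers, "X finds solution to Y", "X eats Y")
```

In this toy data "X finds solution to Y" and "X solves Y" share most of their fillers and score high, while "X eats Y" shares no Y filler and scores zero.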
  • 64. DIRT: Dependency trees  The inference rules discovered by DIRT are between paths in dependency trees  The trees are generated by the Minipar parser  Minipar represents its grammar as a network where nodes represent grammatical categories and links represent syntactic relationships [Table: a subset of the dependency relations in Minipar output]
  • 65. DIRT: Dependency trees [Figure: dependency tree for “John found a solution to the problem”, with links labeled subj, obj, det, mod, pcomp] Links represent dependency relationships. Direction: from the head to the modifier. Labels represent types of dependency relations. Each link between two words represents a direct semantic relationship. Path between “John” and “problem”: N:subj:V ← find → V:obj:N → solution → N:to:N, meaning “X finds solution to Y”
  • 66. DIRT: Paths in Dependency Trees Transformation rule: connect the prepositional complement directly to the words modified by the preposition. Each link between two words represents a direct semantic relationship. A path represents indirect semantic relationships between two content words.
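A path between two content words can be recovered with a breadth-first search that treats the links as undirected. The sketch below hand-codes the tree for "John found a solution to the problem" and assumes the transformation rule has already been applied, so "solution" links directly to "problem" with the label "to". The data structure and function names are illustrative and are not Minipar output.

```python
from collections import deque

# (head, label, modifier) links, with the preposition already collapsed.
edges = [
    ("find", "subj", "John"),
    ("find", "obj", "solution"),
    ("solution", "det", "a"),
    ("solution", "to", "problem"),
    ("problem", "det", "the"),
]

def build_graph(edges):
    """Index each link in both directions: 'down' = head to modifier."""
    graph = {}
    for head, label, modifier in edges:
        graph.setdefault(head, []).append((modifier, label, "down"))
        graph.setdefault(modifier, []).append((head, label, "up"))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns (label, direction, word) steps or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        word, steps = queue.popleft()
        if word == goal:
            return steps
        for neighbour, label, direction in graph.get(word, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + [(label, direction, neighbour)]))
    return None

path = find_path(build_graph(edges), "John", "problem")
```

The result goes up the subj link to "find", down the obj link to "solution", and down the "to" link to "problem", which mirrors the path "X finds solution to Y".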
  • 67. Evaluation of Ontology Learning Techniques 1) Task-based evaluation (improve quality): evaluates the adequacy of ontologies in the context of other applications. 2) Corpus-based evaluation: uses domain-specific data sources to determine to what extent the ontologies cover the corresponding domain. 3) Criteria-based evaluation: assesses ontologies by determining how well they adhere to a set of criteria.
  • 68. Task-based evaluation  Measures how well an ontology meets the requirements of the system that uses it.  Example: an ontology designed to improve the performance of document retrieval → judged by whether more relevant documents are retrieved when the ontology is used  Example: the use of ontological relations in the context of speech recognition (output compared with a gold standard generated by humans)
  • 69. Corpus-based evaluation  methods for evaluating the ‘fit’ between an ontology and the domain knowledge in the form of text corpora.  In this approach, natural language processing (e.g., latent semantic analysis, clustering) or information extraction (e.g., named-entity recognition) techniques are used to analyze the content of the corpus and identify terms.
  • 70. Criteria-based evaluation  Example criterion: the average number of terms that were aggregated to form a concept in an ontology. This criterion reflects the view that the more variants of a term are used to form a concept, the more fully encompassing or complete the concept is.
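As a small worked example of this criterion, the average can be computed directly from a mapping of concepts to the terms aggregated under them. The concept names and term lists below are invented for illustration.

```python
# Hypothetical ontology fragment: concept -> terms aggregated to form it.
concept_terms = {
    "Disease": ["disease", "illness"],
    "Hospital": ["hospital"],
    "Doctor": ["doctor", "physician", "medic"],
}

def average_terms_per_concept(concept_terms):
    """Mean number of term variants per concept (0.0 for an empty ontology)."""
    if not concept_terms:
        return 0.0
    return sum(len(terms) for terms in concept_terms.values()) / len(concept_terms)

average = average_terms_per_concept(concept_terms)
```

Here six terms are spread over three concepts, giving an average of 2.0 terms per concept.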
  • 71. Other measurements  Evaluation approaches can also be distinguished by the layers of an ontology: • term, • concept, • relation  Evaluations can be performed to assess the: • correctness at the terminology layer, • coverage at the conceptual layer, • wellness at the taxonomy layer, • adequacy of the non-taxonomic relations.
  • 72. Ontology Learning Tools  Text2Onto • Open source (Java)  OntoLT • Open source (Protégé plug-in, Java)  OntoGen • Open source (C++, .NET)
  • 73. Text2Onto: Main Features  Learns primitives independent of a specific KR language (Probabilistic Ontology Model, POM)  Calculates a confidence for each learned object for better user interaction  Updates the learned knowledge each time the corpus changes and avoids reprocessing it from scratch  Allows for easy • combination of algorithms, • execution of algorithms, • writing of new algorithms
  • 74. Text2Onto: Algorithms used  Concepts • Statistical measures, e.g. TFIDF, C-value/NC-value,…  Subclass_of relations • Exploits hypernym relations from WordNet • Hearst patterns  Mereological relations (part-of)  General relations: extracts the following syntactic frames: • Transitive, e.g., love(subj, obj) • Intransitive + PP-complement, e.g., walk(subj, pp(to)) • Transitive + PP-complement, e.g., hit(subj, obj, pp(with))  Instance-of  Equivalence
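To illustrate the statistical measures mentioned for concept extraction, here is a minimal TF-IDF sketch. It uses raw term counts and idf = log(N / df), which is one common variant and not necessarily the one Text2Onto implements. The documents are invented.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
docs = [
    ["patient", "disease", "hospital", "disease"],
    ["patient", "doctor", "hospital"],
    ["patient", "football", "goal"],
]

def tf_idf(docs):
    """Return one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return scores

scores = tf_idf(docs)
top_term_doc0 = max(scores[0], key=scores[0].get)
```

"patient" appears in every document and scores zero, while "disease" is frequent in the first document and rare elsewhere, so it becomes that document's top candidate term.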
  • 77. OntoGen: Techniques used  Linear Dimensionality Reduction (a.k.a. LSI, Latent Semantic Indexing) • words related to the same topic co-occur more often than words related to different topics • Result: clusters of words each describing one topic  K-means clustering algorithm • Partitions the corpus into k clusters so that two documents within the same cluster are more closely related than two documents from different clusters
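The k-means step can be sketched in pure Python as follows. This is a bare-bones illustration with a deterministic initialization (the first k documents) and squared Euclidean distance; it is not OntoGen's implementation, the LSI step is omitted, and the term-count vectors are invented.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Return (cluster index per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical documents as term counts over ["disease", "hospital", "goal", "match"].
documents = [
    (2, 1, 0, 0),
    (0, 0, 2, 1),
    (1, 2, 0, 0),
    (0, 0, 1, 2),
]
assignment, centroids = kmeans(documents, k=2)
```

The two medical documents end up in one cluster and the two sports documents in the other.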
  • 79. Onto-LT  A Protégé plug-in with which classes and relations can be extracted from a linguistically annotated text collection  Provides mapping rules that allow for a mapping between linguistic entities and class/slot candidates in Protégé
  • 80. Onto-LT: Mapping rules HeadNounToClass_ModToSubClass: maps a head noun to a class and, in combination with its modifier(s), to one or more sub-class(es). SubjToClass_PredToSlot_DObjToRange: maps a linguistic subject to a class, its predicate to a corresponding slot for this class, and the direct object to the “range” of the slot.
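The two mapping rules can be mimicked with plain functions to show what each one produces. The function names, the input format, and the underscore naming of sub-classes are assumptions for illustration; Onto-LT itself works on linguistic annotations inside Protégé.

```python
def head_noun_to_class_mod_to_subclass(head_noun, modifiers):
    """Map a head noun to a class and each modifier + head noun to a sub-class."""
    subclasses = [f"{modifier}_{head_noun}" for modifier in modifiers]
    return head_noun, subclasses

def subj_to_class_pred_to_slot_dobj_to_range(subject, predicate, direct_object):
    """Map subject to class, predicate to slot, and direct object to the slot's range."""
    return {"class": subject, "slot": predicate, "range": direct_object}

# "football player" -> class player, sub-class football_player
cls, subs = head_noun_to_class_mod_to_subclass("player", ["football"])

# "The player kicked the ball to the net" -> kick(domain: player, range: ball)
relation = subj_to_class_pred_to_slot_dobj_to_range("player", "kick", "ball")
```

The inputs are assumed to be lemmas taken from an existing linguistic annotation, so no parsing happens in this sketch.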
  • 84. Framework dimensions, sub-dimensions and values (Onto. Tools Evaluation)
  • 85. The Comparison of Ontology Tools
  • 86. A Summary of the Outputs Supported, Techniques Used, and Evaluations Performed for the Seven Systems Included
  • 87. Conclusions  A detailed methodology that guides the ontology learning process does not exist  Only general guidelines are provided  No complete correspondence between the methods and the tools  Methods are based mainly on NLP techniques complemented with statistical measures  Tools give only support to perform some of the steps proposed in different approaches (except Text2Onto)
  • 88. Thanks For Your Attention