Introduction to Development of Lexical Databases
Muhammad Shoaib
PhD Researcher (Biomedical Engineering)
Asan Medical Complex
College of Medicine, University of Ulsan
Researcher, Gachon University Gil Medical Center
Republic of Korea
About me: Son of Soil
BS Computer Science (2006-2010)
FAST National University of Computer and Emerging Sciences
ME Computer Engineering (2011-2013)
Jeju National University, Republic of Korea
PhD Biomedical Engineering (2015 to date)
Asan Medical Center, University of Ulsan, Republic of Korea
Lecturer at Institute of Space Technology (2013-2015)
Overview
Lexical Databases and DBMS
WordNet (we’ll see how we can adopt it)
Computational Linguistics
Lexical Ontologies
Database Management System
Data:
Facts and statistics collected together for
reference or analysis.
Database:
A structured set of data held in a computer,
especially one that is accessible in various ways.
Database Management System:
A computer software application that interacts
with end users, other applications, and the database itself to capture and analyze data.
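To make the DBMS definition concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is an assumed choice (the slides do not name a DBMS), and the table name `words`, its columns, and the sample rows are hypothetical. The point is that applications ask the DBMS questions in SQL instead of reading storage files themselves.

```python
import sqlite3

# A throwaway in-memory database; SQLite (the DBMS) manages storage and answers queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (lemma TEXT NOT NULL, pos TEXT NOT NULL)")

# Hypothetical sample entries: the same word form can appear with different categories.
conn.executemany(
    "INSERT INTO words (lemma, pos) VALUES (?, ?)",
    [("table", "noun"), ("table", "verb"), ("car", "noun"), ("strike", "verb")],
)
conn.commit()


def lemmas_with_pos(pos):
    """Ask the DBMS for all distinct lemmas with the given part of speech."""
    rows = conn.execute(
        "SELECT DISTINCT lemma FROM words WHERE pos = ? ORDER BY lemma", (pos,)
    )
    return [lemma for (lemma,) in rows]


print(lemmas_with_pos("noun"))  # ['car', 'table']
```

The `?` placeholders let the DBMS handle quoting of user-supplied values, which is the idiomatic way to pass parameters with `sqlite3`.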
What are we talking about today?
Globalization requires more texts and speech
to be translated faster across more languages
Manual translation is difficult, expensive, and time-consuming
Machine translation is of low quality, often
unacceptable
Why Lexical Databases?
How can computers read and understand language?
Why do we need computers for translation?
They are faster than humans
Can computers do the same job as humans?
In linguistics, probably not
Lexical Database
Machine Readable Dictionary
“A lexical database is a lexical resource which has an
associated software environment database which permits
access to its contents”
What is a Lexical Resource?
“A lexical resource (LR) is a database consisting of one or
several dictionaries.”
What Does a Lexical Database Contain?
Information typically stored in a lexical database includes:
lexical categories of words
synonyms of words
semantic and phonological relations between different words or sets of words
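As a rough illustration of the kinds of information listed above, a single record in a toy lexical database might look like the sketch below. The class name `LexicalEntry`, its fields, and the relation names (`hypernym`, `meronym`, `rhymes_with`) are hypothetical choices for this example, not the schema of any particular resource.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One record in a toy lexical database (all field names are hypothetical)."""
    lemma: str
    category: str                                  # lexical category, e.g. "noun", "verb"
    synonyms: set = field(default_factory=set)     # other word forms with the same meaning
    relations: dict = field(default_factory=dict)  # relation name -> set of related lemmas

    def related(self, relation):
        """Return the lemmas linked to this entry by `relation` (empty set if none)."""
        return self.relations.get(relation, set())


car = LexicalEntry(
    lemma="car",
    category="noun",
    synonyms={"auto", "automobile", "motorcar"},
    relations={
        "hypernym": {"vehicle"},        # semantic relation
        "meronym": {"engine"},          # semantic relation
        "rhymes_with": {"bar", "star"}, # phonological relation
    },
)

print(car.related("hypernym"))  # {'vehicle'}
print(car.related("antonym"))   # set()
```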
Why Lexical Databases?
Natural language generation systems that produce
coherent discourses by verbalizing a set of triples
Question Answering systems that interpret user
questions with respect to one or more ontologies
Text interpretation systems that extract triples with
respect to one or more ontologies
Query interpretation and semantic search in
information retrieval systems
Natural language based interfaces to ontologies,
Semantic Web and Linked Data.
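The first use case above, verbalizing triples, depends on a lexicon that says how each ontology property is expressed in words. The sketch below is a deliberately simple template-based approach, assuming a hypothetical lexicon that maps property names to English sentence templates; real NLG systems handle articles, agreement, and aggregation far more carefully.

```python
# Hypothetical lexicon: each ontology property maps to an English sentence template.
LEXICON = {
    "hypernym": "A {s} is a kind of {o}.",
    "hasPart": "A {s} has a {o}.",
}


def verbalize(triples, lexicon=LEXICON):
    """Turn (subject, property, object) triples into English sentences."""
    sentences = []
    for subj, prop, obj in triples:
        template = lexicon.get(prop)
        if template is None:
            continue  # no lexical entry for this property, so it cannot be verbalized
        sentences.append(
            template.format(s=subj.replace("_", " "), o=obj.replace("_", " "))
        )
    return " ".join(sentences)


triples = [
    ("car", "hypernym", "vehicle"),
    ("car", "hasPart", "steering_wheel"),
    ("car", "color", "red"),  # skipped: "color" has no entry in the lexicon
]
print(verbalize(triples))
# A car is a kind of vehicle. A car has a steering wheel.
```

The skipped `color` triple shows why the lexical resource matters: without an entry for a property, the system has no words for it.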
WordNet
What is WordNet?
A large lexical database, or “electronic
dictionary,” developed and maintained at
Princeton
http://wordnet.princeton.edu
Includes most English nouns, verbs, adjectives,
adverbs
Can be used by humans and machines
Princeton WordNet is for English only, but it is
linked to wordnets in many other languages
Authors of the (first) WordNet
WordNet was created in the Cognitive
Science Laboratory of Princeton University under the
direction of psychology professor George Armitage
Miller starting in 1985 and has been directed in
recent years by Christiane Fellbaum
That is why it is usually called “the Princeton WordNet”
(PWN)
George Miller and Christiane Fellbaum were awarded
the 2006 Antonio Zampolli Prize for their work with
WordNet.
WordNet as described by authors
WordNet is an on-line lexical reference system
whose design is inspired by current
psycholinguistic theories of human lexical
memory. English nouns, verbs, and adjectives
are organized into synonym sets, each
representing one underlying lexical concept.
Different relations link the synonym sets.
What’s special about WordNet?
Traditional paper dictionaries are organized
alphabetically: words that are found together (on the
same page) are not related by meaning
WordNet is organized by meaning: words in close
proximity are semantically similar
Human users and computers can browse WordNet
and find words that are meaningfully related to their
queries (somewhat like in a hyperdimensional
thesaurus)
What’s special about WordNet?
WordNet gives information about two fundamental,
universal properties of human language:
polysemy and synonymy
Polysemy = one:many mapping of form and
meaning
Synonymy = one:many mapping of meaning and
form
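The two one:many mappings can be sketched as a pair of indexes built from the same list of (word form, meaning) pairs. The sense labels below are hypothetical identifiers that echo the examples on the following slides; they are not WordNet's internal IDs.

```python
from collections import defaultdict

# Toy (word form, sense) pairs; the sense labels are hypothetical.
PAIRS = [
    ("table", "tabular_array"), ("table", "piece_of_furniture"),
    ("table", "mesa"), ("table", "postpone"),
    ("car", "automobile"), ("auto", "automobile"),
    ("automobile", "automobile"), ("motorcar", "automobile"),
]

form_to_senses = defaultdict(set)  # polysemy: one form -> many meanings
sense_to_forms = defaultdict(set)  # synonymy: one meaning -> many forms
for form, sense in PAIRS:
    form_to_senses[form].add(sense)
    sense_to_forms[sense].add(form)

print(len(form_to_senses["table"]))          # 4
print(sorted(sense_to_forms["automobile"]))  # ['auto', 'automobile', 'car', 'motorcar']
```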
Polysemy
One word form expresses multiple meanings
{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}
Note: the most frequent word forms are the most
polysemous!
Synonymy
One concept is expressed by several different
word forms:
{beat, hit, strike}
{car, motorcar, auto, automobile}
Polysemy and synonymy
Understanding and generating language (as for
translation) means matching a word form with
the intended, context-appropriate meaning
People (fluent speakers of a language) do this
very efficiently
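Matching a word form to its context-appropriate meaning is the task of word sense disambiguation. One classic, very simple technique (swapped in here for illustration; the slides do not prescribe a method) is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the context. The glosses and stop-word list below are made up for the example and are not WordNet's actual text.

```python
# Hypothetical glosses for three senses of "table" (not WordNet's actual definitions).
GLOSSES = {
    "table/furniture": "a piece of furniture with a flat top on legs where people eat or work",
    "table/tabular_array": "a set of data arranged in rows and columns",
    "table/mesa": "a flat elevated piece of land with steep sides",
}
STOPWORDS = {"a", "an", "the", "of", "with", "on", "in", "and", "or", "where", "to"}


def tokens(text):
    """Lowercased content words of `text`, with simple punctuation stripped."""
    return {w.strip(".,;").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(context, glosses=GLOSSES):
    """Pick the sense whose gloss overlaps most with the context (first sense wins ties)."""
    ctx = tokens(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("The results are shown in a table with five rows and three columns"))
# table/tabular_array
print(simplified_lesk("We sat at the kitchen table to eat dinner"))
# table/furniture
```

Real systems use richer context and sense inventories; this only shows the core idea of scoring senses against context.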
Synonymy in WordNet
WordNet groups (roughly) synonymous,
denotationally equivalent, words into unordered
sets of synonyms (“synsets”)
{hit, beat, strike}
{big, large}
{queue, line}
By definition, each synset expresses a distinct
meaning/concept
Each word form-meaning pair is unique
Polysemy in WordNet
A word form that appears in n synsets
is n-fold polysemous
{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}
table is fourfold polysemous/has four senses
four distinct concepts are associated with the word form table
Hypernymy relates noun synsets
Relates more/less general concepts
Creates hierarchies, or “trees”
{vehicle}
    {car, automobile}
        {convertible}
        {SUV}
    {bicycle, bike}
        {mountain bike}
“A car is a kind of vehicle” <=> “The class of vehicles includes cars, bikes”
Hierarchies can have up to 16 levels
Hyponymy (Association Rules)
Transitivity
A car is a kind of vehicle
An SUV is a kind of car
=> An SUV is a kind of vehicle
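Transitivity means an "is a kind of" question can be answered by following hypernym links upward until the target is reached or the hierarchy runs out. The sketch below encodes the vehicle hierarchy from the hypernymy slide as a hypothetical child-to-parent map. It assumes a single parent per concept and no cycles; WordNet itself allows some synsets to have more than one hypernym.

```python
# Hypothetical hypernym links for the vehicle hierarchy (child -> parent).
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "convertible": "car",
    "SUV": "car",
    "mountain_bike": "bicycle",
}


def is_a(x, y):
    """True if x is (transitively) a kind of y; assumes the links contain no cycles."""
    while x in HYPERNYM:
        x = HYPERNYM[x]
        if x == y:
            return True
    return False


print(is_a("SUV", "vehicle"))  # True  (SUV -> car -> vehicle)
print(is_a("SUV", "bicycle"))  # False
```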
Meronymy/holonymy
(part-whole relation)
{car, automobile}
    {engine}
        {spark plug}
        {cylinder}
“An engine has spark plugs”
“Spark plugs and cylinders are parts of
an engine”
Meronymy/Holonymy (Inheritance)
A finger is part of a hand
A hand is part of an arm
An arm is part of a body
=> A finger is part of a body
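The part-whole inference above can be sketched the same way as hypernym transitivity, with one extra point: meronymy and holonymy are inverses, so the "whole has these parts" view can be derived from the same part-of facts. The data below are taken from the slide; the names `PART_OF`, `wholes`, and `HAS_PART` are hypothetical, and the code assumes each part has one direct whole and no cycles.

```python
from collections import defaultdict

# Direct part-of facts from the slide (part -> whole).
PART_OF = {"finger": "hand", "hand": "arm", "arm": "body"}


def wholes(part):
    """All wholes that `part` belongs to, following part-of links transitively."""
    result = []
    while part in PART_OF:
        part = PART_OF[part]
        result.append(part)
    return result


# Holonymy is the inverse of meronymy: build the whole -> parts view from the same facts.
HAS_PART = defaultdict(set)
for part, whole in PART_OF.items():
    HAS_PART[whole].add(part)

print(wholes("finger"))  # ['hand', 'arm', 'body']
print(HAS_PART["hand"])  # {'finger'}
```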
Upward hierarchy in WordNet
{entity}
{physical_entity}
{object, physical_object}
{whole, unit}
{living_thing, animate_thing}
{organism, being}
{animal, animate_being, beast, brute, creature, fauna}
{chordate}
{vertebrate, craniate}
{mammal, mammalian}
{placental, placental_mammal, eutherian, eutherian_mammal}
{carnivore}
{canine, canid}
{dog, domestic_dog, Canis_familiaris}
25 unique beginners for noun synsets
{act, action, activity} {food} {possession}
{animal, fauna} {location, place} {process}
{artifact} {motive} {quantity, amount}
{attribute, property} {group, collection} {relation}
{body, corpus} {natural object} {shape}
{cognition, knowledge} {natural phenomenon} {state, condition}
{communication} {person, human being} {substance}
{event, happening} {plant, flora} {time}
{feeling, emotion}
Verb clusters
Verbs of Bodily Functions and Care (sweat)
Verbs of Change (change)
Verbs of Communication (tell)
Competition Verbs (race)
Consumption Verbs (drink)
Contact Verbs (touch)
Cognition Verbs (think)
Creation Verbs (create)
Motion Verbs (move)
Emotion or Psych Verbs (feel)
Stative Verbs (have, wear)
Perception Verbs (see)
Verbs of Possession (possess, own)
Verbs of Social Interaction (request, impeach)
Weather Verbs (thunder)
Computational Linguistics
What is Computational Lexical Semantics?
Any computational process involving word
meaning!
Computing Word Similarity
Distributional (Vector) Models of Meaning
Computing Word Relations
Word Sense Disambiguation
Semantic Role Labeling
Computing word connotation and sentiment
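For "Computing Word Similarity", one simple WordNet-style measure is path similarity: 1 / (1 + length of the shortest path between two concepts in the is-a hierarchy). NLTK's WordNet interface, for example, exposes a `path_similarity` measure built on this idea. The sketch below applies the formula to a small hypothetical hierarchy rather than to WordNet itself.

```python
from collections import deque

# Hypothetical fragment of a noun hierarchy (child -> parent).
PARENT = {
    "vehicle": "artifact", "furniture": "artifact",
    "car": "vehicle", "bicycle": "vehicle",
    "SUV": "car", "convertible": "car",
    "table": "furniture",
}


def neighbours(node):
    """Treat is-a links as undirected: a node's parent plus all of its children."""
    out = [child for child, parent in PARENT.items() if parent == node]
    if node in PARENT:
        out.append(PARENT[node])
    return out


def shortest_path_length(a, b):
    """Breadth-first search over is-a links; returns None if a and b are not connected."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    d = shortest_path_length(a, b)
    return None if d is None else 1 / (1 + d)


print(path_similarity("SUV", "convertible"))  # 2 steps apart (via car)
print(path_similarity("SUV", "bicycle"))      # 0.25 (3 steps via car and vehicle)
```

Concepts that meet lower in the hierarchy score higher, which is why an SUV comes out closer to a bicycle than to a table.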
Concrete Applications
corpus linguistics
machine translation
text retrieval
text summarization
word processing help
expert systems
speech recognition/synthesis
toys, games
automatic telephone interpretation system
ultimately … artificial intelligence, robotics
Corpus Linguistics
This is a generic name for various computer
applications that make use of large language
databases (called corpora)
Having access to large databases has enabled us
to process linguistic data statistically
rather than analytically.
This conflict of two opposing views
(statistical vs. analytical) is very apparent in
machine translation.
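The statistical view starts from something as simple as counting. The sketch below computes word frequencies over a tiny stand-in corpus using the standard library; the corpus text is invented for the example, and real corpora contain millions of words.

```python
import re
from collections import Counter

# A tiny, made-up stand-in for a corpus.
CORPUS = """The engine of the car failed. The car was towed.
A bicycle has no engine. The bicycle was faster than the car in traffic."""

words = re.findall(r"[a-z]+", CORPUS.lower())  # crude tokenization: runs of letters
freq = Counter(words)

print(freq.most_common(2))          # [('the', 5), ('car', 3)]
print(freq["car"] / len(words))     # 0.125 (relative frequency: 3 of 24 tokens)
```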
Machine Translation (1)
Text-to-text translation (there is a great need for
translation at the UN and the EC, the European
Community)
Works best when the two languages in
question are similar in structure
Usually, pre-editing and/or post-editing by
a human translator is required: machine-assisted translation.
Machine Translation (2)
Traditionally, MT required parsing, possibly
some semantic analysis, then mapping to a
syntactic tree of the sentence in the target
language.
An alternative is to appeal to statistical means
of mapping a surface string in the source
language to a surface string in the target
language.
Difficulty with word-for-word translation
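A toy example makes the difficulty visible. The sketch below translates English into romanized Japanese word by word using a tiny hypothetical lexicon. It keeps the English subject-verb-object order and has no way to add the Japanese particles, so it produces "watashi taberu ringo" where a natural rendering is roughly "watashi wa ringo o taberu" (subject-object-verb). Unknown words are flagged rather than guessed.

```python
# Hypothetical English -> romanized Japanese lexicon (illustrative entries only).
# Japanese has no articles, so "an" maps to an empty string and is dropped.
LEXICON = {"i": "watashi", "eat": "taberu", "an": "", "apple": "ringo"}


def word_for_word(sentence):
    """Translate each word independently, keeping the source word order."""
    out = [LEXICON.get(w, f"<{w}?>") for w in sentence.lower().split()]
    return " ".join(w for w in out if w)


print(word_for_word("I eat an apple"))  # watashi taberu ringo
```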
Computational Semantics
The study of how to automate the process of
constructing and reasoning with meaning
representations of natural language expressions.
This could play an important role in such application
areas as machine translation when two
typologically distinct languages are involved (e.g.
English and Japanese).
Text Retrieval
keyword → text/book
keyword: morphology
1. Principles of Polymer Morphology
2. Image Analysis and Mathematical Morphology
3. Drainage Basin Morphology
4. French Morphology
We need morphological, syntactic, and semantic
information to find the right text/book.
Further applications: search engines, etc.
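The example above can be reproduced with plain keyword matching, which returns all four titles because it knows nothing about the different senses of "morphology". The second function adds a crude, hypothetical list of cue words for the linguistic sense, as a stand-in for the morphological, syntactic, and semantic information a real system would need.

```python
TITLES = [
    "Principles of Polymer Morphology",
    "Image Analysis and Mathematical Morphology",
    "Drainage Basin Morphology",
    "French Morphology",
]

# Hypothetical cue words suggesting the linguistic sense of "morphology".
LINGUISTIC_CUES = {"french", "english", "language", "linguistics", "word"}


def keyword_search(keyword, titles=TITLES):
    """Plain keyword matching: every title containing the word, regardless of sense."""
    return [t for t in titles if keyword.lower() in t.lower().split()]


def linguistic_morphology(titles=TITLES):
    """Keep only matches whose other words hint at the linguistic sense."""
    return [
        t for t in keyword_search("morphology", titles)
        if LINGUISTIC_CUES & set(t.lower().split())
    ]


print(len(keyword_search("morphology")))  # 4
print(linguistic_morphology())            # ['French Morphology']
```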
Text Summarization
We need to be able to select the right
information from the electronic documents
available (esp. on the web).
Automatic text summarization is a
technique that can help people to quickly
grasp the concepts presented in a
document by creating an abstract or
summary of the original text.
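A very simple way to approximate this is extractive summarization: instead of writing a new abstract, select the sentences that best represent the document. The sketch below uses a classic word-frequency heuristic (an assumed technique, not one the slides specify): score each sentence by the average corpus frequency of its content words and keep the top `n` in their original order. The stop-word list and sample text are made up for the example.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "can", "that", "by", "for"}


def summarize(text, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        # Average document frequency of the sentence's content words.
        toks = [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n]
    return " ".join(s for s in sentences if s in top)


text = ("WordNet groups words into synsets. Each synset expresses one concept. "
        "The weather was pleasant yesterday. Synsets in WordNet are linked by relations.")
print(summarize(text, n=2))
# WordNet groups words into synsets. Synsets in WordNet are linked by relations.
```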
Semantic Web
Some people (e.g. Evergreen U) are trying
to classify contents of web pages so that
they are meaningful to computers. But this
is not an easy task since the categories
must presumably be pre-selected by
people.
The semantic Web provides a common
framework that allows data to be shared
and reused across application, enterprise,
and community boundaries.
http://www.w3.org/2001/sw/
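Semantic Web data are commonly expressed as RDF triples (subject, predicate, object) and queried by pattern matching, for example with SPARQL. The sketch below mimics that idea in plain Python rather than using an RDF library: `rdfs:subClassOf` and `rdfs:label` are real RDF Schema terms, while the `ex:` resources and the `match` helper are hypothetical.

```python
# Triples as (subject, predicate, object); the "ex:" resources are hypothetical.
TRIPLES = {
    ("ex:car", "rdfs:subClassOf", "ex:vehicle"),
    ("ex:bicycle", "rdfs:subClassOf", "ex:vehicle"),
    ("ex:car", "rdfs:label", "car"),
    ("ex:car", "rdfs:label", "automobile"),
}


def match(s=None, p=None, o=None, triples=TRIPLES):
    """Return triples matching the pattern; None acts as a wildcard, like a query variable."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


# Which classes are subclasses of ex:vehicle?
print([s for s, _, _ in match(p="rdfs:subClassOf", o="ex:vehicle")])
# ['ex:bicycle', 'ex:car']

# Which words label the concept ex:car? (lexical information attached to an ontology class)
print([o for _, _, o in match(s="ex:car", p="rdfs:label")])
# ['automobile', 'car']
```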
Ontology: Origins and History
Ontology in Philosophy
A philosophical discipline: a branch of
philosophy that deals with the nature and the
organisation of reality
Science of Being (Aristotle, Metaphysics, IV, 1)
Tries to answer the questions:
What characterizes being?
Eventually, what is being?
Ontology in Computer Science
An ontology is an engineering artifact:
It is constituted by a specific vocabulary
used to describe a certain reality, plus
a set of explicit assumptions regarding
the intended meaning of the vocabulary.
Thus, an ontology describes a formal specification of a
certain domain:
Shared understanding of a domain of
interest
Formal and machine manipulable model
of a domain of interest
How to use Lexical Ontologies
1. Ontology-based Information Extraction and
Ontology Population from Text
2. Ontology-based Question Answering
3. Natural Language Generation from Triples
4. Integration and publishing of legacy language
resources
5. Representation of Translations in the Web of
Data
6. Ontology-based Machine Translation
Lexicon Model for Ontologies
Conclusion
Database development is a basic building block for machine
translation, natural language processing and
computational linguistics
WordNet is one of the richest resources, and its structure can
be used to create new lexical databases for our languages
(Urdu/Persian/Arabic)
Ontologies can add richer semantics to
lexical resources, beyond the limits of databases, because of
their nature and their capability to describe things
Thanks
and
Questions

More Related Content

What's hot

Lecture 1st-Introduction to Discourse Analysis._023928.pptx
Lecture 1st-Introduction to Discourse Analysis._023928.pptxLecture 1st-Introduction to Discourse Analysis._023928.pptx
Lecture 1st-Introduction to Discourse Analysis._023928.pptx
Google
 
Key issues in 2nd language acquisition
Key issues in 2nd language acquisitionKey issues in 2nd language acquisition
Key issues in 2nd language acquisition
Samir1370
 
semantics the study of meaning
 semantics the study of meaning semantics the study of meaning
semantics the study of meaning
Yoshinta Debbi A
 
Sapir whorf hypothesis
Sapir whorf hypothesis Sapir whorf hypothesis
Sapir whorf hypothesis
Danish Ashraf
 
Modern linguistics
Modern linguisticsModern linguistics
Modern linguistics
amoresyoh99
 
Second language acquisition and error analysis (arfan rai)
Second language acquisition and error analysis (arfan rai)Second language acquisition and error analysis (arfan rai)
Second language acquisition and error analysis (arfan rai)
Arfan rai
 
Language
LanguageLanguage
Language
Wu Heping
 
Morphology # Productivity in Word-Formation
Morphology # Productivity in Word-FormationMorphology # Productivity in Word-Formation
Morphology # Productivity in Word-Formation
Ani Istiana
 
Discourse analysis and vocabulary
Discourse analysis and vocabularyDiscourse analysis and vocabulary
Discourse analysis and vocabulary
Azam Almubarki
 
Discourse
Discourse Discourse
Discourse
Eika Matari
 
Applied linguistic: Contrastive Analysis
Applied linguistic: Contrastive AnalysisApplied linguistic: Contrastive Analysis
Applied linguistic: Contrastive Analysis
Intan Meldy
 
1. introduction to semantics
1. introduction to semantics1. introduction to semantics
1. introduction to semantics
Asmaa Alzelibany
 
Language and sex in Sociolinguistic
Language and sex in SociolinguisticLanguage and sex in Sociolinguistic
Language and sex in Sociolinguistic
ernirutmana
 
Ch. 8 ethnicity and social networks
Ch. 8 ethnicity and social networksCh. 8 ethnicity and social networks
Ch. 8 ethnicity and social networks
adeyun467
 
6) discourse grammar
6) discourse grammar6) discourse grammar
6) discourse grammar
AtaMSaeed
 
Discourse analysis
Discourse analysisDiscourse analysis
Discourse analysis
VivaAs
 
Chomskyan linguistics lec 3
Chomskyan linguistics lec 3Chomskyan linguistics lec 3
Chomskyan linguistics lec 3
Hina Honey
 
Language Change - Linguistics
Language Change - Linguistics Language Change - Linguistics
Language Change - Linguistics
Deta Eka
 
Interlanguage errors
Interlanguage errorsInterlanguage errors
Interlanguage errors
Shona Whyte
 
ENGLISH SYNTAX
ENGLISH SYNTAXENGLISH SYNTAX
ENGLISH SYNTAX
Videoconferencias UTPL
 

What's hot (20)

Lecture 1st-Introduction to Discourse Analysis._023928.pptx
Lecture 1st-Introduction to Discourse Analysis._023928.pptxLecture 1st-Introduction to Discourse Analysis._023928.pptx
Lecture 1st-Introduction to Discourse Analysis._023928.pptx
 
Key issues in 2nd language acquisition
Key issues in 2nd language acquisitionKey issues in 2nd language acquisition
Key issues in 2nd language acquisition
 
semantics the study of meaning
 semantics the study of meaning semantics the study of meaning
semantics the study of meaning
 
Sapir whorf hypothesis
Sapir whorf hypothesis Sapir whorf hypothesis
Sapir whorf hypothesis
 
Modern linguistics
Modern linguisticsModern linguistics
Modern linguistics
 
Second language acquisition and error analysis (arfan rai)
Second language acquisition and error analysis (arfan rai)Second language acquisition and error analysis (arfan rai)
Second language acquisition and error analysis (arfan rai)
 
Language
LanguageLanguage
Language
 
Morphology # Productivity in Word-Formation
Morphology # Productivity in Word-FormationMorphology # Productivity in Word-Formation
Morphology # Productivity in Word-Formation
 
Discourse analysis and vocabulary
Discourse analysis and vocabularyDiscourse analysis and vocabulary
Discourse analysis and vocabulary
 
Discourse
Discourse Discourse
Discourse
 
Applied linguistic: Contrastive Analysis
Applied linguistic: Contrastive AnalysisApplied linguistic: Contrastive Analysis
Applied linguistic: Contrastive Analysis
 
1. introduction to semantics
1. introduction to semantics1. introduction to semantics
1. introduction to semantics
 
Language and sex in Sociolinguistic
Language and sex in SociolinguisticLanguage and sex in Sociolinguistic
Language and sex in Sociolinguistic
 
Ch. 8 ethnicity and social networks
Ch. 8 ethnicity and social networksCh. 8 ethnicity and social networks
Ch. 8 ethnicity and social networks
 
6) discourse grammar
6) discourse grammar6) discourse grammar
6) discourse grammar
 
Discourse analysis
Discourse analysisDiscourse analysis
Discourse analysis
 
Chomskyan linguistics lec 3
Chomskyan linguistics lec 3Chomskyan linguistics lec 3
Chomskyan linguistics lec 3
 
Language Change - Linguistics
Language Change - Linguistics Language Change - Linguistics
Language Change - Linguistics
 
Interlanguage errors
Interlanguage errorsInterlanguage errors
Interlanguage errors
 
ENGLISH SYNTAX
ENGLISH SYNTAXENGLISH SYNTAX
ENGLISH SYNTAX
 

Similar to Introduction to development of lexical databases

The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
AIMS (Agricultural Information Management Standards)
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
AdnanBaloch15
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
Linda Garcia
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
csandit
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
ijnlc
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Toine Bogers
 
Nlp (1)
Nlp (1)Nlp (1)
Ontology learning
Ontology learningOntology learning
Ontology learning
Ehsan Asgarian
 
IJNLC 2013 - Ambiguity-Aware Document Similarity
IJNLC  2013 - Ambiguity-Aware Document SimilarityIJNLC  2013 - Ambiguity-Aware Document Similarity
IJNLC 2013 - Ambiguity-Aware Document Similarity
kevig
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
Michel Bruley
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translation
guest873a50
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
Rajat Jain
 
NLP
NLPNLP
Language Grid
Language GridLanguage Grid
Language Grid
lindh
 
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconferenceMarcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Daniel Lewis
 
Textmining
TextminingTextmining
Textmining
sidhunileshwar
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
gerogepatton
 
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
gerogepatton
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
ijaia
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
MLconf
 

Similar to Introduction to development of lexical databases (20)

The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Nlp (1)
Nlp (1)Nlp (1)
Nlp (1)
 
Ontology learning
Ontology learningOntology learning
Ontology learning
 
IJNLC 2013 - Ambiguity-Aware Document Similarity
IJNLC  2013 - Ambiguity-Aware Document SimilarityIJNLC  2013 - Ambiguity-Aware Document Similarity
IJNLC 2013 - Ambiguity-Aware Document Similarity
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translation
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
NLP
NLPNLP
NLP
 
Language Grid
Language GridLanguage Grid
Language Grid
 
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconferenceMarcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconference
 
Textmining
TextminingTextmining
Textmining
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 

Recently uploaded

原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 

Recently uploaded (20)

原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 

Introduction to development of lexical databases

  • 1. Introduction to Development of Lexical Databases Muhammad Shoaib PhD Researcher (Biomedical Engineering) Asan Medical Complex College of Medicine University of Ulsan Researcher Gachon University Gil Medical Center Republic Of Korea
  • 2. About me: Son of Soil. BS Computer Science (2006-2010), FAST National University of Computer and Emerging Sciences; ME Computer Engineering (2011-2013), Jeju National University, Republic of Korea; PhD Biomedical Engineering (2015 to date), Asan Medical Center, University of Ulsan, Republic of Korea; Lecturer at Institute of Space Technology, 2013-2015
  • 3. Overview: Lexical Databases and DBMS; WordNet (we’ll see how we can adopt it); Computational Linguistics; Lexical Ontologies
  • 5. Database Management System. Data: facts and statistics collected together for reference or analysis. Database: a structured set of data held in a computer, especially one that is accessible in various ways. Database Management System: a computer software application that interacts with end users, other applications, and the database itself to capture and analyze data.
  • 6. What are we talking about today? Globalization requires more texts and speech to be translated faster across more languages. Manual translation is difficult, expensive, and time-consuming. Machine translation is of low quality and often unacceptable.
  • 7. Why Lexical Databases? How can computers understand what we are reading? Why do we need computers for translation? They are faster than humans. Can a computer do the same job as humans? In linguistics, probably not.
  • 8. Lexical Database (Machine-Readable Dictionary): “A lexical database is a lexical resource which has an associated software environment database which permits access to its contents.” What is a lexical resource? “A lexical resource (LR) is a database consisting of one or several dictionaries.”
  • 9. What Does a Lexical Database Contain? Information typically stored in a lexical database includes the lexical category of words, synonyms of words, and semantic and phonological relations between different words or sets of words.
  • 10. Why Lexical Databases? They support natural language generation systems that produce coherent discourse by verbalizing a set of triples; question answering systems that interpret user questions with respect to one or more ontologies; text interpretation systems that extract triples with respect to one or more ontologies; query interpretation and semantic search in information retrieval systems; and natural-language interfaces to ontologies, the Semantic Web, and Linked Data.
  • 12. What is WordNet? A large lexical database, or “electronic dictionary,” developed and maintained at Princeton (http://wordnet.princeton.edu). It includes most English nouns, verbs, adjectives, and adverbs, and can be used by both humans and machines. Princeton WordNet covers English only, but it is linked to wordnets in many other languages.
  • 13. Authors of the (first) WordNet: WordNet was created in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller starting in 1985, and has been directed in recent years by Christiane Fellbaum. That is why it is usually called “the Princeton WordNet” (PWN). George Miller and Christiane Fellbaum were awarded the 2006 Antonio Zampolli Prize for their work with WordNet.
  • 14. WordNet as described by its authors: WordNet is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, and adjectives are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
  • 15. What’s special about WordNet? Traditional paper dictionaries are organized alphabetically: words that are found together (on the same page) are not related by meaning. WordNet is organized by meaning: words in close proximity are semantically similar. Human users and computers can browse WordNet and find words that are meaningfully related to their queries (somewhat like in a hyperdimensional thesaurus).
  • 16. What’s special about WordNet? WordNet gives information about two fundamental, universal properties of human language: polysemy and synonymy. Polysemy = one:many mapping of form and meaning (one form, many meanings). Synonymy = one:many mapping of meaning and form (one meaning, many forms).
  • 17. Polysemy: One word form expresses multiple meanings: {table, tabular_array}, {table, piece_of_furniture}, {table, mesa}, {table, postpone}. Note: the most frequent word forms are the most polysemous!
  • 18. Synonymy: One concept is expressed by several different word forms: {beat, hit, strike}, {car, motorcar, auto, automobile}.
  • 19. Polysemy and synonymy: Understanding and generating language (as for translation) means matching a word form with the intended, context-appropriate meaning. People (fluent speakers of a language) do this very efficiently.
  • 20. Synonymy in WordNet: WordNet groups (roughly) synonymous, denotationally equivalent words into unordered sets of synonyms (“synsets”): {hit, beat, strike}, {big, large}, {queue, line}. By definition, each synset expresses a distinct meaning/concept. Each word form-meaning pair is unique.
  • 21. Polysemy in WordNet: A word form that appears in n synsets is n-fold polysemous: {table, tabular_array}, {table, piece_of_furniture}, {table, mesa}, {table, postpone}. Table is fourfold polysemous (has four senses): four distinct concepts are associated with the word form table.
  • 22. Hypernymy relates noun synsets: it relates more and less general concepts and creates hierarchies, or “trees”. Example: {vehicle} has the hyponyms {car, automobile} and {bicycle, bike}; {car, automobile} has the hyponyms {convertible} and {SUV}; {bicycle, bike} has the hyponym {mountain bike}. “A car is a kind of vehicle” <=> “The class of vehicles includes cars, bikes.” Hierarchies can have up to 16 levels.
  • 23. Hyponymy (Association Rules): Transitivity. A car is a kind of vehicle; an SUV is a kind of car => an SUV is a kind of vehicle.
  • 24. Meronymy/holonymy (part-whole relation): {car, automobile} has the part {engine}; {engine} has the parts {spark plug} and {cylinder}. “An engine has spark plugs.” “Spark plugs and cylinders are parts of an engine.”
  • 25. Meronymy/Holonymy (Inheritance): A finger is part of a hand; a hand is part of an arm; an arm is part of a body => a finger is part of a body.
  • 26. Upward hierarchy in WordNet (from most general to most specific): {entity}, {physical_entity}, {object, physical_object}, {whole, unit}, {living_thing, animate_thing}, {organism, being}, {animal, animate_being, beast, brute, creature, fauna}, {chordate}, {vertebrate, craniate}, {mammal, mammalian}, {placental, placental_mammal, eutherian, eutherian_mammal}, {carnivore}, {canine, canid}, {dog, domestic_dog, Canis_familiaris}
  • 27. 25 unique beginners for noun synsets: {act, action, activity}, {food}, {possession}, {animal, fauna}, {location, place}, {process}, {artifact}, {motive}, {quantity, amount}, {attribute, property}, {group, collection}, {relation}, {body, corpus}, {natural object}, {shape}, {cognition, knowledge}, {natural phenomenon}, {state, condition}, {communication}, {person, human being}, {substance}, {event, happening}, {plant, flora}, {time}, {feeling, emotion}
  • 28. Verb clusters: Verbs of Bodily Functions and Care (sweat); Motion Verbs (move); Verbs of Change (change); Emotion or Psych Verbs (feel); Verbs of Communication (tell); Stative Verbs (have, wear); Competition Verbs (race); Perception Verbs (see); Consumption Verbs (drink); Verbs of Possession (possess, own); Contact Verbs (touch); Verbs of Social Interaction (request, impeach); Cognition Verbs (think); Weather Verbs (thunder); Creation Verbs (create)
  • 30. What is Computational Lexical Semantics? Any computational process involving word meaning: computing word similarity; distributional (vector) models of meaning; computing word relations; word sense disambiguation; semantic role labeling; computing word connotation and sentiment.
  • 31. Concrete Applications: corpus linguistics; machine translation; text retrieval; text summarization; word processing help (discussed above); expert systems; speech recognition/synthesis (touched upon above); toys and games; automatic telephone interpretation systems; ultimately, artificial intelligence and robotics.
  • 32. Corpus Linguistics: This is a generic name for various computer applications that make use of large language databases (called corpora). Access to a large database enables linguistic data to be processed statistically rather than analytically. This conflict between two opposing views (statistical vs. analytical) is very apparent in machine translation.
  • 33. Machine Translation (1): Text-to-text translation (there is a great need for translation at the UN and the EC, the European Community). It works best when the two languages in question are similar in structure. Usually, pre-editing and/or post-editing by a human translator is required: machine-assisted translation.
  • 34. Machine Translation (2): Traditionally, MT required parsing, possibly some semantic analysis, and then mapping to a syntactic tree of the sentence in the target language. An alternative is to appeal to statistical means of mapping a surface string in the source language to a surface string in the target language. Word-for-word translation is difficult.
  • 35. Computational Semantics: The study of how to automate the process of constructing and reasoning with meaning representations of natural language expressions. This could play an important role in application areas such as machine translation when two typologically distinct languages are involved (e.g. English and Japanese).
  • 36. Text Retrieval: key word → text/book. Key word: morphology. Results: 1. Principles of Polymer Morphology; 2. Image Analysis and Mathematical Morphology; 3. Drainage Basin Morphology; 4. French Morphology. We need morphological, syntactic, and semantic information to find the right text/book. Further applications: search engines, etc.
  • 37. Text Summarization: We need to be able to select the right information from the electronic documents available (especially on the web). Automatic text summarization is a technique that can help people quickly grasp the concepts presented in a document by creating an abstract or summary of the original text.
  • 38. Semantic Web: Some people (e.g. Evergreen U) are trying to classify the contents of web pages so that they are meaningful to computers. But this is not an easy task, since the categories must presumably be pre-selected by people. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries (http://www.w3.org/2001/sw/).
  • 39. Ontology: Origins and History. Ontology in Philosophy: a philosophical discipline, the branch of philosophy that deals with the nature and the organisation of reality; the Science of Being (Aristotle, Metaphysics, IV, 1). It tries to answer the questions: What characterizes being? Eventually, what is being?
  • 40. Ontology in Computer Science: An ontology is an engineering artifact. It is constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary. Thus, an ontology describes a formal specification of a certain domain: a shared understanding of a domain of interest, and a formal, machine-manipulable model of a domain of interest.
  • 41. How to use Lexical Ontologies: 1. Ontology-based Information Extraction and Ontology Population from Text; 2. Ontology-based Question Answering; 3. Natural Language Generation from Triples; 4. Integration and publishing of legacy language resources; 5. Representation of Translations in the Web of Data; 6. Ontology-based Machine Translation
  • 42. Lexicon Model for Ontologies
  • 43. Conclusion: Database development is a basic building block for Machine Translation, Natural Language Processing and Computational Linguistics. WordNet is one of the richest resources, and its structure can be used to create a new lexical database for our language (Urdu/Persian/Arabic). Ontologies can be used to add enhanced semantics to lexical resources beyond the limits of databases, because of their nature and capability to describe things.
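Slide 9 lists what a lexical database typically stores, and slide 5 describes a DBMS as the software that mediates access to the data. A minimal sketch of both ideas, using Python's built-in `sqlite3` module, might look like this. The schema (tables `word`, `synset`, `sense`, `relation`), the glosses, and the helper functions are hypothetical illustrations, not WordNet's actual storage format; phonological relations are omitted for brevity.

```python
import sqlite3

# Hypothetical minimal schema: word forms, synsets (concepts with a lexical
# category), word-sense links, and typed semantic relations between synsets.
SCHEMA = """
CREATE TABLE word     (id INTEGER PRIMARY KEY, lemma TEXT UNIQUE NOT NULL);
CREATE TABLE synset   (id INTEGER PRIMARY KEY, pos TEXT NOT NULL, gloss TEXT);
CREATE TABLE sense    (word_id INTEGER REFERENCES word(id),
                       synset_id INTEGER REFERENCES synset(id),
                       PRIMARY KEY (word_id, synset_id));
CREATE TABLE relation (source_id INTEGER REFERENCES synset(id),
                       rel TEXT NOT NULL,
                       target_id INTEGER REFERENCES synset(id));
"""


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Toy data: synset 1 = {car, automobile}, synset 2 = {vehicle}.
    con.executemany("INSERT INTO synset VALUES (?, ?, ?)", [
        (1, "n", "a motor vehicle with four wheels"),
        (2, "n", "a conveyance that transports people or objects"),
    ])
    con.executemany("INSERT INTO word VALUES (?, ?)",
                    [(1, "car"), (2, "automobile"), (3, "vehicle")])
    con.executemany("INSERT INTO sense VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
    con.execute("INSERT INTO relation VALUES (1, 'hypernym', 2)")
    return con


def synonyms(con, lemma):
    """Other word forms that share at least one synset with `lemma`."""
    rows = con.execute("""
        SELECT DISTINCT w2.lemma
        FROM word w1
        JOIN sense s1 ON s1.word_id = w1.id
        JOIN sense s2 ON s2.synset_id = s1.synset_id
        JOIN word w2 ON w2.id = s2.word_id
        WHERE w1.lemma = ? AND w2.lemma <> ?
        ORDER BY w2.lemma
    """, (lemma, lemma)).fetchall()
    return [r[0] for r in rows]


def hypernyms(con, lemma):
    """Word forms of the synsets one 'hypernym' step above `lemma`."""
    rows = con.execute("""
        SELECT DISTINCT w2.lemma
        FROM word w1
        JOIN sense s1 ON s1.word_id = w1.id
        JOIN relation r ON r.source_id = s1.synset_id AND r.rel = 'hypernym'
        JOIN sense s2 ON s2.synset_id = r.target_id
        JOIN word w2 ON w2.id = s2.word_id
        WHERE w1.lemma = ?
        ORDER BY w2.lemma
    """, (lemma,)).fetchall()
    return [r[0] for r in rows]


con = build_db()
print(synonyms(con, "car"))   # ['automobile']
print(hypernyms(con, "car"))  # ['vehicle']
```

Keeping word forms and synsets in separate tables, joined through `sense`, is what lets one form belong to several concepts and one concept carry several forms.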
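Slides 16 to 21 describe synonymy as one meaning with many forms and polysemy as one form with many meanings, where a word form that appears in n synsets is n-fold polysemous. A small sketch with plain Python dictionaries shows both mappings. The synset IDs are made-up labels, and only the example words from the slides are included.

```python
from collections import defaultdict

# Meaning -> forms (synonymy). Synset IDs are hypothetical labels.
SYNSETS = {
    "table.tabular_array": {"table", "tabular_array"},
    "table.furniture": {"table", "piece_of_furniture"},
    "table.mesa": {"table", "mesa"},
    "table.postpone": {"table", "postpone"},
    "car.automobile": {"car", "motorcar", "auto", "automobile"},
}

# Form -> meanings (polysemy), obtained by inverting SYNSETS.
senses = defaultdict(set)
for synset_id, forms in SYNSETS.items():
    for form in forms:
        senses[form].add(synset_id)


def polysemy(form):
    """Number of synsets the form appears in (n-fold polysemy)."""
    return len(senses.get(form, ()))


def synonyms(form):
    """All other forms sharing any synset with `form`, sorted."""
    return sorted({w for sid in senses.get(form, ()) for w in SYNSETS[sid]} - {form})


print(polysemy("table"))  # 4
print(synonyms("car"))    # ['auto', 'automobile', 'motorcar']
```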
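The hypernym hierarchy and the transitivity rule on slides 22 and 23 can be sketched as a parent map that is walked upward. This assumes each synset has a single direct hypernym, which keeps the example short; real WordNet nouns can have more than one. The toy hierarchy and the function names are hypothetical.

```python
# Hypothetical toy hierarchy: each synset maps to its single direct hypernym.
HYPERNYM = {
    "SUV": "car",
    "convertible": "car",
    "car": "vehicle",
    "mountain_bike": "bicycle",
    "bicycle": "vehicle",
}


def hypernym_chain(synset):
    """The synset, its hypernym, that hypernym's hypernym, and so on."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_kind_of(x, y):
    """Transitivity: x is a kind of y if y appears anywhere above x."""
    return y in hypernym_chain(x)[1:]


def depth(synset):
    """Number of is-a edges between the synset and the root."""
    return len(hypernym_chain(synset)) - 1


print(hypernym_chain("SUV"))        # ['SUV', 'car', 'vehicle']
print(is_kind_of("SUV", "vehicle"))  # True
```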
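The part-whole inheritance on slides 24 and 25 follows the same pattern, except that a part may belong to several wholes, so the sketch below stores lists and uses a breadth-first search. The `PART_OF` table is a hypothetical toy built from the slide examples.

```python
from collections import deque

# Hypothetical meronym -> holonyms table (a part may have several wholes).
PART_OF = {
    "finger": ["hand"],
    "hand": ["arm"],
    "arm": ["body"],
    "spark_plug": ["engine"],
    "cylinder": ["engine"],
    "engine": ["car"],
}


def wholes(part):
    """Every whole that `part` belongs to, directly or by inheritance."""
    seen, queue = set(), deque([part])
    while queue:
        for whole in PART_OF.get(queue.popleft(), []):
            if whole not in seen:
                seen.add(whole)
                queue.append(whole)
    return seen


def is_part_of(part, whole):
    return whole in wholes(part)


print(is_part_of("finger", "body"))  # True
```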
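One common way to compute word similarity (slide 30) over a hypernym hierarchy such as the one on slide 26 is path-based: the fewer is-a edges separate two senses, the more similar they are. The score below follows the 1 / (shortest path length + 1) form used by NLTK's WordNet `path_similarity`. The tiny single-parent tree and its node names are hypothetical simplifications, not WordNet data.

```python
# Hypothetical single-parent hierarchy, loosely following slide 26.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "mammal",
    "whale": "cetacean",
    "cetacean": "mammal",
}


def chain(node):
    """Node followed by all of its ancestors, nearest first."""
    path = [node]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_similarity(a, b):
    """1 / (1 + edges on the path through the lowest common hypernym)."""
    up_a = {n: i for i, n in enumerate(chain(a))}
    for j, n in enumerate(chain(b)):
        if n in up_a:  # first shared ancestor = lowest common hypernym
            return 1 / (1 + up_a[n] + j)
    return None  # no common ancestor in this toy tree


print(path_similarity("dog", "wolf"))  # 0.3333333333333333
print(path_similarity("dog", "cat"))   # 0.2
```

With this measure, dog is closer to wolf (shared parent) than to cat (shared grandparent), and closer to cat than to whale.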
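Slide 32 contrasts statistical with analytical processing of corpora. A minimal illustration of the statistical side counts word and word-pair frequencies with the standard library. The three-sentence corpus is a made-up stand-in; a real corpus would contain millions of words.

```python
import re
from collections import Counter

# Hypothetical stand-in corpus.
CORPUS = """The table is on the floor. Please table the motion.
The committee decided to table the discussion until Friday."""


def tokenize(text):
    """Lower-case alphabetic tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", text.lower())


tokens = tokenize(CORPUS)
freq = Counter(tokens)                       # unigram counts
bigrams = Counter(zip(tokens, tokens[1:]))   # adjacent word pairs

print(freq.most_common(2))          # [('the', 5), ('table', 3)]
print(bigrams[("table", "the")])    # 2
```

Even at this scale, the bigram counts hint at usage: "table the" (verb use) occurs twice, while "the table" (noun use) occurs once.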
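Slide 34 notes that word-for-word translation is difficult. The sketch below makes this concrete with a hypothetical toy English-to-German lexicon: looking each word up on its own leaves both the article and the noun ambiguous, because the right choice depends on grammatical gender and on the intended sense of "bank".

```python
# Hypothetical toy English -> German lexicon listing every candidate.
LEXICON = {
    "the": ["der", "die", "das"],   # depends on the noun's gender and case
    "bank": ["Bank", "Ufer"],       # financial institution vs. river bank
    "is": ["ist"],
    "closed": ["geschlossen"],
}


def word_for_word(sentence):
    """Candidate translations per word; '?' marks words not in the lexicon."""
    return [LEXICON.get(w, ["?"]) for w in sentence.lower().split()]


def ambiguous_positions(sentence):
    """Indices of words with more than one candidate translation."""
    return [i for i, c in enumerate(word_for_word(sentence)) if len(c) > 1]


print(word_for_word("The bank is closed"))
print(ambiguous_positions("The bank is closed"))  # [0, 1]
```

Resolving those positions needs exactly the kind of information a lexical database supplies: word senses, grammatical properties, and context.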
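The "morphology" example on slide 36 shows why keyword matching alone is not enough. In the sketch below, each of the slide's four titles carries a hypothetical domain label standing in for the sense information a lexical resource might provide. Plain keyword search returns all four; adding the intended domain narrows the result.

```python
# Titles from slide 36; the domain labels are hypothetical annotations.
BOOKS = [
    ("Principles of Polymer Morphology", "materials science"),
    ("Image Analysis and Mathematical Morphology", "image processing"),
    ("Drainage Basin Morphology", "geomorphology"),
    ("French Morphology", "linguistics"),
]


def keyword_search(keyword):
    """Titles containing the keyword as a whole word (case-insensitive)."""
    return [title for title, _ in BOOKS if keyword.lower() in title.lower().split()]


def sense_aware_search(keyword, domain):
    """Keyword match restricted to the intended domain (sense)."""
    return [title for title, d in BOOKS
            if keyword.lower() in title.lower().split() and d == domain]


print(len(keyword_search("morphology")))                  # 4
print(sense_aware_search("morphology", "linguistics"))   # ['French Morphology']
```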
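Automatic summarization (slide 37) can be done in many ways. One simple extractive approach, sketched below, scores each sentence by the average corpus frequency of its non-stopword tokens and keeps the top k sentences in their original order. The stopword list and the sample text are hypothetical, and this is only one illustrative technique, not a state-of-the-art summarizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "it", "are", "by"}


def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, k=1):
    """Return the k highest-scoring sentences, kept in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        toks = content_words(sentence)
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                 reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


TEXT = ("WordNet groups words into synsets. Synsets are linked by relations "
        "such as hypernymy. The weather was pleasant yesterday. Relations "
        "between synsets make WordNet useful for NLP.")

print(summarize(TEXT, k=1))  # ['WordNet groups words into synsets.']
```

The off-topic weather sentence scores lowest because none of its words recur elsewhere in the text.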
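Slides 10 and 41 mention natural language generation from triples. A minimal sketch of the idea pairs each ontology property with a lexical entry (a verb phrase) and fills in subject and object. The triples, property names, and phrases are hypothetical, and real lexicon-ontology models attach much richer linguistic information (inflection, syntactic frames) than a single string.

```python
# Hypothetical (subject, property, object) triples.
TRIPLES = [
    ("WordNet", "developedAt", "Princeton University"),
    ("Seoul", "capitalOf", "South Korea"),
]

# Hypothetical lexicon: ontology property -> verb phrase.
LEXICON = {
    "developedAt": "was developed at",
    "capitalOf": "is the capital of",
}


def verbalize(triple):
    """Turn one triple into a sentence using its property's lexical entry."""
    subj, prop, obj = triple
    phrase = LEXICON.get(prop)
    if phrase is None:
        raise KeyError(f"no lexical entry for property {prop!r}")
    return f"{subj} {phrase} {obj}."


def verbalize_all(triples):
    return " ".join(verbalize(t) for t in triples)


print(verbalize_all(TRIPLES))
# WordNet was developed at Princeton University. Seoul is the capital of South Korea.
```

Raising an error for unmapped properties, rather than guessing a phrase, keeps gaps in the lexicon visible.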