Introduction to
Development of Lexical
Databases
Muhammad Shoaib
PhD Researcher (Biomedical Engineering)
Asan Medical Complex
College of Medicine University of Ulsan
Researcher Gachon University Gil Medical Center
Republic Of Korea
About me: Son of Soil
BS Computer Science (2006-2010)
FAST National University of Computer and Emerging
Sciences
ME Computer Engineering (2011-2013)
Jeju National University Republic of Korea
PhD Biomedical Engineering (2015 – To date)
Asan Medical Center, University of Ulsan, Republic of
Korea
Lecturer at Institute of Space Technology 2013-2015
Overview
Lexical Databases and DBMS
WordNet (we’ll see how we can adopt it)
Computational Linguistics
Lexical Ontologies
Database Management System
Data:
Facts and statistics collected together for
reference or analysis.
Database:
A structured set of data held in a computer,
especially one that is accessible in various ways.
Database Management System:
A computer-software application that interacts
with end users, other applications, and the database itself
What are we talking about today?
Globalization requires more texts and speech
to be translated faster across more languages
Manual translation is difficult, expensive, and
time-consuming
Machine translation is of low quality, often
unacceptable
Why a Lexical Database?
How can computers understand what we are
reading?
Why do we need computers for translation?
They are faster than humans
Can computers do the same job as humans?
In linguistics, probably not
Lexical Database
Machine Readable Dictionary
“A lexical database is a lexical resource which has an
associated software environment database which permits
access to its contents”
What is a Lexical Resource?
“A lexical resource (LR) is a database consisting of one or
several dictionaries.”
What Does a Lexical Database Contain?
Information typically stored in a lexical
database includes
the lexical category of words,
synonyms of words,
semantic and phonological relations between
different words or sets of words.
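As a rough illustration of the slide above, a lexical database can be sketched as two relational tables: one for word forms with their lexical category, one for relations between words. The schema, table names, and sample rows below are assumptions made for this example, not the design of any real lexical database.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (
    id INTEGER PRIMARY KEY,
    form TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE relation (
    source INTEGER REFERENCES word(id),
    target INTEGER REFERENCES word(id),
    kind TEXT NOT NULL
);
""")

# Sample entries (assumed for the example).
conn.executemany(
    "INSERT INTO word VALUES (?, ?, ?)",
    [(1, "car", "noun"), (2, "automobile", "noun"),
     (3, "hit", "verb"), (4, "strike", "verb")],
)
# Synonymy is symmetric, so it is stored in both directions.
conn.executemany(
    "INSERT INTO relation VALUES (?, ?, ?)",
    [(1, 2, "synonym"), (2, 1, "synonym"),
     (3, 4, "synonym"), (4, 3, "synonym")],
)


def category(form):
    """Return the lexical category stored for a word form, or None."""
    row = conn.execute(
        "SELECT category FROM word WHERE form = ?", (form,)
    ).fetchone()
    return row[0] if row else None


def related(form, kind):
    """Return the word forms linked to `form` by a relation of the given kind."""
    rows = conn.execute(
        "SELECT t.form FROM word s "
        "JOIN relation r ON r.source = s.id "
        "JOIN word t ON t.id = r.target "
        "WHERE s.form = ? AND r.kind = ? ORDER BY t.form",
        (form, kind),
    )
    return [row[0] for row in rows]
```

Other relation kinds (semantic or phonological) would be further values of the `kind` column in this sketch.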
Why Lexical Databases?
Natural language generation systems that produce
coherent discourses by verbalizing a set of triples
Question Answering systems that interpret user
questions with respect to one or more ontologies
Text interpretation systems that extract triples with
respect to one or more ontologies
Query interpretation and semantic search in
information retrieval systems
Natural language based interfaces to ontologies,
Semantic Web and Linked Data.
WordNet
What is WordNet?
A large lexical database, or “electronic
dictionary,” developed and maintained at
Princeton
http://wordnet.princeton.edu
Includes most English nouns, verbs, adjectives,
adverbs
Can be used by humans and machines
Princeton WordNet is for English only, but it is
linked to wordnets in many other languages
Authors of the (first) WordNet
WordNet was created in the Cognitive
Science Laboratory of Princeton University under the
direction of psychology professor George Armitage
Miller starting in 1985 and has been directed in
recent years by Christiane Fellbaum
That is why it is usually called “the Princeton WordNet”
(PWN)
George Miller and Christiane Fellbaum were awarded
the 2006 Antonio Zampolli Prize for their work with
WordNet.
WordNet as described by its authors
WordNet is an on-line lexical reference system
whose design is inspired by current
psycholinguistic theories of human lexical
memory. English nouns, verbs, and adjectives
are organized into synonym sets, each
representing one underlying lexical concept.
Different relations link the synonym sets.
What’s special about WordNet?
Traditional paper dictionaries are organized
alphabetically: words that are found together (on the
same page) are not related by meaning
WordNet is organized by meaning: words in close
proximity are semantically similar
Human users and computers can browse WordNet
and find words that are meaningfully related to their
queries (somewhat like in a hyperdimensional
thesaurus)
What’s special about WordNet?
WordNet gives information about two fundamental,
universal properties of human language:
polysemy and synonymy
Polysemy = one:many mapping from form to
meaning
Synonymy = one:many mapping from meaning to
form
Polysemy
One word form expresses multiple meanings
{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}
Note: the most frequent word forms tend to be the most
polysemous!
Synonymy
One concept is expressed by several different
word forms:
{beat, hit, strike}
{car, motorcar, auto, automobile}
Polysemy and synonymy
Understanding and generating language (as for
translation) means matching a word form with
the intended, context-appropriate meaning
People (fluent speakers of a language) do this
very efficiently
Synonymy in WordNet
WordNet groups (roughly) synonymous,
denotationally equivalent, words into unordered
sets of synonyms (“synsets”)
{hit, beat, strike}
{big, large}
{queue, line}
By definition, each synset expresses a distinct
meaning/concept
Each word form-meaning pair is unique
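The synset idea can be sketched in a few lines. This is a minimal illustration using only the three synsets listed on the slide; the function name `synonyms` is chosen for the example and is not part of WordNet.

```python
# Synsets from the slide, modelled as unordered (frozen) sets of word forms.
SYNSETS = [
    frozenset({"hit", "beat", "strike"}),
    frozenset({"big", "large"}),
    frozenset({"queue", "line"}),
]


def synonyms(word):
    """Collect every other word form that shares a synset with `word`."""
    found = set()
    for synset in SYNSETS:
        if word in synset:
            found |= synset - {word}
    return found
```

Because a synset is a set, the order of its members carries no meaning, which matches the "unordered sets" wording above.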
Polysemy in WordNet
A word form that appears in n synsets
is n-fold polysemous
{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}
table is fourfold polysemous/has four senses
four distinct concepts are associated with the word form table
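Counting senses follows directly from the definition: a word form is n-fold polysemous if it appears in n synsets. A minimal sketch, using the synsets shown on the slides (the tuple representation and function names are assumptions of the example):

```python
# Synsets taken from the slides; each tuple is one synset.
SYNSETS = [
    ("table", "tabular_array"),
    ("table", "piece_of_furniture"),
    ("table", "mesa"),
    ("table", "postpone"),
    ("car", "motorcar", "auto", "automobile"),
]


def senses(word_form):
    """Return every synset that contains the word form."""
    return [synset for synset in SYNSETS if word_form in synset]


def polysemy(word_form):
    """Number of synsets the word form appears in (its number of senses)."""
    return len(senses(word_form))
```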
Hypernymy relates noun synsets
Relates more/less general concepts
Creates hierarchies, or “trees”
                  {vehicle}
                 /         \
{car, automobile}           {bicycle, bike}
      /       \                    |
{convertible} {SUV}         {mountain bike}
“A car is a kind of vehicle” <=> “The class of vehicles includes cars, bikes”
Hierarchies can have up to 16 levels
Hyponymy (Association Rules)
Transitivity
A car is a kind of vehicle
An SUV is a kind of car
=> An SUV is a kind of vehicle
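The transitivity of the "is a kind of" relation can be checked by following hypernym links upward. The sketch below assumes, for simplicity, that each concept has exactly one direct hypernym; the dictionary holds only the links from the vehicle example.

```python
# Direct hypernym links from the slides (child -> parent).
# Assumption: one parent per concept, which keeps the walk trivial.
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "convertible": "car",
    "SUV": "car",
    "mountain bike": "bicycle",
}


def is_a(concept, ancestor):
    """True if `ancestor` is reachable from `concept` via hypernym links."""
    current = concept
    while current in HYPERNYM:
        current = HYPERNYM[current]
        if current == ancestor:
            return True
    return False
```

Only the two direct links SUV -> car and car -> vehicle are stored; the fact that an SUV is a kind of vehicle is derived by the walk.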
Meronymy/holonymy
(part-whole relation)
{car, automobile}
        |
    {engine}
     /     \
{spark plug} {cylinder}
“An engine has spark plugs”
“Spark plugs and cylinders are parts of
an engine”
Meronymy/Holonymy (Inheritance)
A finger is part of a hand
A hand is part of an arm
An arm is part of a body
=> A finger is part of a body
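The same reasoning can be run downward: starting from a whole, collect its direct parts and then the parts of those parts. This is a small sketch over the part-whole links named on the slides; the function name `parts` is an assumption of the example.

```python
# Direct part-of links from the slides (part -> whole).
PART_OF = {
    "finger": "hand",
    "hand": "arm",
    "arm": "body",
    "spark plug": "engine",
    "cylinder": "engine",
    "engine": "car",
}


def parts(whole):
    """Return all direct and indirect parts of `whole`."""
    result = []
    for part, owner in PART_OF.items():
        if owner == whole:
            result.append(part)
            # Recurse so that parts of parts are included as well.
            result.extend(parts(part))
    return result
```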
Upward hierarchy in WordNet
{entity}
{physical_entity}
{object, physical_object}
{whole, unit}
{living_thing, animate_thing}
{organism, being}
{animal, animate_being, beast, brute, creature, fauna}
{chordate}
{vertebrate, craniate}
{mammal, mammalian}
{placental, placental_mammal, eutherian, eutherian_mammal}
{carnivore}
{canine, canid}
{dog, domestic_dog, Canis_familiaris}
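The chain above can be stored as child-to-parent links and walked from any concept up to the root. In this sketch each synset is represented by its first member only, which is a simplification made for the example.

```python
# The chain from the slide, most general concept first.
# Simplification: each synset is represented by its first member.
CHAIN = [
    "entity", "physical_entity", "object", "whole", "living_thing",
    "organism", "animal", "chordate", "vertebrate", "mammal",
    "placental", "carnivore", "canine", "dog",
]

# Build child -> parent links from consecutive pairs in the chain.
PARENT = {child: parent for parent, child in zip(CHAIN, CHAIN[1:])}


def path_to_root(concept):
    """Return the concept followed by all of its hypernyms up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Number of hypernym links between the concept and the root."""
    return len(path_to_root(concept)) - 1
```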
25 unique beginners for noun
synsets
{act, action, activity} {food} {possession}
{animal, fauna} {location, place} {process}
{artifact} {motive} {quantity, amount}
{attribute, property} {group, collection} {relation}
{body, corpus} {natural object} {shape}
{cognition, knowledge} {natural phenomenon} {state, condition}
{communication} {person, human being} {substance}
{event, happening} {plant, flora} {time}
{feeling, emotion}
Verb clusters
Verbs of Bodily Functions and Care (sweat)
Verbs of Change (change)
Verbs of Communication (tell)
Competition Verbs (race)
Consumption Verbs (drink)
Contact Verbs (touch)
Cognition Verbs (think)
Creation Verbs (create)
Motion Verbs (move)
Emotion or Psych Verbs (feel)
Stative Verbs (have, wear)
Perception Verbs (see)
Verbs of Possession (possess, own)
Verbs of Social Interaction (request, impeach)
Weather Verbs (thunder)
Computational
Linguistics
What is Computational Lexical Semantics?
Any computational process involving word
meaning!
Computing Word Similarity
Distributional (Vector) Models of Meaning
Computing Word Relations
Word Sense Disambiguation
Semantic Role Labeling
Computing word connotation and sentiment
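To make "Distributional (Vector) Models of Meaning" and "Computing Word Similarity" concrete, here is a minimal sketch: each word is represented by counts of the words occurring near it, and two words are compared by the cosine of their count vectors. The toy sentences and the window size are invented for the example.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
SENTENCES = [
    "the car drives on the road",
    "the automobile drives on the road",
    "the dog barks at the cat",
]


def context_vector(word, window=2):
    """Count the words that occur within `window` tokens of `word`."""
    counts = Counter()
    for sentence in SENTENCES:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == word:
                start = max(0, i - window)
                end = i + window + 1
                for j in range(start, min(end, len(tokens))):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this toy corpus "car" and "automobile" occur in identical contexts, so their vectors coincide, while "car" and "dog" share only the context word "the".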
Concrete Applications
corpus linguistics
machine translation
text retrieval
text summarization
word processing help (discussed above)
expert systems
speech recognition/synthesis (touched upon above)
toys, games
automatic telephone interpretation system
ultimately … artificial intelligence, robotics
Corpus Linguistics
This is a generic name for various computer
applications that make use of large language
databases (called corpora)
Having access to a large database enables us
to process linguistic data in a statistical way,
rather than in an analytical way.
This conflict of two opposing views
(statistical vs. analytical) is very apparent in
machine translation.
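The simplest statistical treatment of a corpus is counting. The sketch below tokenizes a text and reports absolute and relative word frequencies; the three-sentence "corpus" is invented for the example, and a real corpus would be far larger.

```python
import re
from collections import Counter

# Tiny stand-in for a corpus, invented for illustration.
CORPUS = "The table is set. Put the book on the table. We table the motion."


def word_frequencies(text):
    """Count lower-cased alphabetic tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def relative_frequency(text, word):
    """Share of all tokens in the text that are `word`."""
    counts = word_frequencies(text)
    total = sum(counts.values())
    return counts[word] / total if total else 0.0
```

Note that the counts say nothing about which sense of "table" is meant in each sentence; that is where the analytical view comes in.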
Machine Translation (1)
Text-to-text translation (there is a great need for
translation at the UN and the EC (European
Community))
Works best when the two languages in
question are similar in structure
Usually, pre-editing and/or post-editing by
a human translator is required (machine-assisted
translation).
Machine Translation (2)
Traditionally, MT required parsing, possibly
some semantic analysis, then mapping to a
syntactic tree of the sentence in the target
language.
An alternative is to appeal to statistical means
of mapping a surface string in the source
language to a surface string in the target
language.
Difficulty with word-for-word translation
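The difficulty with word-for-word translation can be seen in a small sketch: when a word has several candidate translations, every combination is a possible output, and nothing in the lexicon says which one fits. The English-to-German toy lexicon below is invented for the example.

```python
from itertools import product

# Toy English-to-German lexicon, invented for illustration.
# Each word lists its candidate translations.
LEXICON = {
    "the": ["der", "die", "das"],
    "table": ["Tisch", "Tabelle"],
    "is": ["ist"],
    "new": ["neu"],
}


def word_for_word(sentence):
    """Return every word-for-word rendering; unknown words are kept as-is."""
    options = [LEXICON.get(word, [word]) for word in sentence.lower().split()]
    return [" ".join(choice) for choice in product(*options)]
```

For "The table is new" this yields six candidates. Choosing between "Tisch" (furniture) and "Tabelle" (tabular array), and picking the matching article, requires exactly the sense and grammatical information a lexical database is meant to provide.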
Computational Semantics
The study of how to automate the process of
constructing and reasoning with meaning
representations of natural language expressions.
This could play an important role in such application
areas as machine translation when two
typologically distinct languages are involved (e.g.
English and Japanese).
Text Retrieval
key word → text/book
key word: morphology
1. Principles of Polymer Morphology
2. Image Analysis and Mathematical Morphology
3. Drainage Basin Morphology
4. French Morphology
We need morphological, syntactic, and semantic
information to find the right text/book.
Further applications: search engines, etc.
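The "morphology" example can be sketched as follows. A plain keyword match returns all four titles; adding a subject label narrows the result to the intended sense. The titles are those on the slide, while the subject labels are assumptions added for the example.

```python
# Titles from the slide; the subject labels are assumed for illustration.
CATALOG = [
    ("Principles of Polymer Morphology", "materials science"),
    ("Image Analysis and Mathematical Morphology", "image processing"),
    ("Drainage Basin Morphology", "geomorphology"),
    ("French Morphology", "linguistics"),
]


def keyword_search(keyword):
    """Return every title containing the keyword, ignoring case."""
    return [title for title, _ in CATALOG if keyword.lower() in title.lower()]


def semantic_search(keyword, subject):
    """Keyword match restricted to titles with the given subject label."""
    return [
        title
        for title, label in CATALOG
        if keyword.lower() in title.lower() and label == subject
    ]
```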
Text Summarization
We need to be able to select the right
information from the electronic documents
available (esp. on the web).
Automatic text summarization is a
technique that can help people to quickly
grasp the concepts presented in a
document by creating an abstract or
summary of the original text.
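One simple way to approximate summarization is extractive: score each sentence by how frequent its words are in the whole document and keep the top-scoring sentences. The sketch below is a naive illustration of that idea, not the method of any particular system; it ignores stop words and sentence position, which practical summarizers usually take into account.

```python
import re
from collections import Counter


def summarize(text, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        # Average document frequency of the words in the sentence.
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]
```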
Semantic Web
Some people (e.g. Evergreen U) are trying
to classify contents of web pages so that
they are meaningful to computers. But this
is not an easy task since the categories
must presumably be pre-selected by
people.
The Semantic Web provides a common
framework that allows data to be shared
and reused across application, enterprise,
and community boundaries.
http://www.w3.org/2001/sw/
Ontology: Origins and History
Ontology in Philosophy
A philosophical discipline - a branch of
philosophy that deals with the nature and the
organisation of reality
Science of Being (Aristotle, Metaphysics, IV, 1)
Tries to answer the questions:
What characterizes being?
Eventually, what is being?
Ontology in Computer Science
An ontology is an engineering artifact:
It is constituted by a specific vocabulary
used to describe a certain reality, plus
a set of explicit assumptions regarding
the intended meaning of the vocabulary.
Thus, an ontology describes a formal specification of a
certain domain:
Shared understanding of a domain of
interest
Formal and machine manipulable model
of a domain of interest
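The two ingredients named above, a vocabulary and explicit assumptions about its intended meaning, can be sketched as data plus a check. The classes, the property, and its domain and range below are hypothetical and chosen only to illustrate the idea of a machine-manipulable model.

```python
# Hypothetical mini-ontology: all names are illustrative only.
CLASSES = {"Vehicle", "Car", "Person"}
SUBCLASS_OF = {"Car": "Vehicle"}

# Explicit assumptions about the vocabulary:
# each property has a domain (subject class) and a range (object class).
PROPERTIES = {"drives": ("Person", "Vehicle")}


def is_subclass(cls, ancestor):
    """True if `cls` equals `ancestor` or inherits from it."""
    while cls != ancestor and cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
    return cls == ancestor


def valid(subject_cls, prop, object_cls):
    """Check a statement against the property's domain and range."""
    if prop not in PROPERTIES:
        return False
    domain, rng = PROPERTIES[prop]
    return is_subclass(subject_cls, domain) and is_subclass(object_cls, rng)
```

A program can use such a model to reject statements that violate the shared understanding, for example a car driving a person.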
How to use Lexical Ontologies
1. Ontology-based Information Extraction and
Ontology Population from Text
2. Ontology-based Question Answering
3. Natural Language Generation from Triples
4. Integration and publishing of legacy language
resources
5. Representation of Translations in the Web of
Data
6. Ontology-based Machine Translation
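As a sketch of item 3 (natural language generation from triples), the example below verbalizes subject-predicate-object triples using a small lexicon that stores the article for each noun. The triples, the predicate names, and the lexicon entries are all invented for the example; they show why an ontology needs lexical information attached to its concepts before it can be put into words.

```python
# Hypothetical lexicon: concept -> (indefinite article, noun).
LEXICON = {
    "SUV": ("an", "SUV"),
    "car": ("a", "car"),
    "engine": ("an", "engine"),
}

# Hypothetical triples in (subject, predicate, object) form.
TRIPLES = [("SUV", "is_a", "car"), ("engine", "part_of", "car")]


def noun_phrase(concept):
    """Render a concept with the article stored in the lexicon."""
    article, noun = LEXICON[concept]
    return f"{article} {noun}"


def verbalize(triple):
    """Turn one triple into an English sentence."""
    subject, predicate, obj = triple
    if predicate == "is_a":
        text = f"{noun_phrase(subject)} is a kind of {LEXICON[obj][1]}."
    elif predicate == "part_of":
        text = f"{noun_phrase(subject)} is part of {noun_phrase(obj)}."
    else:
        raise ValueError(f"no template for predicate {predicate!r}")
    return text[0].upper() + text[1:]
```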
Lexicon Model for Ontologies
Conclusion
Database development is a basic building block for Machine
Translation, Natural Language Processing and
Computational Linguistics
WordNet is one of the richest resources, and its structure can
be used to create new lexical databases for our languages
(Urdu/Persian/Arabic)
Ontologies can be used to add enhanced semantics to
lexical resources beyond the limits of databases because of
their nature and capability to describe things
Thanks
and
Questions

Introduction to development of lexical databases

  • 1. Introduction to Development of Lexical Databases Muhammad Shoaib PhD Researcher (Biomedical Engineering) Asan Medical Complex College of Medicine University of Ulsan Researcher Gachon University Gil Medical Center Republic Of Korea
  • 2. About me: Son of Soil BS Computer Science (2006-2010) FAST National University of Computer and Emerging Sciences ME Computer Engineering (2011-2013) Jeju National University Republic of Korea PhD Biomedical Engineering (2015 – To date) Asan Medical Center, University of Ulsan, Republic of Korea Lecturer at Institute of Space Technology 2013-2015
  • 3. Overview Lexical Databases and DBMS WordNet (we’ll see who can we adopt it) Computational Linguistics Lexical Ontologies
  • 4.
  • 5. Database Management System Data: Facts and statistics collected together for reference or analysis. Database: A structured set of data held in a computer, especially one that is accessible in various ways. Database Management System computer-software application that interacts with end-users, other applications
  • 6. What we are talking about today? Globalization requires more texts and speech to be translated faster across more languages Manual translation is difficult, expensive, time- consuming Machine translation is of low quality, often unacceptable
  • 7. Why Lexical Database What are reading how computer’s can understand? Why we need computers for translations? They are faster then humans Can computer do the similar job as humans? In linguistics probably not
  • 8. Lexical Database Machine Readable Dictionary “A lexical database is a lexical resource which has an associated software environment database which permits access to its contents” What is Lexical Resource? “A lexical resource (LR) is a database consisting of one or several dictionaries.”
  • 9. What a Lexical Database Contains? Information typically stored in a lexical database includes lexical category of words synonyms of words, semantic and phonological relations between different words or sets of words.
  • 10. Why Lexical Databases? Natural language generation systems that produce coherent discourses by verbalizing a set of triples Question Answering systems that interpret user questions with respect to one or more ontologies Text interpretation systems that extract triples with respect to one or more ontologies Query interpretation and semantic search in information retrieval systems Natural language based interfaces to ontologies, Semantic Web and Linked Data.
  • 12. What is WordNet? A large lexical database, or “electronic dictionary,” developed and maintained at Princeton http://wordnet.princeton.edu Includes most English nouns, verbs, adjectives, adverbs Can be used by humans and machines Princeton WordNet is for English only, but it is linked to wordnets in many other languages
  • 13. Authors of the (first) WordNet WordNet was created in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller starting in 1985 and has been directed in recent years by Christiane Fellbaum That is why it is usually called “the Princeton WordNet” (PWN) George Miller and Christiane Fellbaum were awarded the 2006 Antonio Zampolli Prize for their work with WordNet.
  • 14. WordNet as described by authors WordNet is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, and adjectives are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
  • 15. What’s special about WordNet? Traditional paper dictionaries are organized alphabetically: words that are found together (on the same page) are not related by meaning WordNet is organized by meaning: words in close proximity are semantically similar Human users and computers can browse WordNet and find words that are meaningfully related to their queries (somewhat like in a hyperdimensional thesaurus)
  • 16. What’s special about WordNet? WordNet gives information about two fundamental, universal properties of human language: polysemy and synonymy Polysemy = one:many mapping of form and meaning Synonymy = one:many mapping of meaning and form
  • 17. Polysemy One word form expresses multiple meanings {table, tabular_array} {table, piece_of_furniture} {table, mesa} {table, postpone} Note: the most frequent word forms are the most polysemous!
  • 18. Synonymy One concept is expressed by several different word forms: {beat, hit, strike} {car, motorcar, auto, automobile}
  • 19. Polysemy and synonymy Understanding and generating language (as for translation) means matching a word form with the intended, context-appropriate meaning People (fluent speakers of a language) do this very efficiently
  • 20. Synonymy in WordNet WordNet groups (roughly) synonymous, denotationally equivalent, words into unordered sets of synonyms (“synsets”) {hit, beat, strike} {big, large} {queue, line} By definition, each synset expresses a distinct meaning/concept Each word form-meaning pair is unique
  • 21. Polysemy in WordNet A word form that appears in n synsets is n-fold polysemous {table, tabular_array} {table, piece_of_furniture} {table, mesa} {table, postpone} table is fourfold polysemous/has four senses four distinct concepts are associated with the word form table
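The synset idea from slides 20 and 21 can be sketched with plain Python sets. This is a toy model, not the WordNet data format: the `synsets` list simply reuses the example sets from the slides, and the names `senses` and `polysemy` are invented for the illustration.

```python
from collections import defaultdict

# Each synset is an unordered set of word forms expressing one concept.
synsets = [
    frozenset({"table", "tabular_array"}),
    frozenset({"table", "piece_of_furniture"}),
    frozenset({"table", "mesa"}),
    frozenset({"table", "postpone"}),
    frozenset({"hit", "beat", "strike"}),
    frozenset({"big", "large"}),
]

# Index: word form -> all synsets that contain it.
senses = defaultdict(list)
for synset in synsets:
    for form in synset:
        senses[form].append(synset)


def polysemy(form):
    """A word form that appears in n synsets is n-fold polysemous."""
    return len(senses.get(form, []))


print(polysemy("table"))  # 4
print(polysemy("beat"))   # 1
```

Synonymy is read off in the other direction: every form inside one synset is a synonym of the others in that set.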
  • 22. Hypernymy relates noun synsets Relates more/less general concepts Creates hierarchies, or “trees” {vehicle} > {car, automobile}, {bicycle, bike}; {car, automobile} > {convertible}, {SUV}; {bicycle, bike} > {mountain bike} “A car is a kind of vehicle” <=> “The class of vehicles includes cars, bikes” Hierarchies can have up to 16 levels
  • 23. Hyponymy (Association Rules) Transitivity A car is a kind of vehicle An SUV is a kind of car => An SUV is a kind of vehicle
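The transitivity rule on slide 23 can be sketched by following "is a kind of" links upward. The `hypernym` mapping below encodes the small vehicle tree from slide 22. It assumes each concept has a single parent, which is a simplification made for this example (a synset in WordNet can have more than one hypernym), and the function names are invented.

```python
# Child concept -> its direct hypernym (parent), taken from the vehicle example.
hypernym = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "convertible": "car",
    "SUV": "car",
    "mountain bike": "bicycle",
}


def ancestors(concept):
    """Follow hypernym links upward and return every more general concept."""
    chain = []
    while concept in hypernym:
        concept = hypernym[concept]
        chain.append(concept)
    return chain


def is_a_kind_of(specific, general):
    """Transitive check: True if `general` is reachable from `specific`."""
    return general in ancestors(specific)


print(ancestors("SUV"))                # ['car', 'vehicle']
print(is_a_kind_of("SUV", "vehicle"))  # True
```

The same upward walk would apply to part-whole (meronymy) links, with the mapping pointing from part to whole instead of from child to parent.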
  • 24. Meronymy/holonymy (part-whole relation) {car, automobile} | {engine} / {spark plug} {cylinder} “An engine has spark plugs” “Spark plugs and cylinders are parts of an engine”
  • 25. Meronymy/Holonymy (Inheritance) A finger is part of a hand A hand is part of an arm An arm is part of a body =>a finger is part of a body
  • 26. Upward hierarchy in WordNet {entity} {physical_entity} {object, physical_object} {whole, unit} {living_thing, animate_thing} {organism, being} {animal, animate_being, beast, brute, creature, fauna} {chordate} {vertebrate, craniate} {mammal, mammalian} {placental, placental_mammal, eutherian, eutherian_mammal} {carnivore} {canine, canid} {dog, domestic_dog, Canis_familiaris}
  • 27. 25 unique beginners for noun synsets {act, action, activity} {food} {possession} {animal, fauna} {location, place} {process} {artifact} {motive} {quantity, amount} {attribute, property} {group, collection} {relation} {body, corpus} {natural object} {shape} {cognition, knowledge} {natural phenomenon} {state, condition} {communication} {person, human being} {substance} {event, happening} {plant, flora} {time} {feeling, emotion}
  • 28. Verb clusters Verbs of Bodily Functions and Care (sweat) Motion Verbs (move) Verbs of Change (change) Emotion or Psych Verbs (feel) Verbs of Communication (tell) Stative Verbs (have, wear) Competition Verbs (race) Perception Verbs (see) Consumption Verbs (drink) Verbs of Possession (possess, own) Contact Verbs (touch) Verbs of Social Interaction (request, impeach) Cognition Verbs (think) Weather Verbs (thunder) Creation Verbs (create)
  • 30. What is Computational Lexical Semantics Any computational process involving word meaning! Computing Word Similarity Distributional (Vector) Models of Meaning Computing Word Relations Word Sense Disambiguation Semantic Role Labeling Computing word connotation and sentiment
  • 31. Concrete Applications corpus linguistics machine translation text retrieval text summarization word processing help (discussed above) expert systems speech recognition/synthesis (touched upon above) toys, games automatic telephone interpretation system ultimately … artificial intelligence, robotics
  • 32. Corpus Linguistics This is a generic name for various computer applications that make use of large language databases (called corpora) Having access to a large database enables us to process linguistic data in a statistical way, rather than in an analytical way. This conflict of two opposing views (statistical vs. analytical) is very apparent in machine translation.
  • 33. Machine Translation (1) text-to-text translation (great need for translation at the UN and the EC (European Community)) Works best when the two languages in question are similar in structure Usually, pre-editing and/or post-editing by a human translator is required: machine-assisted translation.
  • 34. Machine Translation (2) Traditionally, MT required parsing, possibly some semantic analysis, then mapping to a syntactic tree of the sentence in the target language. An alternative is to appeal to statistical means of mapping a surface string in the source language to a surface string in the target language. Difficulty with word-for-word translation
  • 35. Computational Semantics The study of how to automate the process of constructing and reasoning with meaning representations of natural language expressions. This could play an important role in such application areas as machine translation when two typologically distinct languages are involved (e.g. English and Japanese).
  • 36. Text Retrieval key word → text/book key word: morphology 1. Principles of Polymer Morphology 2. Image Analysis and Mathematical Morphology 3. Drainage Basin Morphology 4. French Morphology We need morphological, syntactic, and semantic information to find the right text/book. Further applications: search engines, etc.
  • 37. Text Summarization We need to be able to select the right information from the electronic documents available (esp. on the web). Automatic text summarization is a technique that can help people to quickly grasp the concepts presented in a document by creating an abstract or summary of the original text.
  • 38. Semantic Web Some people (e.g. Evergreen U) are trying to classify contents of web pages so that they are meaningful to computers. But this is not an easy task since the categories must presumably be pre-selected by people. The semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. http://www.w3.org/2001/sw/
  • 39. Ontology: Origins and History Ontology in Philosophy A philosophical discipline - a branch of philosophy that deals with the nature and the organisation of reality Science of Being (Aristotle, Metaphysics, IV, 1) Tries to answer the questions: What characterizes being? Eventually, what is being?
  • 40. Ontology in Computer Science An ontology is an engineering artifact: It is constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary. Thus, an ontology describes a formal specification of a certain domain: Shared understanding of a domain of interest Formal and machine manipulable model of a domain of interest
  • 41. How to use Lexical Ontologies 1. Ontology-based Information Extraction and Ontology Population from Text 2. Ontology-based Question Answering 3. Natural Language Generation from Triples 4. Integration and publishing of legacy language resources 5. Representation of Translations in the Web of Data 6. Ontology-based Machine Translation
  • 42. Lexicon Model for Ontologies
  • 43. Conclusion Database development is a basic building block for Machine Translation, Natural Language Processing and Computational Linguistics WordNet is one of the richest resources, and its structure can be used to create new lexical databases for our languages (Urdu/Persian/Arabic) Ontologies can be used to add enhanced semantics to lexical resources beyond the limits of databases because of their nature and capability to describe things