Some time ago I had the opportunity to give a presentation at the Department of Translation Studies, University of Gujrat. At the request of Mr. Muhammad Kamran, a lecturer in that department, the slides have been shared; they can be obtained from the link below.
The video of the presentation will, Insha'Allah, be uploaded soon.
1. Introduction to
Development of Lexical
Databases
Muhammad Shoaib
PhD Researcher (Biomedical Engineering)
Asan Medical Center
College of Medicine University of Ulsan
Researcher Gachon University Gil Medical Center
Republic Of Korea
2. About me: Son of Soil
BS Computer Science (2006-2010)
FAST National University of Computer and Emerging
Sciences
ME Computer Engineering (2011-2013)
Jeju National University Republic of Korea
PhD Biomedical Engineering (2015 – To date)
Asan Medical Center, University of Ulsan, Republic of
Korea
Lecturer at Institute of Space Technology 2013-2015
5. Database Management System
Data:
Facts and statistics collected together for
reference or analysis.
Database:
A structured set of data held in a computer,
especially one that is accessible in various ways.
Database Management System
A computer software application that interacts
with end users, other applications, and the database itself
6. What are we talking about today?
Globalization requires more texts and speech
to be translated faster across more languages
Manual translation is difficult, expensive, time-
consuming
Machine translation is of low quality, often
unacceptable
7. Why Lexical Database
How can computers understand what we are
reading?
Why do we need computers for translation?
They are faster than humans
Can computers do the same job as humans?
In linguistics, probably not
8. Lexical Database
Machine Readable Dictionary
“A lexical database is a lexical resource which has an
associated software environment database which permits
access to its contents”
What is Lexical Resource?
“A lexical resource (LR) is a database consisting of one or
several dictionaries.”
9. What Does a Lexical Database Contain?
Information typically stored in a lexical
database includes
lexical category of words
synonyms of words,
semantic and phonological relations between
different words or sets of words.
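A minimal sketch of how the information listed above might be stored, using Python's built-in sqlite3 module. The table names (`synset`, `sense`), the column names, and the sample rows are illustrative assumptions, not a schema from the presentation.

```python
import sqlite3

def build_lexical_db():
    """Create a toy in-memory lexical database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE synset (id INTEGER PRIMARY KEY, gloss TEXT);
        CREATE TABLE sense (
            word      TEXT NOT NULL,
            category  TEXT NOT NULL,   -- lexical category: noun, verb, ...
            synset_id INTEGER NOT NULL REFERENCES synset(id),
            PRIMARY KEY (word, synset_id)
        );
    """)
    # Placeholder glosses, written for this example only.
    conn.executemany("INSERT INTO synset VALUES (?, ?)",
                     [(1, "placeholder gloss: motor vehicle"),
                      (2, "placeholder gloss: deliver a blow")])
    conn.executemany("INSERT INTO sense VALUES (?, ?, ?)",
                     [("car", "noun", 1), ("motorcar", "noun", 1),
                      ("auto", "noun", 1), ("automobile", "noun", 1),
                      ("beat", "verb", 2), ("hit", "verb", 2),
                      ("strike", "verb", 2)])
    return conn

def synonyms(conn, word):
    """Return the other words that share at least one synset with `word`."""
    rows = conn.execute("""
        SELECT DISTINCT other.word
        FROM sense AS s
        JOIN sense AS other ON other.synset_id = s.synset_id
        WHERE s.word = ? AND other.word <> ?
        ORDER BY other.word
    """, (word, word))
    return [row[0] for row in rows]

conn = build_lexical_db()
print(synonyms(conn, "car"))  # ['auto', 'automobile', 'motorcar']
```

Storing one row per word-meaning pair keeps synonym lookup a simple self-join; phonological or other relations could be added as further tables in the same way.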
10. Why Lexical Databases?
Natural language generation systems that produce
coherent discourses by verbalizing a set of triples
Question Answering systems that interpret user
questions with respect to one or more ontologies
Text interpretation systems that extract triples with
respect to one or more ontologies
Query interpretation and semantic search in
information retrieval systems
Natural language based interfaces to ontologies,
Semantic Web and Linked Data.
12. What is WordNet?
A large lexical database, or “electronic
dictionary,” developed and maintained at
Princeton
http://wordnet.princeton.edu
Includes most English nouns, verbs, adjectives,
adverbs
Can be used by humans and machines
Princeton WordNet is for English only, but it is
linked to wordnets in many other languages
13. Authors of the (first) WordNet
WordNet was created in the Cognitive
Science Laboratory of Princeton University under the
direction of psychology professor George Armitage
Miller starting in 1985 and has been directed in
recent years by Christiane Fellbaum
That is why it is usually called “the Princeton WordNet”
(PWN)
George Miller and Christiane Fellbaum were awarded
the 2006 Antonio Zampolli Prize for their work with
WordNet.
14. WordNet as described by authors
WordNet is an on-line lexical reference system
whose design is inspired by current
psycholinguistic theories of human lexical
memory. English nouns, verbs, and adjectives
are organized into synonym sets, each
representing one underlying lexical concept.
Different relations link the synonym sets.
15. What’s special about WordNet?
Traditional paper dictionaries are organized
alphabetically: words that are found together (on the
same page) are not related by meaning
WordNet is organized by meaning: words in close
proximity are semantically similar
Human users and computers can browse WordNet
and find words that are meaningfully related to their
queries (somewhat like in a hyperdimensional
thesaurus)
16. What’s special about WordNet?
WordNet gives information about two fundamental,
universal properties of human language:
polysemy and synonymy
Polysemy = one:many mapping of form and
meaning
Synonymy = one:many mapping of meaning and
form
17. Polysemy
One word form expresses multiple meanings
{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}
Note: the most frequent word forms are the most
polysemous!
18. Synonymy
One concept is expressed by several different
word forms:
{beat, hit, strike}
{car, motorcar, auto, automobile}
19. Polysemy and synonymy
Understanding and generating language (as for
translation) means matching a word form with
the intended, context-appropriate meaning
People (fluent speakers of a language) do this
very efficiently
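What people do efficiently can be roughly approximated by a machine through word overlap. The sketch below is a simplified Lesk-style disambiguator: it picks the sense whose signature words overlap most with the context. The signature word sets are invented for illustration and are not taken from WordNet.

```python
# Hypothetical sense signatures for the word form "table".
SENSES = {
    "table": {
        "tabular_array": {"data", "rows", "columns", "figures"},
        "piece_of_furniture": {"chair", "kitchen", "wood", "sit"},
        "mesa": {"hill", "flat", "desert", "plateau"},
        "postpone": {"motion", "vote", "meeting", "committee"},
    }
}

def disambiguate(word, context):
    """Return the sense of `word` with the largest overlap with `context`.

    Ties are resolved in favour of the sense listed first.
    """
    tokens = set(context.lower().split())
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & tokens))

print(disambiguate("table", "the data are arranged in rows and columns"))
# tabular_array
```

A real system would need far richer signatures (glosses, examples, related synsets), which is exactly the kind of information a lexical database supplies.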
20. Synonymy in WordNet
WordNet groups (roughly) synonymous,
denotationally equivalent, words into unordered
sets of synonyms (“synsets”)
{hit, beat, strike}
{big, large}
{queue, line}
By definition, each synset expresses a distinct
meaning/concept
Each word form-meaning pair is unique
21. Polysemy in WordNet
A word form that appears in n synsets
is n-fold polysemous
{table, tabular_array}
{table, piece_of_furniture}
{table, mesa}
{table, postpone}
table is fourfold polysemous/has four senses
four distinct concepts are associated with the word form table
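The counting rule above can be sketched in a few lines. Synsets are modelled here as unordered `frozenset` objects, using the example sets from the slides; the function names are assumptions made for this illustration.

```python
from collections import defaultdict

# Synsets from the slides, as unordered sets of word forms.
SYNSETS = [
    frozenset({"table", "tabular_array"}),
    frozenset({"table", "piece_of_furniture"}),
    frozenset({"table", "mesa"}),
    frozenset({"table", "postpone"}),
    frozenset({"hit", "beat", "strike"}),
    frozenset({"big", "large"}),
    frozenset({"queue", "line"}),
]

def build_index(synsets):
    """Map each word form to the ids of the synsets that contain it."""
    index = defaultdict(list)
    for synset_id, synset in enumerate(synsets):
        for word in synset:
            index[word].append(synset_id)
    return index

def polysemy(index, word):
    """A word form that appears in n synsets is n-fold polysemous."""
    return len(index.get(word, []))

index = build_index(SYNSETS)
print(polysemy(index, "table"))  # 4
print(polysemy(index, "big"))    # 1
```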
22. Hypernymy relates noun synsets
Relates more/less general concepts
Creates hierarchies, or “trees”
                {vehicle}
               /         \
  {car, automobile}   {bicycle, bike}
     /         \             |
{convertible}  {SUV}   {mountain bike}
“A car is a kind of vehicle” <=> “The class of vehicles includes cars, bikes”
Hierarchies can have up to 16 levels
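A minimal sketch of such a hierarchy, assuming the slide's tree places convertible and SUV under car, and mountain bike under bicycle. Each concept simply points to its hypernym; the names `HYPERNYM`, `hypernym_chain`, and `is_a` are made up for this example.

```python
# child concept -> more general concept (its hypernym)
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "convertible": "car",
    "SUV": "car",
    "mountain bike": "bicycle",
}

def hypernym_chain(concept):
    """Walk upward from `concept` to the root of the hierarchy."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def is_a(concept, ancestor):
    """True if `ancestor` is a (possibly indirect) hypernym of `concept`."""
    return ancestor in hypernym_chain(concept)[1:]

print(hypernym_chain("convertible"))  # ['convertible', 'car', 'vehicle']
print(is_a("mountain bike", "vehicle"))  # True
```

The length of the chain gives the depth of a concept in the tree, which is the quantity the "up to 16 levels" remark refers to.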
30. What is Computational Lexical Semantics
Any computational process involving word
meaning!
Computing Word Similarity
Distributional (Vector) Models of Meaning
Computing Word Relations
Word Sense Disambiguation
Semantic Role Labeling
Computing word connotation and sentiment
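As one illustration of the first two items, the sketch below builds simple distributional vectors (counts of neighbouring words) and compares them with cosine similarity. The three-sentence corpus and the window size are toy assumptions chosen only to make the idea visible.

```python
import math
from collections import Counter

CORPUS = [
    "she drives a car to work",
    "she drives an automobile to work",
    "he eats a banana for lunch",
]

def context_vector(target, sentences, window=2):
    """Count the words occurring within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                start = max(0, i - window)
                counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

car = context_vector("car", CORPUS)
automobile = context_vector("automobile", CORPUS)
banana = context_vector("banana", CORPUS)
print(cosine(car, automobile))  # 0.75
print(cosine(car, banana))      # 0.25
```

Words that occur in similar contexts end up with similar vectors, so "car" scores closer to "automobile" than to "banana" even though no dictionary was consulted.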
31. Concrete Applications
corpus linguistics
machine translation
text retrieval
text summarization
word processing help (discussed above)
expert systems
speech recognition/synthesis (touched upon above)
toys, games
automatic telephone interpretation system
ultimately … artificial intelligence, robotics
32. Corpus Linguistics
This is a generic name for various computer
applications that make use of large language
databases (called corpora)
Having access to a large database enables us
to process linguistic data in a statistical way,
rather than in an analytical way.
This conflict of two opposing views
(statistical vs. analytical) is very apparent in
machine translation.
33. Machine Translation (1)
Text-to-text translation (great need for
translation at the UN and the EC (European
Community))
Works best when two languages in
question are similar in structure
Usually, pre-editing and/or post-editing by
a human translator is required — machine-
assisted translation.
34. Machine Translation (2)
Traditionally, MT required parsing, possibly
some semantic analysis, then mapping to a
syntactic tree of the sentence in the target
language.
An alternative is to appeal to statistical means
of mapping a surface string in the source
language to a surface string in the target
language.
Difficulty with word-for-word translation
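The difficulty with word-for-word translation can be seen in a small sketch. The three-entry English-to-Spanish lexicon below is an illustrative assumption; the language pair was chosen only because the word-order problem is easy to show.

```python
# Toy bilingual lexicon (illustrative only).
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}

def word_for_word(sentence, lexicon):
    """Replace each word by its dictionary entry, keeping source word order.

    Words missing from the lexicon are passed through unchanged.
    """
    return " ".join(lexicon.get(token, token)
                    for token in sentence.lower().split())

print(word_for_word("the red car", LEXICON))  # el rojo coche
```

The output keeps English word order, whereas natural Spanish places the adjective after the noun ("el coche rojo"). Fixing this requires either structural analysis of the sentence or statistics learned from parallel text, which are the two approaches named above.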
35. Computational Semantics
The study of how to automate the process of
constructing and reasoning with meaning
representations of natural language expressions.
This could play an important role in such application
areas as machine translation when two
typologically distinct languages are involved (e.g.
English and Japanese).
36. Text Retrieval
key word → text/book
key word: morphology
1. Principles of Polymer Morphology
2. Image Analysis and Mathematical Morphology
3. Drainage Basin Morphology
4. French Morphology
We need morphological, syntactic, and semantic
information to find the right text/book.
Further applications: search engines, etc.
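The ambiguity in the example above is easy to reproduce: a plain keyword match on "morphology" returns all four titles, regardless of field. In the sketch below, a second keyword stands in, very crudely, for the extra information needed to pick the right book; the function name is an assumption made for this example.

```python
TITLES = [
    "Principles of Polymer Morphology",
    "Image Analysis and Mathematical Morphology",
    "Drainage Basin Morphology",
    "French Morphology",
]

def keyword_search(keywords, titles):
    """Return the titles that contain every keyword as a whole word."""
    wanted = [keyword.lower() for keyword in keywords]
    return [title for title in titles
            if all(keyword in title.lower().split() for keyword in wanted)]

print(len(keyword_search(["morphology"], TITLES)))       # 4
print(keyword_search(["french", "morphology"], TITLES))  # ['French Morphology']
```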
37. Text Summarization
We need to be able to select the right
information from the electronic documents
available (esp. on the web).
Automatic text summarization is a
technique that can help people to quickly
grasp the concepts presented in a
document by creating an abstract or
summary of the original text.
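One simple way to build such a summary is extractive: score each sentence by how frequent its words are in the whole document and keep the top-scoring sentences. The sketch below follows that idea; the stopword list and the scoring formula are assumptions made for illustration, not a method described in the slides.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "is", "was", "are", "into", "of", "to", "and"}

def content_words(text):
    """Lowercased words of `text`, with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]

def summarize(text, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]

TEXT = ("WordNet is a lexical database. "
        "WordNet groups words into synsets. "
        "The weather was nice.")
print(summarize(TEXT, n=1))  # ['WordNet is a lexical database.']
```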
38. Semantic Web
Some people (e.g. Evergreen U) are trying
to classify contents of web pages so that
they are meaningful to computers. But this
is not an easy task since the categories
must presumably be pre-selected by
people.
The semantic Web provides a common
framework that allows data to be shared
and reused across application, enterprise,
and community boundaries.
http://www.w3.org/2001/sw/
39. Ontology: Origins and History
Ontology in Philosophy
A philosophical discipline - a branch of
philosophy that deals with the nature and the
organisation of reality
Science of Being (Aristotle, Metaphysics, IV, 1)
Tries to answer the questions:
What characterizes being?
Eventually, what is being?
40. Ontology in Computer Science
An ontology is an engineering artifact:
It is constituted by a specific vocabulary
used to describe a certain reality, plus
a set of explicit assumptions regarding
the intended meaning of the vocabulary.
Thus, an ontology describes a formal specification of a
certain domain:
Shared understanding of a domain of
interest
Formal and machine manipulable model
of a domain of interest
41. How to use Lexical Ontologies
1. Ontology-based Information Extraction and
Ontology Population from Text
2. Ontology-based Question Answering
3. Natural Language Generation from Triples
4. Integration and publishing of legacy language
resources
5. Representation of Translations in the Web of
Data
6. Ontology-based Machine Translation
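Several of the uses above rest on subject-predicate-object triples. The sketch below shows a tiny triple store, a pattern query of the kind a question-answering system might issue, and a template-based verbalizer as a very rough stand-in for natural language generation. The predicate names and templates are hypothetical and do not follow any particular ontology vocabulary.

```python
# (subject, predicate, object) triples; predicate names are made up.
TRIPLES = [
    ("car", "is_a", "vehicle"),
    ("bicycle", "is_a", "vehicle"),
    ("car", "has_synonym", "automobile"),
]

TEMPLATES = {
    "is_a": "A {s} is a kind of {o}.",
    "has_synonym": "A {s} is also called {o}.",
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def verbalize(triple):
    """Turn one triple into an English sentence using a fixed template."""
    s, p, o = triple
    return TEMPLATES[p].format(s=s, o=o)

# "What kinds of vehicle are there?"
for triple in match(TRIPLES, p="is_a", o="vehicle"):
    print(verbalize(triple))
# A car is a kind of vehicle.
# A bicycle is a kind of vehicle.
```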
43. Conclusion
Database development is a basic building block for Machine
Translation, Natural Language Processing and
Computational Linguistics
WordNet is one of the richest resources, and its structure can
be used to create new lexical databases for our languages
(Urdu/Persian/Arabic)
Ontologies can be used to add enhanced semantics to the
lexical resources beyond the limits of databases because of
their nature and capability to describe things