Information Retrieval Based
On Word Sense
Athman Hajhamou
Computer and Modeling Laboratory –
USMBA- FSDM – Fès




Summary
- Research domain
- Characteristics of classical Arabic
- Morphological processing
- Research problem
- Semantic approaches




Research domain
   Natural Language Processing (NLP) :
    is a theoretically motivated range of computational
    techniques for analyzing and representing naturally
    occurring texts at one or more levels of linguistic
    analysis for the purpose of achieving human-like
    language processing for a range of tasks or
    applications.




Research domain
   Levels of Natural Language Processing :
    Phonology.
    Morphology.
    Lexical.
    Syntactic.
    Semantic.




Research domain
   Levels of Natural Language Processing :
     Phonology :
     this level deals with the interpretation of speech
     sounds within and across words. In an NLP
     system that accepts spoken input, the sound
     waves are analyzed and encoded into a digitized
     signal for interpretation.




Research domain
   Levels of Natural Language Processing :
     Morphology :
     this level deals with the componential nature of
     words, which are composed of morphemes – the
     smallest units of meaning. For example the word
              can be morphologically analyzed into
     three separate morphemes: the prefix , the root
        , and the suffix . An NLP system can recognize
     the meaning conveyed by each morpheme in
     order to gain and represent meaning.



Research domain
   Levels of Natural Language Processing :
     Lexical :
     At this level, words that have only one
     possible sense or meaning can be replaced by a
     semantic representation of that meaning. The
     nature of the representation varies according to
     the semantic theory utilized in the NLP system.
     The lexical level may require a lexicon, and the
     particular approach taken by an NLP system will
     determine whether a lexicon is utilized, as
     well as the nature and extent of the information
     that is encoded in the lexicon.

Research domain
   Levels of Natural Language Processing :
     Syntactic :
     This level focuses on analyzing the words in a
     sentence so as to uncover the grammatical
     structure of the sentence. The output of this level
     of processing is a representation of the sentence
     that reveals the structural dependency
     relationships between the words. Syntax
     conveys meaning in most languages because
     order and dependency contribute to meaning.




Research domain
   Levels of Natural Language Processing :
     Semantic :
     This is the level at which most people think meaning is
     determined; however, as the definitions of the levels
     above show, all the levels contribute
     to meaning. Semantic processing determines the
     possible meanings of a sentence by focusing on the
     interactions among word-level meanings in the
     sentence. This level of processing can include the
     semantic disambiguation of words with multiple
     senses. Semantic disambiguation permits one and
     only one sense of a polysemous word to be selected.




Research domain
   Information Retrieval (IR):
    can be defined as the study of how to
    determine and retrieve, from a corpus
    of stored information, the portions which
    are relevant to a particular information
    need. The information may be stored
    in a structured form or in an
    unstructured form, depending upon its
    application.


Research domain
   Information Retrieval (IR):
    A user of the store has to express an information
    need as a request for information in one form or
    another. Thus IR is concerned with
    determining and retrieving the information that is
    relevant to the user's information need, as expressed by
    the request and translated into a query which
    conforms to a specific information retrieval
    system (IRS). An IRS normally stores surrogates
    of the actual documents in the system to
    represent the documents and the information
    stored in them.



Characteristics of classical
Arabic
   The Arabic language raises several
    challenges for Natural Language
    Processing (NLP), largely due to its
    rich morphology. Morphological
    processing becomes particularly
    important for Information Retrieval (IR),
    because IR needs to determine an
    appropriate form of each word to use as an index term.


Characteristics of classical
Arabic
   The Arabic language is a Semitic
    language with a composite morphology.
    Arabic words are categorized as
    particles, nouns, or verbs. Unlike most
    Western languages, Arabic script is
    written from right to left. There are
    28 letters in the Arabic alphabet. The letters
    are connected, and there are no
    capital letters. Most of the letters
    differ in shape depending on their position in
    the word and on the adjacent letters.

Morphological processing

   Almost all information retrieval
    systems work in the same way and
    pass through several steps before retrieving the
    documents most relevant to a
    formulated query. These steps
    operate on a set of documents and their
    text content, and produce representations
    of the documents.


Morphological processing

   Pre-processing :
    document content is pre-processed
    before the search process. Pre-processing
    can be divided into six text operations :
     - Lexical analysis of the text, with the objective
       of treating digits, hyphens, and punctuation
       marks.
     - Elimination of the stop words.
     - Removal of diacritics.
     - Normalization of the words.
     - Stemming.
     - Selection of index terms.

Morphological processing

Pre-processing :
 Lexical analysis of the text :
 the text of every text file is converted
 into a stream of words (the candidate
 words to be adopted as index terms). The
 following three cases have to be
 handled with care : non-Arabic
 words, punctuation marks, and digits.
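
The lexical analysis step can be sketched as follows. This is a minimal illustration, not the system's actual tokenizer: it assumes that a candidate word is any run of characters in the Unicode range U+0621 to U+0652 (Arabic letters and diacritics), so non-Arabic words, punctuation marks, and digits are simply dropped. The names `ARABIC_WORD` and `tokenize` are hypothetical.

```python
import re

# Assumption: a candidate index word is a run of Arabic letters
# (U+0621-U+064A) possibly carrying diacritics (U+064B-U+0652).
ARABIC_WORD = re.compile(r"[\u0621-\u0652]+")


def tokenize(text):
    """Convert raw text into a stream of candidate Arabic words."""
    return ARABIC_WORD.findall(text)


# Latin words, digits, and punctuation are not matched, so they disappear.
tokens = tokenize("كتاب 2024, book! مدرسة.")
print(tokens)
```

A real system may prefer to keep digits or transliterate foreign words rather than discard them; that choice depends on the collection.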


Morphological processing

   Pre-processing :
   Elimination of the stop words :
    Stop words are words that occur too
    frequently across the text files and do not
    carry a particular, useful meaning
    for IR. Elimination of stop words
    reduces the size of the indexing
    structure.
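
As a rough sketch of this step, stop words can be filtered against a fixed list. The list below is a tiny hypothetical sample of common Arabic prepositions and conjunctions; a real system would use a much larger, curated list.

```python
# Hypothetical, deliberately tiny stop list (prepositions and "and").
STOP_WORDS = frozenset({"في", "من", "على", "إلى", "عن", "و"})


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


kept = remove_stop_words(["الطالب", "في", "المدرسة"])
print(kept)
```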


Morphological processing

   Pre-processing :
   Remove diacritics :
    short vowels and other diacritics are
    removed from every text file. Short
    vowels include the fatha, damma, and
    kasra. Other diacritics include the
    shadda, sukun, and tanween.
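
One possible way to remove these marks is a single regular expression over the Unicode range U+064B to U+0652, which covers the tanween forms, fatha, damma, kasra, shadda, and sukun. The function name `strip_diacritics` is an assumption made for this sketch.

```python
import re

# U+064B-U+064D: tanween; U+064E: fatha; U+064F: damma;
# U+0650: kasra; U+0651: shadda; U+0652: sukun.
DIACRITICS = re.compile(r"[\u064B-\u0652]")


def strip_diacritics(word):
    """Return the word with short vowels and other diacritics removed."""
    return DIACRITICS.sub("", word)


# kaf + fatha, ta + fatha, ba + fatha (the vocalized verb "kataba").
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = strip_diacritics(vocalized)
print(bare)
```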



Morphological processing

   Pre-processing :
   Normalization of the words:
    is the process of unifying the different
    forms of the same letter.
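
The slide does not say which letters are unified, so the mapping below is an assumption based on conventions commonly used in Arabic IR: the hamza-carrying forms of alef are mapped to bare alef, alef maqsura to ya, and ta marbuta to ha.

```python
# Assumed normalization table (not taken from the slides).
NORMALIZATION_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> ya
    "\u0629": "\u0647",  # ta marbuta            -> ha
})


def normalize(word):
    """Map the variant forms of a letter to one canonical form."""
    return word.translate(NORMALIZATION_MAP)


# "Ahmad" written with a hamza-carrying alef.
normalized = normalize("\u0623\u062D\u0645\u062F")
print(normalized)
```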




Morphological processing

   Pre-processing :
   Stemming :
    stemming of the remaining words, with the
    objective of removing affixes (prefixes
    and suffixes) and allowing the retrieval
    of documents containing syntactic
    variations of query terms.
    (Mountassire)
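
Affix removal can be illustrated with a simple light stemmer. This is a simplification and not the stemmer referred to on the slide: it strips at most one prefix and one suffix from short hypothetical lists and does not extract true roots. A minimum remaining length of three letters is assumed to avoid over-stemming.

```python
# Hypothetical affix lists; longer prefixes are tried first.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")
MIN_STEM_LENGTH = 3  # assumption: never shorten a word below 3 letters


def light_stem(word):
    """Strip at most one known prefix and one known suffix."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LENGTH:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            word = word[:-len(suffix)]
            break
    return word


stem_plural = light_stem("المعلمون")   # "the teachers"
stem_definite = light_stem("والكتاب")  # "and the book"
print(stem_plural, stem_definite)
```

Root-based stemming, as used by the search method described later, would go further and reduce each word to its consonantal root.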


Morphological processing

   Pre-processing :
   Selection of index term :
    An index term, or keyword, is a pre-selected
    term which can be used to refer to the
    content of a document.




Morphological processing

   Search method:
    is based on the root of the word. Each
    word of the user query goes through the
    previous phase (text file pre-processing)
    and undergoes all the pre-processing
    steps. Each root word of the user query
    is then matched against the root words in the index
    table, and the documents or portions
    of documents that contain the same root
    word are retrieved.
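
The search method can be sketched with a small inverted index. The `preprocess` function here is a hypothetical stand-in for the whole pre-processing chain (it only strips the definite article), and the documents are invented for the example; the point is that the query passes through the same function as the indexed text.

```python
from collections import defaultdict


def preprocess(word):
    # Hypothetical stand-in for the full pre-processing chain:
    # it only strips the definite article "ال" from longer words.
    return word[2:] if word.startswith("ال") and len(word) > 4 else word


def build_index(docs):
    """Map each pre-processed word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[preprocess(word)].add(doc_id)
    return index


def search(index, query):
    """Pre-process each query word and collect the matching documents."""
    results = set()
    for word in query.split():
        results |= index.get(preprocess(word), set())
    return results


docs = {1: "الكتاب جديد", 2: "كتاب قديم", 3: "بيت كبير"}
index = build_index(docs)
hits = search(index, "الكتاب")
print(sorted(hits))
```

Documents 1 and 2 both match because the query word and the indexed words reduce to the same form.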

Research problem

   Synonymy and polysemy are two
    important areas in linguistics that
    present a problem for computational
    linguistics. They complicate the task of
    natural language processing because
    it is difficult to know when two names
    mean the same thing and it is difficult
    to know the sense of a name that has
    multiple meanings (doing so requires
    word-sense disambiguation).

Research problem

   Synonymy :
    is the phenomenon where different
    words describe the same idea. Thus, a
    query in a search engine may fail to
    retrieve a relevant document that does
    not contain the words which appeared in
    the query. For example, a search for " "
    may not return a document containing
    the word "     ", even though the words
    have the same meaning.
Research problem

   Polysemy :
    is the phenomenon where the same
    word has multiple meanings. So a
    search may retrieve irrelevant
    documents containing the desired
    words with the wrong meaning. For
    example, a botanist and a computer
    scientist looking for the word "tree"
    probably desire different sets of
    documents.
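
To make the "tree" example concrete, here is a minimal sketch of gloss-overlap disambiguation in the spirit of the simplified Lesk algorithm. It is one possible technique, not one the slides prescribe, and the two sense glosses are invented for the illustration.

```python
# Hypothetical sense inventory for "tree" with invented glosses.
TREE_SENSES = {
    "botany": "a woody plant with trunk branches leaves and roots",
    "computing": "a data structure of nodes with a root node and child nodes",
}


def disambiguate(context, senses):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())

    def overlap(sense):
        return len(context_words & set(senses[sense].split()))

    return max(senses, key=overlap)


sense_cs = disambiguate(
    "insert a node into the binary search tree data structure", TREE_SENSES
)
sense_bot = disambiguate(
    "the tree lost its leaves and branches in autumn", TREE_SENSES
)
print(sense_cs, sense_bot)
```

A retrieval system could use the selected sense to filter or re-rank documents, so that the botanist and the computer scientist receive different result sets.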

Semantic approaches
   Automatic discovery of similar words :
    the underlying goal of this approach is
    in general the automatic discovery of
    synonyms. Most methods provide
    words that are “similar” to each other,
    with some vague notion of semantic
    similarity.



Semantic approaches
   Automatic discovery of similar words :
    among the existing methods we find
    techniques that, upon input of a
    word, automatically compile a list of
    good synonyms or near-synonyms,
    and techniques that
    generate a thesaurus (from some
    source, they build a complete lexicon
    of related words).

Semantic approaches
   Automatic discovery of similar words :
    the basic assumption of most of these
    approaches is that words are similar if
    they are used in the same contexts.
    The methods differ in the way the
    contexts are defined and the way the
    similarity function is computed.
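
The assumption above can be illustrated with a small sketch. Here the context of a word is defined, as one choice among many, as the words within a window of two positions around it, and the similarity function is the cosine of the resulting count vectors. The toy corpus is in English only for readability, and all names are hypothetical.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


corpus = [
    "the car drives on the road",
    "the automobile drives on the road",
    "the cat sleeps on the sofa",
]
vectors = context_vectors(corpus)
sim_synonyms = cosine(vectors["car"], vectors["automobile"])
sim_unrelated = cosine(vectors["car"], vectors["cat"])
print(sim_synonyms > sim_unrelated)
```

In this toy corpus "car" and "automobile" occur in identical contexts, so they score higher than "car" and "cat", which share only part of their contexts.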



Semantic approaches
   Term Selection :
    one approach to the term selection problem
    is based on the co-occurrence of
    “similar” terms in “the same context”. We
    use the notion of a term profile to calculate
    term quality and select the best-quality
    index terms. The quality of a term t is
    based on the distribution of terms “similar” to
    t and co-occurring in sentences across
    the document collection.

Semantic approaches
   Synonym-based search method:
    this search method is based on the
    synonyms of the words. Each word of
    the user query is looked up in an Arabic
    thesaurus to get its
    synonyms. Each synonym
    is then matched against the same
    word in the index table.
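
A minimal sketch of this method is query expansion against a thesaurus. The thesaurus entry and the index below are invented for the example, and a plain dictionary stands in for a real Arabic thesaurus.

```python
# Hypothetical thesaurus: "bayt" (house) with two near-synonyms.
THESAURUS = {"بيت": {"منزل", "دار"}}


def expand(word):
    """Return the word together with its synonyms from the thesaurus."""
    return {word} | THESAURUS.get(word, set())


def synonym_search(index, query):
    """Match every query word and each of its synonyms against the index."""
    hits = set()
    for word in query.split():
        for term in expand(word):
            hits |= index.get(term, set())
    return hits


# Hypothetical index table: term -> identifiers of documents containing it.
index = {"منزل": {1}, "بيت": {2}, "كتاب": {3}}
result = synonym_search(index, "بيت")
print(sorted(result))
```

Document 1 is retrieved even though it never contains the query word, which is exactly the case that plain root matching misses.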


References
- P. Senellart and V. D. Blondel, 'Automatic
  discovery of similar words', Survey of Text
  Mining, pp. 26-44, 2003.
- A. T. Al-Taani and A. M. Al-Gharaibeh,
  'Searching Concepts and Keywords in the
  Holy Quran', Yarmouk University, Jordan.
- I. Dhillon, J. Kogan, and C. Nicholas,
  'Feature selection and document clustering',
  Survey of Text Mining, pp. 73-100, 2003.
- E. D. Liddy, Natural language processing -
  Introduction, 2001.


More Related Content

What's hot

Generating Lexical Information for Terminology in a Bioinformatics Ontology
Generating Lexical Information for Terminologyin a Bioinformatics OntologyGenerating Lexical Information for Terminologyin a Bioinformatics Ontology
Generating Lexical Information for Terminology in a Bioinformatics Ontology
Hammad Afzal
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic text
ijnlc
 
Addlaall search-engine--hattab-haddad-yaseen-uop
Addlaall search-engine--hattab-haddad-yaseen-uopAddlaall search-engine--hattab-haddad-yaseen-uop
Addlaall search-engine--hattab-haddad-yaseen-uop
world20000
 
Natural Language Processing glossary for Coders
Natural Language Processing glossary for CodersNatural Language Processing glossary for Coders
Natural Language Processing glossary for Coders
Aravind Mohanoor
 
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On SilenceSegmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
paperpublications3
 
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
iosrjce
 
A New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in MalayalamA New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in Malayalam
ijcsit
 
W17 5406
W17 5406W17 5406
W17 5406
bonbon93
 
Pronominal anaphora resolution in
Pronominal anaphora resolution inPronominal anaphora resolution in
Pronominal anaphora resolution in
ijfcstjournal
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...
ijnlc
 
A Proposition Bank of Urdu
A Proposition Bank of UrduA Proposition Bank of Urdu
A Proposition Bank of Urdu
Algoscale Technologies Inc.
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
Shashank Shisodia
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ijnlc
 
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
ijnlc
 
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
Editor IJARCET
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
ijnlc
 
L1803058388
L1803058388L1803058388
L1803058388
IOSR Journals
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...
IJECEIAES
 
Cl35491494
Cl35491494Cl35491494
Cl35491494
IJERA Editor
 

What's hot (19)

Generating Lexical Information for Terminology in a Bioinformatics Ontology
Generating Lexical Information for Terminologyin a Bioinformatics OntologyGenerating Lexical Information for Terminologyin a Bioinformatics Ontology
Generating Lexical Information for Terminology in a Bioinformatics Ontology
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic text
 
Addlaall search-engine--hattab-haddad-yaseen-uop
Addlaall search-engine--hattab-haddad-yaseen-uopAddlaall search-engine--hattab-haddad-yaseen-uop
Addlaall search-engine--hattab-haddad-yaseen-uop
 
Natural Language Processing glossary for Coders
Natural Language Processing glossary for CodersNatural Language Processing glossary for Coders
Natural Language Processing glossary for Coders
 
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On SilenceSegmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
 
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
 
A New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in MalayalamA New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in Malayalam
 
W17 5406
W17 5406W17 5406
W17 5406
 
Pronominal anaphora resolution in
Pronominal anaphora resolution inPronominal anaphora resolution in
Pronominal anaphora resolution in
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...
 
A Proposition Bank of Urdu
A Proposition Bank of UrduA Proposition Bank of Urdu
A Proposition Bank of Urdu
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
 
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
 
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
 
L1803058388
L1803058388L1803058388
L1803058388
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...
 
Cl35491494
Cl35491494Cl35491494
Cl35491494
 

Viewers also liked

Marker Controlled Segmentation Technique for Medical application
Marker Controlled Segmentation Technique for Medical applicationMarker Controlled Segmentation Technique for Medical application
Marker Controlled Segmentation Technique for Medical application
Rushin Shah
 
The Effectiveness Of Searching Arabic Resources Through OPAC : A Case Study I...
The Effectiveness Of Searching Arabic Resources Through OPAC : A Case Study I...The Effectiveness Of Searching Arabic Resources Through OPAC : A Case Study I...
The Effectiveness Of Searching Arabic Resources Through OPAC : A Case Study I...
tulipbiru64
 
K Search
K SearchK Search
K Search
mennatollah
 
Cebit2009new
Cebit2009newCebit2009new
Cebit2009new
Abdallah Aziz
 
E lex presentation_03
E lex presentation_03E lex presentation_03
E lex presentation_03
Mohammed Attia
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability Mahout
Jake Mannix
 
Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending
Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending  Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending
Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending
Assem CHELLI
 
Chap10
Chap10Chap10
Chap10
Terry Yoast
 
K Search Al Khawarizmy Language Software
K Search Al Khawarizmy Language SoftwareK Search Al Khawarizmy Language Software
K Search Al Khawarizmy Language Software
Abdallah Aziz
 
Statistika
StatistikaStatistika
Statistika
Muhamad Yogi
 
REA (Resources, Events, Agents)
REA (Resources, Events, Agents)REA (Resources, Events, Agents)
REA (Resources, Events, Agents)
Demetrius_Gallitzin
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
KU Leuven
 
Indexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleIndexing Strategies to Help You Scale
Indexing Strategies to Help You Scale
MongoDB
 
treaty of hudabiya
treaty of hudabiyatreaty of hudabiya
treaty of hudabiya
Asif Sheikh
 
Treaty of Al Hudaybiyah
Treaty of Al HudaybiyahTreaty of Al Hudaybiyah
Treaty of Al Hudaybiyah
Faryal2000
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
Karwin Software Solutions LLC
 

Viewers also liked (16)

Marker Controlled Segmentation Technique for Medical application
Marker Controlled Segmentation Technique for Medical applicationMarker Controlled Segmentation Technique for Medical application
Marker Controlled Segmentation Technique for Medical application
 
The Effectiveness Of Searching Arabic Resources Through OPAC : A Case Study I...
The Effectiveness Of Searching Arabic Resources Through OPAC : A Case Study I...The Effectiveness Of Searching Arabic Resources Through OPAC : A Case Study I...
The Effectiveness Of Searching Arabic Resources Through OPAC : A Case Study I...
 
K Search
K SearchK Search
K Search
 
Cebit2009new
Cebit2009newCebit2009new
Cebit2009new
 
E lex presentation_03
E lex presentation_03E lex presentation_03
E lex presentation_03
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability Mahout
 
Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending
Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending  Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending
Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending
 
Chap10
Chap10Chap10
Chap10
 
K Search Al Khawarizmy Language Software
K Search Al Khawarizmy Language SoftwareK Search Al Khawarizmy Language Software
K Search Al Khawarizmy Language Software
 
Statistika
StatistikaStatistika
Statistika
 
REA (Resources, Events, Agents)
REA (Resources, Events, Agents)REA (Resources, Events, Agents)
REA (Resources, Events, Agents)
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
 
Indexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleIndexing Strategies to Help You Scale
Indexing Strategies to Help You Scale
 
treaty of hudabiya
treaty of hudabiyatreaty of hudabiya
treaty of hudabiya
 
Treaty of Al Hudaybiyah
Treaty of Al HudaybiyahTreaty of Al Hudaybiyah
Treaty of Al Hudaybiyah
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 

Similar to Information retrieval based on word sens 1

Corpus study design
Corpus study designCorpus study design
Corpus study design
bikashtaly
 
WORD RECOGNITION MASLP
WORD RECOGNITION MASLPWORD RECOGNITION MASLP
WORD RECOGNITION MASLP
HimaniBansal15
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Nlp
NlpNlp
Treebank annotation
Treebank annotationTreebank annotation
Treebank annotation
Mohit Jasapara
 
Using automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityUsing automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivity
ijaia
 
Nlp (1)
Nlp (1)Nlp (1)
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Mariana Soffer
 
Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466
IJRAT
 
nlp (1).pptx
nlp (1).pptxnlp (1).pptx
nlp (1).pptx
Subramanian Mani
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
Linda Garcia
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
csandit
 
1. models of word recognition
1. models of word recognition1. models of word recognition
1. models of word recognition
Hemaraja Nayaka S
 
NLP todo
NLP todoNLP todo
NLP todo
Rohit Verma
 
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYUSING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
ijaia
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Toine Bogers
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
Basha Chand
 
REPORT.doc
REPORT.docREPORT.doc
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
Harshit Yadav
 
A decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageA decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri language
acijjournal
 

Similar to Information retrieval based on word sens 1 (20)

Corpus study design
Corpus study designCorpus study design
Corpus study design
 
WORD RECOGNITION MASLP
WORD RECOGNITION MASLPWORD RECOGNITION MASLP
WORD RECOGNITION MASLP
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Nlp
NlpNlp
Nlp
 
Treebank annotation
Treebank annotationTreebank annotation
Treebank annotation
 
Using automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityUsing automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivity
 
Nlp (1)
Nlp (1)Nlp (1)
Nlp (1)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466
 
nlp (1).pptx
nlp (1).pptxnlp (1).pptx
nlp (1).pptx
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
1. models of word recognition
1. models of word recognition1. models of word recognition
1. models of word recognition
 
NLP todo
NLP todoNLP todo
NLP todo
 
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYUSING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
A decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageA decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri language
 

Information Retrieval Based on Word Sense

  • 1. Information Retrieval Based on Word Sense. Athman Hajhamou, Computer and Modeling Laboratory, USMBA, FSDM, Fès
  • 2. Summary: Research domain; Characteristics of classical Arabic; Morphological processing; Research problem; Semantic approaches
  • 3. Research domain. Natural Language Processing (NLP) is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a range of tasks or applications.
  • 4. Research domain. Levels of Natural Language Processing: phonology, morphology, lexical, syntactic, semantic.
  • 5. Research domain. Levels of Natural Language Processing: Phonology. This level deals with the interpretation of speech sounds within and across words. In an NLP system that accepts spoken input, the sound waves are analyzed and encoded into a digitized signal for interpretation.
  • 6. Research domain. Levels of Natural Language Processing: Morphology. This level deals with the componential nature of words, which are composed of morphemes, the smallest units of meaning. For example, a word can be morphologically analyzed into three separate morphemes: a prefix, a root, and a suffix. An NLP system can recognize the meaning conveyed by each morpheme in order to gain and represent meaning.
  • 7. Research domain. Levels of Natural Language Processing: Lexical. At this level, words that have only one possible sense or meaning can be replaced by a semantic representation of that meaning. The nature of the representation varies according to the semantic theory used in the NLP system. The lexical level may require a lexicon, and the particular approach taken by an NLP system determines whether a lexicon is used, as well as the nature and extent of the information encoded in it.
  • 8. Research domain. Levels of Natural Language Processing: Syntactic. This level focuses on analyzing the words in a sentence so as to uncover the grammatical structure of the sentence. The output of this level of processing is a representation of the sentence that reveals the structural dependency relationships between the words. Syntax conveys meaning in most languages because order and dependency contribute to meaning.
  • 9. Research domain. Levels of Natural Language Processing: Semantic. This is the level at which most people think meaning is determined; however, as the definitions of the levels above show, all the levels contribute to meaning. Semantic processing determines the possible meanings of a sentence by focusing on the interactions among word-level meanings in the sentence. This level of processing can include the semantic disambiguation of words with multiple senses. Semantic disambiguation permits one and only one sense of a polysemous word to be selected.
  • 10. Research domain. Information Retrieval (IR) can be defined as the study of how to determine and retrieve, from a corpus of stored information, the portions which are relevant to a particular information need. The information may be stored in a structured or an unstructured form, depending on its applications.
  • 11. Research domain. Information Retrieval (IR): a user of the store has to express an information need as a request for information in one form or another. IR is thus concerned with determining and retrieving the information that is relevant to that need, as expressed by the request and translated into a query which conforms to a specific information retrieval system (IRS). An IRS normally stores surrogates of the actual documents to represent the documents and the information stored in them.
  • 12. Characteristics of classical Arabic. The Arabic language raises several challenges for Natural Language Processing (NLP), largely due to its rich morphology. Morphological processing becomes particularly important for information retrieval (IR), because IR needs to determine an appropriate form of words to use as index terms.
  • 13. Characteristics of classical Arabic. Arabic is a Semitic language with a composite morphology. Arabic words are categorized as particles, nouns, or verbs. Unlike most Western languages, Arabic script is written from right to left. The Arabic alphabet has 28 letters. Letters are connected within a word, and there are no capital letters. Most letters change shape depending on their position in the word and on the adjacent letters.
  • 14. Morphological processing. Almost all information retrieval systems work in the same way and pass through several steps before retrieving the documents most relevant to a formulated query. These steps operate on a set of documents, their text content, and the representations of those documents.
  • 15. Morphological processing. Pre-processing: document content is pre-processed before the search process. Pre-processing can be divided into the following text operations: lexical analysis of the text, with the objective of treating digits, hyphens, and punctuation marks; elimination of stop words; removal of diacritics; normalization of words; stemming; selection of index terms.
  • 16. Morphological processing. Pre-processing: Lexical analysis of the text. The text of every text file is converted into a stream of words (the candidate words to be adopted as index terms). The following three cases have to be considered with care: non-Arabic words, punctuation marks, and digits.
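The lexical analysis step can be sketched in Python as follows. This is only an illustration under one assumption that the slides do not state: all three special cases (non-Arabic words, punctuation marks, digits) are simply treated as separators and dropped. The name `tokenize` is hypothetical.

```python
import re

# Runs of Arabic letters and diacritics (U+0621..U+0652). Everything else
# (Latin words, digits, punctuation, spaces) acts as a separator.
ARABIC_WORD = re.compile(r"[\u0621-\u0652]+")

def tokenize(text):
    """Return the candidate index words of a text, in order of appearance."""
    return ARABIC_WORD.findall(text)

# Example: the Latin word, the digits and the punctuation are discarded.
tokens = tokenize("قال: hello 123 كتاب.")
```

A real system might instead keep digits or foreign words as index terms; that choice depends on the collection.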
  • 17. Morphological processing. Pre-processing: Elimination of stop words. Stop words are words which are too frequent across text files and do not carry a particular, useful meaning for IR. Eliminating stop words reduces the size of the indexing structure.
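A minimal sketch of stop-word elimination, assuming the text has already been split into words. The stop list below is a tiny hypothetical sample of common Arabic prepositions, not the list used by the author.

```python
# Hypothetical, deliberately tiny stop list (common prepositions);
# a real system would load a curated list.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop list."""
    return [t for t in tokens if t not in stop_words]

# Example: the preposition "في" is dropped, the two nouns are kept.
kept = remove_stop_words(["الكتاب", "في", "البيت"])
```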
  • 18. Morphological processing. Pre-processing: Removal of diacritics. Short vowels and other diacritics are removed from every text file. Short vowels include the fatha, damma, and kasra. Other diacritics include the shadda, sukun, and tanween.
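Diacritic removal can be sketched with a single regular expression over the Unicode range that holds the marks named above. The function name `remove_diacritics` is an illustrative choice.

```python
import re

# U+064B..U+0652: the three tanween marks, fatha, damma, kasra,
# shadda and sukun, in that order.
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def remove_diacritics(text):
    """Delete short vowels and the other diacritics listed above."""
    return DIACRITICS.sub("", text)

# Example: the fully vowelled verb "كَتَبَ" becomes the bare "كتب".
bare = remove_diacritics("\u0643\u064E\u062A\u064E\u0628\u064E")
```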
  • 19. Morphological processing. Pre-processing: Normalization of words. This is the process of unifying the different forms of the same letter.
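One way to sketch letter normalization is a character translation table. The specific mapping below (alef variants, alef maqsura, teh marbuta, tatweel) is an assumption based on common practice in Arabic IR; the slides do not say which letter forms are unified.

```python
# Assumed mapping, not taken from the slides.
NORMALIZATION = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
    "\u0640": None,      # tatweel (elongation) is deleted
})

def normalize(word):
    """Replace each variant letter form by its unified form."""
    return word.translate(NORMALIZATION)

# Example: "أحمد" and "احمد" end up as the same index form.
same = normalize("\u0623\u062D\u0645\u062F") == normalize("\u0627\u062D\u0645\u062F")
```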
  • 20. Morphological processing. Pre-processing: Stemming. The remaining words are stemmed with the objective of removing affixes (prefixes and suffixes) and allowing the retrieval of documents containing syntactic variations of query terms. (Mountassire)
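Affix removal can be illustrated with a simple light stemmer. This is a sketch, not the stemmer used in the presentation: the affix lists and the minimum stem length are assumptions chosen for the example, and a light stemmer does not extract the root of the word.

```python
# Illustrative affix lists (assumptions); longer affixes are tried first.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")
MIN_STEM = 3  # never shorten a word below three letters

def light_stem(word):
    """Strip at most one prefix and one suffix from the word."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word

# Example: "المعلمون" (the teachers) loses "ال" and "ون", leaving "معلم".
stem = light_stem("المعلمون")
```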
  • 21. Morphological processing. Pre-processing: Selection of index terms. An index term, or keyword, is a pre-selected term which can be used to refer to the content of a document.
  • 22. Morphological processing. Search method: the search is based on the root of the word. Each word of the user query goes through the same pre-processing steps as the text files. Each root word of the query is then matched against the root words in the index table, and the documents or portions of documents that contain the same root word are retrieved.
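The search method can be sketched as an inverted index keyed by the pre-processed word form. The names `build_index`, `search` and `toy_stem` are hypothetical, and `toy_stem` is only a placeholder for the full pre-processing chain: it strips the article "ال" and nothing else.

```python
from collections import defaultdict

def toy_stem(word):
    # Placeholder for the real pre-processing: only strips the article "ال".
    return word[2:] if word.startswith("ال") and len(word) > 4 else word

def build_index(docs, stem):
    """Map each stem to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[stem(word)].add(doc_id)
    return index

def search(query, index, stem):
    """Return the ids of documents sharing at least one stem with the query."""
    hits = set()
    for word in query.split():
        hits |= index.get(stem(word), set())
    return hits

docs = {1: "الكتاب مفيد", 2: "قرأت كتاب", 3: "البيت كبير"}
index = build_index(docs, toy_stem)
result = search("الكتاب", index, toy_stem)  # documents 1 and 2 share the stem
```

Applying the same function to documents and queries is what makes the match possible: both sides are reduced to the same form before the lookup.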
  • 23. Research problem. Synonymy and polysemy are two important areas in linguistics that present a problem for computational linguistics. They complicate the task of natural language processing because it is difficult to know when two names mean the same thing, and it is difficult to know the sense of a name that has multiple meanings (doing so requires word-sense disambiguation).
  • 24. Research problem. Synonymy is the phenomenon where different words describe the same idea. Thus, a query in a search engine may fail to retrieve a relevant document that does not contain the words which appeared in the query. For example, a search for one word may not return a document that contains only a synonym of it, even though the two words have the same meaning.
  • 25. Research problem. Polysemy is the phenomenon where the same word has multiple meanings. A search may therefore retrieve irrelevant documents that contain the desired words used in the wrong sense. For example, a botanist and a computer scientist looking for the word "tree" probably desire different sets of documents.
  • 26. Semantic approaches. Automatic discovery of similar words: the underlying goal of this approach is, in general, the automatic discovery of synonyms. Most methods provide words that are “similar” to each other, with some vague notion of semantic similarity.
  • 27. Semantic approaches. Automatic discovery of similar words: among the existing methods we find techniques that, upon input of a word, automatically compile a list of good synonyms or near-synonyms, and techniques that generate a thesaurus (from some source, they build a complete lexicon of related words).
  • 28. Semantic approaches. Automatic discovery of similar words: the basic assumption of most of these approaches is that words are similar if they are used in the same contexts. The methods differ in the way the contexts are defined and the way the similarity function is computed.
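The "same contexts" assumption can be made concrete with one possible choice of context and similarity function. In this sketch the context of a word is the set of words within a fixed window around it, and similarity is the cosine between co-occurrence count vectors. Both choices are assumptions for illustration; the slides do not commit to a particular definition.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][words[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

sentences = ["the car drives fast", "the automobile drives fast", "the cat sleeps"]
vectors = context_vectors(sentences)
sim_synonyms = cosine(vectors["car"], vectors["automobile"])
sim_unrelated = cosine(vectors["car"], vectors["cat"])
```

In this toy corpus "car" and "automobile" occur in identical contexts, so their similarity is higher than that of "car" and "cat".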
  • 30. Semantic approaches. Term selection: one approach to the term selection problem is based on the co-occurrence of “similar” terms in “the same context”. We use the notion of a term profile to calculate term quality and select the best-quality index terms. The quality of a term t is based on the distribution of terms “similar” to t and co-occurring in sentences across the document collection.
  • 31. Semantic approaches. Synonym-based search method: this search method is based on the synonyms of the words. Each word of the user query is looked up in an Arabic thesaurus to get its synonyms. Each synonym is then matched against the same word in the index table.
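A minimal sketch of the synonym-based search, assuming the thesaurus is a dictionary from a word to its list of synonyms and the index table maps a word to the set of documents containing it. The thesaurus entries, the index contents and the function names are all hypothetical.

```python
def expand_query(query_words, thesaurus):
    """Return each query word followed by its synonyms, without repeats."""
    expanded = []
    for w in query_words:
        for term in [w] + thesaurus.get(w, []):
            if term not in expanded:
                expanded.append(term)
    return expanded

def synonym_search(query_words, thesaurus, index):
    """Collect the documents that contain a query word or one of its synonyms."""
    hits = set()
    for term in expand_query(query_words, thesaurus):
        hits |= index.get(term, set())
    return hits

# Hypothetical data: "بيت", "منزل" and "دار" all mean "house".
thesaurus = {"بيت": ["منزل", "دار"]}
index = {"منزل": {1}, "بيت": {2}, "كتاب": {3}}
found = synonym_search(["بيت"], thesaurus, index)
```

Document 1 contains only the synonym "منزل", so a search without expansion would miss it; this is the synonymy problem described earlier in the presentation.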
  • 32. References. P. Senellart and V. D. Blondel, "Automatic discovery of similar words", Survey of Text Mining, pp. 26-44, 2003. A. T. Al-Taani and A. M. Al-Gharaibeh, "Searching Concepts and Keywords in the Holy Quran", Yarmouk University, Jordan. I. Dhillon, J. Kogan and C. Nicholas, "Feature selection and document clustering", Survey of Text Mining, pp. 73-100, 2003. E. D. Liddy, "Natural language processing: Introduction", 2001.