In the name of Allah, the Most Gracious, the Most Merciful
« Alif, Lam, Ra. A Book whose verses are perfected and then explained in detail, from One who is Wise and Acquainted » (Hud 11:1)
Magister Thesis
Proposal of an Advanced
Retrieval System for Noble Qur’an
Plan
 Introduction
 Problem Statement
 State of the Art
 Search Engines
 Arabic Language
 Noble Quran
 Objectives
 Proposed Search Features
 Conception
 Implemented Work
 Published Papers
 Conclusion & Perspectives
Introduction
 Qur’an, in Arabic, means "the reading" or "the recitation".
Muslim scholars define it as:
« the words of Allah revealed to His Prophet Muhammad, written in the Mus’haf and transmitted by successive generations »
 The Qur’an is a sacred book for all Muslims.
 The Qur’an is also the primary source of Islamic law.
 For 14 centuries, Muslims have been:
 Studying it,
 Teaching it,
 Writing books about it,
 Developing applications for it (more recently).
Problem Statement
 The Qur’an is an important source of information about all aspects of life:
 Scientific, Social, Historical, Political, Ethical, Juridical, etc.
 It contains a huge amount of information.
 Regular search tools struggle to extract key information from the Qur’an, so other ways of querying it are needed.
 The appropriate solution is an Advanced Retrieval System.
 Why a Retrieval System?
 Why advanced?
Indexing
 Indexing consists of:
 Analyzing each document in the collection to create a set of keywords.
 Creating a representation of the documents in the system.
 Supporting other tasks:
 Auto-clustering of documents,
 Related keywords suggestion,
 Automatic document analysis,
 Calculating collocated terms,
 Auto-summarization,
 Etc.
Full-text search
 A technology for finding documents that match a set of words.
 Most web search engines, such as Google and Bing, use a full-text search engine at the heart of their service.
 The core of a full-text search engine is split into two main operations:
 Indexing the information into an efficient format
 Searching for the relevant information in this pre-computed index
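The two operations above can be illustrated with a toy inverted index. This is only a minimal sketch: the function names `build_index` and `search` and the sample documents are assumptions made for illustration, not part of the presented system.

```python
from collections import defaultdict

def build_index(docs):
    """Indexing: map each lowercase word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Searching: return ids of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample collection
docs = {1: "the thesis defense", 2: "a search engine thesis", 3: "search tools"}
index = build_index(docs)
print(search(index, "search thesis"))
```

The index is computed once, so each query only intersects small sets instead of rescanning every document.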
Indexing :: Phases
Example: « Assem is >defending< his thesis!! »
 Tokenization: Assem + is + >defending< + his + thesis!!
 Normalization: assem + is + defending + his + thesis
 Filtering stop words : assem + $ + defending + $ + thesis
 Stemming: assem + $ + defend + $ + thesis
Resulting keywords: assem, defend, thesis
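The four phases of the example can be sketched in Python as follows. The stop-word list and the suffix-stripping rule are simplifying assumptions chosen to reproduce the slide's example; a real stemmer is considerably more elaborate.

```python
import re

STOP_WORDS = {"is", "his"}    # illustrative list (assumption)
SUFFIXES = ("ing", "ed")      # naive suffix stripping (assumption)

def tokenize(text):
    """Phase 1: split the phrase on whitespace."""
    return text.split()

def normalize(token):
    """Phase 2: lowercase and drop every character that is not a letter."""
    return re.sub(r"[^a-z]", "", token.lower())

def stem(word):
    """Phase 4: strip one known suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def keywords(text):
    tokens = [normalize(t) for t in tokenize(text)]
    kept = [t for t in tokens if t and t not in STOP_WORDS]   # Phase 3
    return [stem(t) for t in kept]

print(keywords("Assem is >defending< his thesis!!"))
# ['assem', 'defend', 'thesis']
```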
Indexing :: Index types
Querying (Search)
 Querying is the phase of interaction between the system and the user.
 Search takes a user query and returns the list of matching results, sorted by relevance.
 Relevance: the degree of relationship between a document and the query.
Querying process
Semantic Approach
 Objective: improve search accuracy by understanding the searcher's intent and the contextual meaning of terms, in order to generate more relevant results.
 Semantic search does not just mean contextual search.
 It is a smart search that considers several factors to provide the most relevant and useful search results.
Semantic Approach :: factors
 Current trend
 Who won the Clásico? → the most recent one, of course
 Location of search
 Weather temperature? → here in Algiers, preferably
 Intent of the search
 Earthquake → checking whether one happened, or looking for articles
 Variations of words
 Man, Men, Man’s.
 Synonyms
 Biggest mountain, Highest mountain
 Generalized and Specialized queries
 Health vs Diabetes
 Concept matching
 Half-life → the game or the physics term
 Natural language queries
 What time is it in Cairo?
 Change of meaning based on the group of words
 New egg health benefits
 New egg health products
Arabic :: Orthography
 A Semitic language
 The language of Quran
 A Right-to-Left language
Arabic :: Lexicography
Classical Arabic grammar distinguishes only three word classes:
 Verbs
 Verbs with a simple root (الفعل المجرد): فَعَلَ
 Hamzated verb (مهموز), Assimilated verb (مثال), Hollow verb (أجوف), Weakened verb (ناقص), Geminated verb (مضعّف).
 Verbs with an augmented root (الفعل المزيد)
 فعّل، فاعل، أفعل، تفعّل، تفاعل، افتعل، انفعل، استفعل
 Nouns
 Primitive nouns (الأسماء الجامدة)
 Nouns derived from verbs (الأسماء المشتقة)
 Numbers, Demonstrative pronouns, Relative pronouns, Personal pronouns, Function words
 Particles
Arabic :: Morphology
• Arabic is a fusional language, considered an introflexive (root-and-pattern) language:
• Consonants carry the meaning
• Vowels mark the inflection
• Arabic is very rich and is based on a structure of patterns (about 500) and roots (about 7000).
• Theoretically:
• A single Arabic root can generate hundreds of words (nouns, verbs, ...) by applying patterns.
• A single Arabic word can appear in about a hundred forms by adding certain suffixes and prefixes.
Arabic :: Flexional Morphology
• For the conjugation of verbs and the declension of nouns, Arabic uses markers (generally affixes) of:
• aspect, mood, tense, person, gender, number, case.
• These inflectional marks distinguish:
• The mode of verbs: Perfective, Imperfective …
• The function of nouns: Nominative, Accusative, Genitive
Arabic :: Flexion
• Flexion of verbs (Conjugation)
o Aspect
o Mood
 Doubted, Affirmed (Actual or Eventual)
o Tense
 Perfective (الماضي): فعلتُ، فعلتَ، فعلتِ
 Imperfective (المضارع)
 Imperative (الأمر)
Arabic :: Flexion :: Verbs
• Perfective (الماضي):
• 1st person: فَعَلْتُ، فَعَلْنَا
• 2nd person: فَعَلْتَ، فَعَلْتِ، فَعَلْتُمَا، فَعَلْتُمْ، فَعَلْتُنَّ
• 3rd person: فَعَلَ، فَعَلَتْ، فَعَلَا، فَعَلَتَا، فَعَلُوا، فَعَلْنَ
• Imperfective (المضارع)
• Nominative,
• Accusative,
• Jussive,
• Imperative (الأمر)
Arabic :: Flexion :: Nouns
• Flexion of nouns (declension)
o 3 cases:
 Nominative (الرفع)
 Accusative (النصب)
 Genitive (الجر)
o Depends on:
 Number: Singular (المفرد), Dual (المثنى), Plural (الجمع)
 Form: Triptote, Diptote, etc.
Arabic :: Flexion :: Nouns
o Declension of singular nouns
 Triptotes (الأسماء المنصرفة): كتابٌ، كتابًا، كتابٍ
 Diptotes (الأسماء الممنوعة من الصرف): صحراء قاحلة
 Five Nouns (الأسماء الخمسة): أخو، أخا، أخي
 Deverbals with defective roots: ماضٍ
o Declension of dual nouns: كتابان، كتابين
o Declension of plural nouns
 External masculine plural (جمع المذكر السالم): كاتبون، كاتبين
o Declension of function words
 Invariables: منذ
 Variables: كلّ
Arabic :: Derivational morphology
o Deverbal noun (المصدر): وَدّ، وُدّ، وِداد، وِدادة، مودّة
o Active participle (اسم الفاعل): ضارِب (hitter)
o Passive participle (اسم المفعول): مضروب (struck)
o Nouns of time and place (أسماء الزمان والمكان): مدرسة (school), مغرب (sunset)
o The Nomen Vicis (اسم المرة): ضَرْبة (a hit)
o The Nomen Speciei (اسم الهيئة): جلستْ جِلسةَ الأميرات (she sat like princesses)
Arabic :: Ambiguities :: Absence of Vocalization
If a text contains the word (الملك), how should the search engine understand its meaning?
Is it:
1. « المَلَك | Angel »,
2. « المُلْك | Kingdom »,
3. « المَلِك | King »?
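The ambiguity above can be reproduced in a few lines: once the short vowels are stripped, the three readings collapse into the same string. This is a minimal sketch; the character range used for the marks (U+064B to U+0652) and the simplified vocalization of the three words are assumptions made for illustration.

```python
import re

# Tanwin, short vowels, shadda and sukun: U+064B..U+0652
HARAKAT = re.compile("[\u064B-\u0652]")

def strip_vocalization(word):
    """Remove the vocalization marks and keep only the letters."""
    return HARAKAT.sub("", word)

# Three vocalized readings of the same skeleton, written with escapes
# so that the marks stay visible in the source code.
readings = {
    "angel":   "\u0627\u0644\u0645\u064E\u0644\u064E\u0643",  # al-malak
    "kingdom": "\u0627\u0644\u0645\u064F\u0644\u0652\u0643",  # al-mulk
    "king":    "\u0627\u0644\u0645\u064E\u0644\u0650\u0643",  # al-malik
}
skeletons = {strip_vocalization(w) for w in readings.values()}
print(len(set(readings.values())), "vocalized forms ->", len(skeletons), "skeleton")
```

An engine that indexes only the skeleton cannot tell the three meanings apart, which is the motivation for the vocalization-aware features discussed later.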
Arabic :: Ambiguities :: Prefixes
For the word « وعد », the letter wāw « و » is:
1. A part of the word:
وَعَدَ (to promise)
2. Not a part of the word:
وَ + عَدَّ (and + to count)
Arabic :: Ambiguities :: Suffixes
For the word « وله », the letter hā’ « ه » is:
1. A part of the word:
وَلِهَ (admire)
2. Not a part of the word:
وَلِّهِ (crown + him)
وَلَهُ (and + he has)
Quran :: Structure
 The Qur’an consists of 114 surahs; the surahs are divided into ayahs.
 This is the main division, specified by the Prophet.
[Diagram: the Qur’an (القرآن) divided into surahs (سورة) 1 to 114, each divided into ayahs (آية)]
Quran :: Structure
 There are many divisions:
 Primary structure: surah, ayah, word and letter;
 Special locations: First ayahs of a surah (فواتح السور), Last ayahs of a surah (خواتيم السور), Qur’anic comma (فاصلة قرآنية), Sajdah (سجدة), Waqf (وقف)
 Other structures: page, Juz’ (جزء), Hizb (حزب), Nisf (نصف), Rub’ (ربع), Thumn (ثمن)
[Diagram: the Qur’an divided into 30 juz’; each juz’ is divided into hizb, nisf, rub’ and thumn]
Quran :: Structure :: Stops (Waqfs)
Quran :: Uthmani Script
Standard | Uthmani | Position | Change
سأريكم | سأوريكم | Al-A'raf:145 | Wāw added
العالمين | العلمين | All occurrences in the Qur’an | Alif omitted
الغاوون | الغاون | Ash-Shu'ara:94 and one other place | Wāw omitted
النبيين | النبين | All occurrences in the Qur’an | Yā’ omitted
الليل | اليل | All occurrences in the Qur’an | Lām omitted
ننجي | نجي | Al-Anbiya:88 | Nūn omitted
وجيء | وجائ | Az-Zumar:69 and one other place | Alif added
Quran :: Sciences
 Specific to the Quran
 Tafsīr (التفسير)
 Knowledge of Makkan and Medinan ayahs
 Knowledge of the causes of revelation
 Knowledge of the beginnings of surahs
 Science of allegorical ayahs (علم المتشابه)
 Qur’anic parables (الأمثال القرآنية)
Quran :: Sciences
 Shared with other resources
 Legislative study:
 Fiqh (الفقه)
 Abrogating and abrogated ayahs (الناسخ والمنسوخ)
 General and particular (العامّ والخاصّ)
 Linguistic study:
 Orthography (علم مرسوم الخطّ)
 Grammatical analysis of the Qur’an (إعراب ألفاظ القرآن)
 Morphology (الصرف)
 Rhetoric (البلاغة)
 Lexicology (علم المعاجم)
 Scientific study
 Scientific miracles in the Quran
 Numerical study of verses (ignoring the debate about it)
Quran :: Indexes
The indexes are categorized by purpose into 5 main categories:
Syntactic
Semantic
Structural
Statistical
Thematic
Quran :: Indexes :: Projects
 Midād lbayān
 Word morphology index
 Zerrouki’s Indexes
 Word morphology index
 Topic index
 Synonym index
 Qur’anic Arabic Corpus
 Word_by_word morphology index
 Tanzil Project
 Ayah index (Electronic Mushaf)
 Structural index
 Surah index
 Boundary-Annotated Qur’an Corpus
 Word_by_word Waqf index (+mapping Uthmani-Standard)
 Qurany Concepts Tool
 Concept index
Quran :: Ontologies + examples
 Qur’anic Concepts Ontology
 Henni’s Ontology
Quran :: Indexes/Ontologies projects
General critique
 Not available or not open
 Except Zerrouki’s indexes, the Quranic Arabic Corpus and Tanzil
 Discontinued development
 Except the Quranic Arabic Corpus and Tanzil
Quranic Search Tools
 Alawfa (الأوفى)
 Al-Monaqeb-Alqurany (المنقب القرآني)
 Quran complex search service
 Quranic Researcher (الباحث القرآني)
 Quranologie (علم القرآن)
 Quranic Corpus Word-by-Word Search
 Tanzil Quran Browser (تنزيل)
 Zekr (ذكر)
Quranic Search Tools :: General Critique
 They are not full-text search engines
 except the advanced search of Tanzil and Zekr.
 Basic search operations
 Simple query system
 Weak or unsupported linguistic operations
 except the Quranic Corpus word-by-word search
 No semantic approach
 Closed source
 except Zekr
 Implemented as interfaces, not as APIs or libraries.
Objectives
 Design a retrieval system that fits the search needs of the Qur’an.
 First, we should list and classify all the search features that are possible and helpful.
 Then, we need to study how to implement each feature and what it requires.
Proposed Search Features :: Advanced Query
 Fielded search
 سورة:الفاتحة
 Logical relations
 الصلاة و الزكاة
 Phrase search
 "الحمد لله"
 Interval search
 رقم_الآية:[1 إلى 5]
 Full regular expressions
 م[ان] to search for من or ما
 Wildcards (jokers)
 ب؟طة → بسطة, بصطة
 *نبي* → نبي, الأنبياء, النبيين, ...
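The wildcard feature above can be sketched by translating the pattern into a regular expression. This is a hedged illustration, not the system's actual query parser: the helper names `wildcard_to_regex` and `match_words` and the small vocabulary are assumptions.

```python
import re

def wildcard_to_regex(pattern):
    """Translate '*' (any run of characters) and '?' or the Arabic
    question mark (exactly one character) into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch in "?\u061F":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def match_words(pattern, vocabulary):
    """Return the vocabulary words that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [w for w in vocabulary if regex.fullmatch(w)]

# Hypothetical vocabulary built from the slide's examples
vocabulary = ["بسطة", "بصطة", "نبي", "الأنبياء", "بسط"]
print(match_words("ب؟طة", vocabulary))
print(match_words("*نبي*", vocabulary))
```

Matching against the list of indexed words first, and only then fetching ayahs, keeps the expensive pattern matching on a small vocabulary.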
Proposed Search Features :: Output Improvements
 Pagination
 Sorting
 Relevance
 Mushaf natural order
 Revelation order
 Numerical, alphabetical, or Abjad order
 Keyword highlight
ذرني ومن خلقت <style>وحيدا</style>
Proposed Search Features :: Output Improvements (2)
 Real-time output
 Results grouping
 by surahs
 by topics
 by tafsir dependency
 by revelation events
 by allegorical ayahs
 by parables
 Uthmani script with full diacritical marks
Proposed Search Features :: Suggestion System
 Spell corrections
 ابراهام: إبراهيم
 Semantically related words (ontology-based)
 يعقوب: يوسف، إسحاق، إسرائيل، نبي ...
Proposed Search Features :: Suggestion System (2)
 Different vocalizations
 الملك: المَلِك، المُلْك، المَلَك ...
 Collocated words
 سميع: سميع عليم، سميع بصير
 الحمد: الحمد لله
 Keyboard mapping
 fsl: بسم (f → ب, s → س, l → م)
 Different meanings
 رب: 1st meaning (god), 2nd meaning (master)
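The keyboard-mapping suggestion can be sketched with a character translation table. Only the three pairs f, s and l come from the slide; the other three pairs are assumed from the common Arabic keyboard layout, and the table is deliberately partial.

```python
# Latin key -> Arabic letter printed on the same physical key
LATIN_TO_ARABIC = str.maketrans({
    "f": "ب", "s": "س", "l": "م",   # the three pairs given on the slide
    "h": "ا", "g": "ل", "k": "ن",   # assumed from the common Arabic layout
})

def remap_keyboard(text):
    """Rewrite text typed with the wrong keyboard layout; unknown keys pass through."""
    return text.translate(LATIN_TO_ARABIC)

print(remap_keyboard("fsl"))
```

A suggestion system could apply such a remapping whenever a Latin-script query returns no results.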
Proposed Search Features :: Linguistic aspects
 Romanization
 خليفة: ẖalīfaẗ (ISO 233), xalyfap (Buckwalter), _halyfaT (ArabTeX).
 Syntactic coloration
 Partial vocalization search
 مَلك to locate مَلِك, مَلَك … and ignore مُلْك
 Multi-level derivation
 (Word: أسقينا, level: lemma) to find وَأَسْقَيْنَاكُمْ, لَأَسْقَيْنَاهُمْ, فَأَسْقَيْنَاكُمُوهُ.
 Specific derivations
 Conjugation in the perfective of قال to find قال, قالت, قالوا, قلن ...
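Romanization can be sketched as a character-by-character table lookup. The table below is a deliberately partial Buckwalter mapping (an assumption limited to the letters needed here), and the function names are illustrative.

```python
# Partial Buckwalter table: Arabic character -> ASCII character
BUCKWALTER = {
    "ا": "A", "ب": "b", "ت": "t", "خ": "x", "س": "s", "ف": "f",
    "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "y", "ة": "p",
    "\u064E": "a", "\u064F": "u", "\u0650": "i",   # fatha, damma, kasra
}
REVERSE = {latin: arabic for arabic, latin in BUCKWALTER.items()}

def to_buckwalter(text):
    """Transliterate Arabic to Buckwalter; unknown characters pass through."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)

def from_buckwalter(text):
    """Transliterate Buckwalter back to Arabic."""
    return "".join(REVERSE.get(ch, ch) for ch in text)

# The word from the slide, with a fatha after the first and fourth letters
print(to_buckwalter("خ\u064Eليف\u064Eة"))   # xalyfap
```

Because the mapping is one-to-one, a romanized query can be converted back to Arabic before it is sent to the index.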
Proposed Search Features :: Linguistic aspects
 Vocal search
 Word linguistic annotation
Proposed Search Features :: Linguistic aspects
 Word properties embedded query
 { جذر:ملك نوع:اسم عدد:مفرد }
 Numerical values search
 309 replaced by ثلاثمائة وتسعة
 Fuzzy string search
 مءصدة may replace مؤصدة
 Linguistic examples search
 Rhetorical deletion (الحذف البلاغي)
 Grammatical shift (الالتفات)
 Uthmani spelling
 بسطة may replace بصطة
 نعمت may replace نعمة
Proposed Search Features :: Quranic Options
 Recitation marks retrieval
 سجدة:نعم
 Structural options
 صفحة:1
 جزء:عم
 Divine Name highlight
Proposed Search Features :: Quranic Options (2)
 Translation embedded query
 { text: mercy lang: english author: shakir }
 Repetitions and allegorical ayahs (التكرار والمتشابهات)
 Repetition {55,13} == [فبأي آلاء ربكما تكذبان], 31 repetitions
 Abrogating and abrogated ayahs search (الناسخ والمنسوخ)
 Quranic parables (الأمثال)
 parable (سورة:البقرة)
[مثلهم كمثل الذي استوقد نارا فلما أضاءت ما حوله ذهب الله بنورهم وتركهم في ظلمات لا يبصرون]
Proposed Search Features :: Semantic Queries
 Semantically related words
 Syn(جنة) to find جنة, نعيم, فردوس …
 Ant(جنة) to find جحيم, سعير, جهنم, سقر …
 Is(جنة) to find فردوس، عدن
 … (based on ontology)
 Faceted thematic search
Proposed Search Features :: Semantic Queries
 Natural questions: كم؟ لم؟ متى؟ أين؟ ما؟ من؟
ما هي الحطمة؟ What is Al-Hutamah?
[نار الله الموقدة] - Al-Humazah 6
It is the fire kindled by Allah
من هم الأنبياء؟ Who are the prophets?
[إنا أوحينا إليك كما أوحينا إلى نوح والنبيين من بعده وأوحينا إلى إبراهيم وإسماعيل وإسحاق ويعقوب والأسباط وعيسى وأيوب ويونس وهارون وسليمان وآتينا داوود زبورا] - An-Nisa 163
أين غلبت/هزمت الروم؟ Where was Rome defeated?
[في أدنى الأرض وهم من بعد غلبهم سيغلبون] - Ar-Rum 3
كم مكث أصحاب الكهف؟ How long did the People of the Cave stay?
[ولبثوا في كهفهم ثلاث مائة سنين وازدادوا تسعا] - Al-Kahf 25
متى يوم القيامة؟ When is the Day of Resurrection?
[يسألك الناس عن الساعة قل إنما علمها عند الله وما يدريك لعل الساعة تكون قريبا] - Al-Ahzab 63
كيف يتشكّل الجنين؟ How is the embryo formed?
[ثم خلقنا النطفة علقة فخلقنا العلقة مضغة فخلقنا المضغة عظاما فكسونا العظام لحما ثم أنشأناه خلقا آخر فتبارك الله أحسن الخالقين] - Al-Mu'minun 14
Proposed Search Features :: Semantic Queries (2)
 Auto vocalization
 رسول من الله → رَسُولٌ مِنَ اللَّهِ
 Entity extraction
 ثلاث مائة سنين وازدادوا تسعا as (time/number, 309)
 ببكة as (place, Makkah)
 كلمح البصر as (time unit, ??)
 مثقال ذرة as (size unit, ??)
 يا أيها النبي as (person, Muhammad)
 Proper nouns search (co-reference resolution)
بنيامين؟
[إذ قالوا ليوسف وأخوه أحب إلى أبينا منا ونحن عصبة إن أبانا لفي ضلال مبين] - Yusuf 8
Proposed Search Features :: Statistical system
 Frequencies of different units
 How many times does the word « الله » occur in Surah Al-Mujadilah (المجادلة)?
 What are the ten most frequent words in the whole Qur’an?
 How many times are the word Sea (بحر) and its derivations mentioned in the whole Qur’an?
 How many letters are there in Surah Ta-Ha (طه)?
 What is the longest ayah?
 How many Sajdah marks are there in the whole Qur’an? (according to the different riwayat)
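Questions of this kind reduce to counting over the text units. The sketch below assumes the ayahs are available as plain strings; the sample is the first three ayahs of Al-Fatihah in unvocalized standard script, and the function names are illustrative.

```python
from collections import Counter

def word_frequencies(ayahs):
    """Count how many times each whitespace-separated word occurs."""
    counts = Counter()
    for text in ayahs:
        counts.update(text.split())
    return counts

def longest_ayah(ayahs):
    """Return the ayah with the most letters (spaces ignored)."""
    return max(ayahs, key=lambda text: len(text.replace(" ", "")))

sample = [
    "بسم الله الرحمن الرحيم",
    "الحمد لله رب العالمين",
    "الرحمن الرحيم",
]
counts = word_frequencies(sample)
print(counts["الرحمن"], longest_ayah(sample))
```

Whether the counts are taken on vocalized, unvocalized, standard or Uthmani forms changes the answers, which is why the script in use has to be fixed before computing statistics.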
Discussion of search features
 To validate the usefulness, importance and clarity of each feature, we launched a survey to gather opinions.
 We targeted a mixed audience to get high-quality feedback from:
 Regular users,
 Quran scholars,
 Arabic morphology experts,
 Natural Language Processing / Information Retrieval researchers,
 Philosophers working on the comparison of religious scriptures.
Survey Takers
Survey Results
Conception
 Previous work:
 The engineering degree graduation project entitled “Development of a search and indexing engine for Qur’anic documents” [Dahmani2010]
 Improvements:
 Moving to a fully vocalized search engine
 Customizing the text processing phases, considering both Uthmani and standard scripts
 Adopting the Quranic word as a search unit
Conception :: Fully Vocalized Search Engine
 Barriers:
 Comparing vocalized, partially vocalized, and unvocalized texts
 Distinguishing between original vowels and declension case markers
 Lack of vocalized Arabic linguistic resources
 Texts, ontologies, thesauri, corpora
 Advantages:
 Resolves the ambiguities caused by ignoring vocalization
 Makes search results, suggestions, and statistics more accurate
 Refines the detection of meanings
 (a first step in the semantic approach)
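The first barrier, comparing vocalized, partially vocalized and unvocalized texts, can be sketched as a per-letter comparison: letters must agree, and every mark written in the query must also appear on the same letter of the indexed word. This is one possible matching rule, assumed for illustration; the function names and the mark range are not taken from the system.

```python
import re

MARKS = "\u064B-\u0652"  # tanwin, short vowels, shadda, sukun

def split_units(word):
    """Split a word into (letter, marks) pairs."""
    return re.findall("([^%s])([%s]*)" % (MARKS, MARKS), word)

def partial_match(query, word):
    """True when the letters agree and each mark written in the query
    is also present on the same letter of the word."""
    q_units, w_units = split_units(query), split_units(word)
    if len(q_units) != len(w_units):
        return False
    return all(
        ql == wl and set(qm) <= set(wm)
        for (ql, qm), (wl, wm) in zip(q_units, w_units)
    )

query = "م\u064Eلك"              # only the first letter carries a fatha
malik = "م\u064Eل\u0650ك"        # malik
malak = "م\u064Eل\u064Eك"        # malak
mulk = "م\u064Fل\u0652ك"         # mulk
print(partial_match(query, malik), partial_match(query, malak), partial_match(query, mulk))
# True True False
```

With this rule an unvocalized query matches every vocalization, while each added mark narrows the result set.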
Conception :: Text processing
 We consider both the standard script and the Uthmani script to resolve difficulties such as:
 Searching with the Uthmani written form of a word.
 Calculating statistics based on the Uthmani writing.
 Matching the word-by-word structure of some Quranic linguistic resources.
Conception :: Text processing :: Global schema
Conception :: Text processing :: Substitution
 New phase! Purpose?
 Cases of substitution:
 Romanization:
 Guessing policy:
 Nature of the characters used
 Valid Arabic words
 Existence of the word in the Quran
 Predefined priorities
 Numbers as words:
 Rules:
 We don’t say صفر رجل, we say لا رجل
 One is never mentioned as واحد but as أحد
 Some numbers accept gender: اثنان, اثنتان
 Other numbers take the opposite gender of the counted noun: سبع سماوات, سبعة أبحر
 A hundred (مئة) has a special writing in the Quran: مِائَة
 Some numbers are mentioned indirectly: أَلْفَ سَنَةٍ إِلَّا خَمْسِينَ عَامًا (1000 − 50 = 950)
Conception :: Text processing :: Tokenization
 Phases:
 Phrases to words (tokens)
 Words to their parts (sub-tokens)
Conception :: Text processing :: Normalization
 Normalize the Uthmani text into standard text
 Strip all recitation marks
 Keep the vowels, except the declension case-ending vowel
Conception :: Text processing :: Filtering stop words
 Stop-word selection strategy:
 Chosen from the list of the most frequent words in the Qur’an,
 Considering vocalization
 Preferring:
 Particles such as لكن
 Pronouns such as أنتَ
 Clitics such as فَـ
Conception :: Text processing :: Stemming
 We propose stripping the affixes during tokenization.
 In stemming, we bring the word back to either its:
 ROOT: a large set of words with different meanings
 STEM: a smaller set of words with similar meanings
Conception :: Quranic Word as Search Unit
 Purpose: obtain a quick, efficient and stable method to retrieve specific Quranic words.
 Requirements:
 A corpus of Quranic words, enriched with linguistic annotations
 Word occurrence as a unit
 Word form as a unit
 Information schema:
 Identifiers: a global identifier, and a secondary identifier based on the order in the ayah, added to the ayah identifier and the surah identifier;
 Different forms: Uthmani vocalized word (the main form), Standard vocalized word, Standard unvocalized word;
 Transliterations: ISO 233, Buckwalter, ArabTeX;
 Translations: English, other languages;
 Different levels of stemming: Lemma, Stem, Root;
 Other properties: Part of Speech, type, state, case, mood, voice, number, gender, person.
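The information schema above could be represented by a record type such as the following. This is a sketch only: the class name `QuranicWord`, the field names and the sample values are assumptions that mirror the listed items, not the schema actually used by the system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class QuranicWord:
    # Identifiers
    gid: int                      # global identifier
    surah_id: int
    ayah_id: int
    word_id: int                  # order of the word inside the ayah
    # Different forms
    uthmani: str                  # Uthmani vocalized word (main form)
    standard: str                 # standard vocalized word
    unvocalized: str              # standard unvocalized word
    # Transliterations and translations, keyed by scheme or language
    transliterations: Dict[str, str] = field(default_factory=dict)
    translations: Dict[str, str] = field(default_factory=dict)
    # Levels of stemming
    lemma: Optional[str] = None
    stem: Optional[str] = None
    root: Optional[str] = None
    # Part of speech, type, state, case, mood, voice, number, gender, person
    properties: Dict[str, str] = field(default_factory=dict)

    @property
    def address(self):
        """Secondary identifier: (surah, ayah, position in the ayah)."""
        return (self.surah_id, self.ayah_id, self.word_id)

# Hypothetical record for the first word of the first ayah
word = QuranicWord(gid=1, surah_id=1, ayah_id=1, word_id=1,
                   uthmani="بسم", standard="بسم", unvocalized="بسم",
                   transliterations={"buckwalter": "bsm"})
print(word.address)
```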
Conception :: 2-step search strategy
 1st step: retrieve the best set of keywords for the user query by searching in:
 A word-as-a-unit index
 A Quranic words ontology
 2nd step: retrieve the corresponding ayahs using the set of keywords resulting from the first step
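The two steps can be sketched with plain dictionaries standing in for the word index, the ontology and the ayah index. All names and data below are hypothetical placeholders chosen for illustration.

```python
def expand_keywords(query, word_index, ontology):
    """Step 1: collect the keyword set for a query."""
    keywords = {query}
    keywords |= word_index.get(query, set())   # e.g. other forms of the word
    keywords |= ontology.get(query, set())     # e.g. semantically related words
    return keywords

def find_ayahs(keywords, ayah_index):
    """Step 2: union of the ayahs that contain any of the keywords."""
    hits = set()
    for keyword in keywords:
        hits |= ayah_index.get(keyword, set())
    return sorted(hits)

# Placeholder data: words and ayah identifiers are invented
word_index = {"paradise": {"paradises"}}
ontology = {"paradise": {"garden"}}
ayah_index = {
    "paradise": {"A3"},
    "paradises": {"A7"},
    "garden": {"A1", "A3"},
    "fire": {"A2"},
}
keywords = expand_keywords("paradise", word_index, ontology)
print(find_ayahs(keywords, ayah_index))
# ['A1', 'A3', 'A7']
```

Keeping the keyword expansion separate from the ayah lookup lets each word-level feature (derivations, related words, fuzzy matching) reuse the same second step.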
Conception :: 2-steps search :: applications
Conception :: Word Search :: Word properties
 Objective: allow users to locate ayahs based on linguistic properties of words such as POS, type, state, case, mood, voice, number, gender, person.
 Methods:
 Fielded search:
 A fielded search is an advanced query feature that lets users select the document fields to which they wish to limit the query, then use the required keywords within these fields.
Conception :: Word Search :: Semantically Related Words
 Objective: offer the words related to a keyword entered by the user.
 Algorithm:
 The user specifies:
 The word
 The semantic relation: Synonymy, Antonymy, Hypernymy, Hyponymy, Meronymy, Holonymy, Troponymy.
 Querying the ontology for related words
 Using those keywords to retrieve the corresponding ayahs.
Conception :: Word Search :: Multi-level Derivations
 Objective: get a set of words that share the same origin, such as a stem or a root.
 Algorithm:
 The user specifies:
 The keyword
 The level of derivation.
 Recovering the origin of the word at the specified derivation level
 Retrieving the whole set of words that share this origin.
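The algorithm above can be sketched over a list of annotated word records. The record layout, the function name `derivations` and the small annotation sample are assumptions for illustration; only the lemma and root levels are filled in here.

```python
def derivations(keyword, level, records):
    """Return every word form sharing the keyword's origin at `level`
    (a key present in the records, such as 'lemma' or 'root')."""
    # Recover the origin(s) of the keyword at the requested level
    origins = {r[level] for r in records if r["word"] == keyword}
    # Retrieve every word that shares one of these origins
    return sorted({r["word"] for r in records if r[level] in origins})

# Illustrative annotations (unvocalized)
records = [
    {"word": "قال", "lemma": "قال", "root": "قول"},
    {"word": "قالت", "lemma": "قال", "root": "قول"},
    {"word": "قالوا", "lemma": "قال", "root": "قول"},
    {"word": "قول", "lemma": "قول", "root": "قول"},
    {"word": "كتاب", "lemma": "كتاب", "root": "كتب"},
]
print(derivations("قالت", "lemma", records))
print(derivations("قالت", "root", records))
```

The root level returns the larger set, since it also includes the noun derived from the same root.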
Conception :: Word Search :: Specific Derivations
 Objective: find the words that result from applying a specific derivation operation to the word given by the user.
 Algorithm:
 The user should:
 Enter the keyword
 Specify which derivation.
 Generating the set of derived words either by:
 fetching them from the word index, or
 using linguistic tools such as verb conjugators; the generated words are then filtered, as a second step, by intersection with the set of Quranic words.
 The resulting set is used to locate the corresponding ayahs.
Conception :: Word Search :: Fuzzy Search
 Objective: search using the set of words that are nearly similar to the input word in writing or pronunciation.
 Methods:
 Levenshtein distance (previously unknown text)
 N-grams
 Spell-checker
 Soundex (phonetic)
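The first method, the Levenshtein (edit) distance, can be sketched with the classic dynamic-programming formulation. The helper `fuzzy_candidates` and its default threshold are assumptions for illustration.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_candidates(word, vocabulary, max_distance=1):
    """Vocabulary words within max_distance edits of the input word."""
    return [w for w in vocabulary if levenshtein(word, w) <= max_distance]

print(levenshtein("مءصدة", "مؤصدة"))    # 1: one substituted letter
print(levenshtein("زنبجيل", "زنجبيل"))  # 2: two swapped letters
```

Swapped letters cost two edits under this plain distance, so a threshold of 1 would miss them; a variant that counts transpositions as one edit is a possible refinement.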
Conception :: Word Search :: Fuzzy Search
 Arabic similarity specifications
 مءصدة and مؤصدة
 الحمد and الحَمْد
 عشرِ and عشر
 يضلّه and يضلله
 Examples
 Misordered letters: زنبجيل for زنجبيل
 Phonetic similarity: هرم for إرم
 Spelling similarity: الضحي for الضحى
Open Source, but WHY?
A number of advantages led us to open source; the following points cover the most important of them [Web-Oss-watch]:
 Collaborative bug-fixing and fast detection of security vulnerabilities
"Given enough eyeballs, all bugs are shallow"
-- an open source slogan
 Customization.
 Translation & localization.
 Protection against development discontinuation.
 Being part of a community.
 Low cost.
Used Technologies :: Python
Python is a powerful, widely used dynamic programming language.
 Features:
 powerful and fast
 plays well with others
 runs everywhere
 friendly and easy to learn
 free and open source
Used Technologies :: Whoosh API
 Whoosh is a full-text indexing and searching library
implemented in Python
 Features:
 Pure Pythonic API
 Fielded indexing and search
 Fast indexing and retrieval
 Powerful query language
 Useful for circumstances such as:
 Anywhere a pure-Python solution is desirable to avoid having to
build/compile native libraries
 As a research platform (Python is easier to read!)
 When the search features are more important to us than the raw speed.
Implementation :: Previous Code Base
 Implemented in [Chelli&Dahmani2010]
 Licensed under the GPL*
 (Server applications issue)
 Based on the Whoosh indexing library
 Offers many search operations
 Results in HTML format
 Raw format
 Can be used from Python
 Requires writing wrappers for other languages
 A basic resource manager
 Has a missing piece
Implementation :: Our improvements
 The code base:
 has had 981 commits,
 represents 15,243 lines of code,
 is mostly written in Python,
 has well-commented source code,
 took an estimated 4 years of effort (COCOMO model).
Reference: Ohloh website.
Implementation :: Our improvements :: New Output System
 A new output system:
 JSON-based: simpler and more extensible
 Centralized: changes are made in one and only one place
 Extended and extensible results structure
 Customizable search requests using flags
 Includes a statistics calculation unit
 Offers metadata for each request
Implementation :: Our improvements :: Multiple Search Units
 Translation-as-unit
 Word-as-unit
Implementation :: Our improvements :: Many new features
 Fuzzy search feature
 Retrieving the neighbors of each ayah
Implementation :: Our improvements :: Many new features (2)
 Manipulating different Quranic scripts
 More suggestion operations
 Showing the linguistic annotations
 Retrieving and showing transliterated keywords (Buckwalter)
Implementation :: Our improvements :: Resources Importing Manager
 Resources importing manager:
 Downloading the original resources (licensing issue)
 Parsing and importing the data into our intermediate database
 Indexing the database
 Updating auto-generated data files
Implementation :: Our improvements :: Packaging System
 Automating the API build
 Packaging into:
 Source tarball
 Binary tarball
 Python egg package
 Debian deb package
 Red Hat rpm package
 Windows installer
 Mac OS (perspective)
Implementation :: Our improvements :: More
 Coding standardization
 Following Python conventions (PEP8)
 Using Pylint (a source code bug and quality checker)
 Documentation coverage
 Enriching the code with Readme files
 New console interface
Implementation :: Open Issues
 Make the Query Parser modular: this is important to enable extensibility and to fix the problem of the different operations being mixed together during parsing.
 Restrict anonymous requests to the API: restricting requests protects the API from flooding, whether intended or not. This can be done by:
 Limiting the maximum number of simultaneous requests, globally and per IP.
 Implementing an identification system that works with remote clients.
 Move to the latest version of the Whoosh library: Whoosh is almost at version 3.X in its stable release, while we are still using an older version, 0.3. Moving to the latest version is strongly recommended in order to benefit from the improvements made. However, it will not be an easy operation, since our API is intertwined with the older version, especially in the Query Parser.
Implementation :: Interfaces
Implementation :: Open Issues
 Complete the implementation of the features
 Enrich the linguistic resources
 Make the Query Parser modular
 Restrict anonymous requests to the API
 Move to the mainstream version of the Whoosh library
 Maintain compatibility between Python versions
 Cover the project with documentation
 Optimize code and performance
Implementation :: Open Issues
 Enrich the linguistic resources: the resources currently used are poor compared with what we really need.
 Integrate the Qurany project to enrich the current faceted thematic search.
 Integrate the boundary annotations to enable the retrieval of boundaries in the Quran.
 Propose a standard format for new linguistic and Quranic resources.
 Convert the binary database to text, so that changes can be logged and the project can benefit from revision control systems such as Git.
Implementation :: Open Issues
 Complete the features implementation
Fielded search YES
Logical relations YES
Phrase search YES
Interval search YES
Full Regex NO
Wildcards PARTIALLY
Boosting keywords YES
Pagination YES
Scoring YES
Sorting YES
Keywords Highlight YES
Uthmani full marks YES
Real time output NO
Results grouping NO
Spell correction PARTIALLY
Related keywords PARTIALLY
Different vocalizations YES
Collocated words NO
Keyboard mapping NO
Different significations NO
Romanization PARTIALLY
Partial vocalization PARTIALLY
Multi-level derivation YES
Syntactic Coloration NO
Vocal Search NO
Specific-derivations NO
Linguistic annotations PARTIALLY
Fuzzy string PARTIALLY
Word properties PARTIALLY
Linguistic examples NO
Structural options YES
Translation search YES
Uthmani writing way NO
Recitation marks PARTIALLY
Divine Names Highlight NO
Repetitions&Allegoricals NO
Abrogators&Abrogated NO
Qur’anic Parables NO
Semantically related words PARTIALLY
Faceted Thematic Search PARTIALLY
Entity Extraction NO
Questions Answering (QA) NO
Automatic vocalization NO
Co-reference resolution NO
Vocalized word frequency YES
Unvocalized word frequency YES
Other Qur’anic units frequency NO
Root/Stem/Lemma frequency NO
Implementation :: Open Issues
 Move to Python 3.x: Python 2 is being phased out and sooner
or later it will be fully replaced. Several tools provide scripts
that convert code from Python 2 to Python 3 automatically, though
a large part of the work often has to be done manually.
 Cover with documentation: documentation is important. It is
expensive to write, but it encourages the community to get
involved in the project. This can be done by:
 Enriching the readme files;
 Enriching the code with appropriate comments;
 Creating a usage how-to and strengthening it with many demos;
 Writing the man page for the console interface.
 Optimize code and performance: continue fixing the warnings
reported by pylint code analysis, and use Profile to measure the
performance of each search feature in order to improve it.
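The profiling step could look like the following sketch, written for Python 3 with the standard `cProfile` and `pstats` modules. `toy_search` is a hypothetical stand-in for a real search feature and `profile_call` is an illustrative helper; neither is part of the project's code.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


def toy_search(query, documents):
    """Hypothetical search feature: naive substring matching."""
    return [doc for doc in documents if query in doc]


if __name__ == "__main__":
    hits, report = profile_call(toy_search, "a", ["abc", "xyz"])
    print(hits)
    print(report)
```

Running the same helper over each search feature gives comparable reports, which makes it easier to see where optimization effort should go first.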
Implementation :: Interfaces
Implementation :: Interfaces :: API
Strong points:
1. Free/libre and open source
2. A Python API
3. A well-founded base
4. Many features
Implementation :: Interfaces :: API :: Sample
Implementation :: Interfaces :: JSON web service
Implementation :: Interfaces :: Console
Examples of use
 As a desktop application
 As a web interface
 www.alfanous.org
 As a smartphone app
 iPhone, iPad
 Windows Phone
Examples of use :: Alfanous.org
 Remarkable Features:
 Localizable
 Awarded:
 Best website for technical quality at the
Algeria Web Awards 2012
Examples of use :: Alfanous.org (Responsive)
 Remarkable Features:
 User experience
 Responsiveness
 Simplicity
 Awarded:
 Chosen as the best website in the religious
websites category at the Algeria Web Awards 2013
Examples of use :: iPhone Application
 Developed by:
 iPhone-islam (Objective-C)
 Remarkable Features:
 Runs on iPhone and iPad
Examples of use :: Windows Phone App
 Developed by:
 Moumen bou Abdellah (C#)
 Remarkable Features:
 Runs on Windows Phone
Examples of use :: Alfanous Desktop Interface
 Remarkable Features:
 Offline use
Conferences
1. An Arabic paper at NITS 2011, KSA:
 Title: An Application Programming Interface for indexing and
search in Noble Quran
 Authors: Assem Chelli, Merouane Dahmani, Amar Balla, Taha
Zerrouki.
2. An English paper at the LREC 2012 (Turkey) pre-conference
workshop “LRE-Rel: Language Resources and Evaluation for
Religious Texts”:
 Title: Advanced Search in Quran: Classification and Proposition
of All Possible Features.
 Authors: Assem Chelli, Amar Balla, Taha Zerrouki.
Conclusion & Perspectives
 We implemented many of the search features that we
listed earlier.
 However, improvements remain to be made and many
issues remain to be resolved. We leave them as
perspectives:
 Achieving an accurate statistics-gathering system;
 Implementing a more adequate suggestion system;
 Clearing the way toward a semantic search engine;
 Completing the full conception of all search features;
 Completing the implementation of all open issues.
Contacts:
Email: assem.ch@gmail.com
Github: @assem-ch
Twitter: @assem_ch
Project Links:
Website: www.alfanous.org
User feedback: feedback.alfanous.org
Source-code: www.github.com/assem-ch/alfanous

Proposal of an Advanced Retrieval System for NobleQur'an - Thesis defending

  • 1. ‫الرحمي‬ ‫الرمحن‬ ‫هللا‬ ‫سم‬‫ب‬ *ٌ‫اب‬َ‫ت‬ِ‫ك‬ ‫الر‬‫ا‬ِ َ ِ‫َم‬ِ‫ح‬‫ا‬ُ‫ه‬ُ‫ت‬ َ‫َي‬ِ‫ص‬ُ‫ف‬ َّ ُ‫ُث‬ٍ ‫ر‬ََِِ ٍ ‫مي‬َِِ‫ح‬ ََُِْ ِ‫ن‬ِِ ِ َ‫َل‬* 1
  • 2. T H E S I S O F M A G I S T E R Proposal of an Advanced Retrieval System for Noble Qur’an
  • 3. Plan  Introduction  Problematic  State of Art  Search Engines  Arabic Language  Noble Quran  Objectives  Proposed search features  Conception  Implemented work  Published papers  Conclusion & Perspectives _
  • 4. Introduction  Qur’an, in Arabic, means the Read or the Recitation. Muslim scholars define it as: « the words of Allah revealed to His Prophet Muhammad, written in Mus’haf and transmitted by successive generations »  Qur’an is a sacred book for all Muslims  Qur’an is also the first reference to Islamic law.  The Muslims, through 14 centuries, are still:  Studying it,  Teaching it,  Writing books about it,  Developing applications for it -recently-. 4
  • 5. Problematic  Qur’an is an important source of information about all aspects of life:  Scientific, Social, Historical, Political, Ethical, Juridical, etc.  With a huge amount of information.  Quran is extremely difficult for regular search tools to successfully extract key information, so we should find other ways to enquire!  The appropriate solution for that is an Advanced Retrieval System  Why a Retrieval System?  Why advanced? 5
  • 6. Indexing  Indexing consists in :  Analyzing each document in the collection to create a set of keywords.  Creating a representation of documents in the system.  Supporting other domains:  Auto-Clustering of documents,  Related keywords suggestion  Documents Auto-Analysis,  Calculating collocated terms,  Auto-summarization.  Etc. 6
  • 7. Full-text search  A technology of finding documents matching a set of words.  Most of the web search engines such as Google and Bing! use full-text search engines at the heart of their service  The core of a full-text search engine is split into two main operations:  Indexing the information into an efficient format  Searching the relevant information from this pre-computed index 7
  • 8. Indexing :: Phases Example: « Assem is >defending< his thesis!! »  Tokenization: Assem + is + >defending< + his + thesis!!  Normalization: assem + is + defending + his + thesis  Filtering stop words : assem + $ + defending + $ + thesis  Stemming: assem + $ + defend + $ + thesis Resulted keywords: assem, defend, thesis 8
  • 10. Querying (Search)  Querying is the phase of interaction between the system and the user.  Search takes a user query and returns the effective list of matching results sorted by relevance.  Relevance: A degree of relationship between the document and the query 10
  • 12. Semantic Approach 12  Objective: improve search accuracy by understanding searcher intent and the contextual meaning of terms to generate more relevant results.  Semantic search does not just mean contextual search  It is a smart search that would consider several factors to provide the most relevant and useful search queries.
  • 13. Semantic Approach :: factors 13  Current trend  Location of search  Intend of the search  Variations of words  Synonyms  Generalized and Specialized queries  Concept matching  Natural language queries  Change of meaning based on the group of words 13
  • 14. Semantic Approach :: factors 14  Current trend  Who wins the Classico?  last one of course  Location of search  Weather temperature?  here in Algiers preferably  Intend of the search  Earth quake  Checking if one happened, or looking for articles  Variations of words  Man, Men, Man’s.  Synonyms  Biggest mountain , Highest mountain  Generalized and Specialized queries  Health vs Diabetes  Concept matching  Half life  the game or the physical constant  Natural language queries  What time is it in Cairo?  Change of meaning based on the group of words  New egg health benefits  New egg health products 14
  • 15. Arabic :: Orthography  A Semitic language  The language of Quran  A Right-to-Left language 15
  • 16. Arabic :: Lexicography 16 The classical Arabic grammar has only three subsets  Verbs  Verbs with a simple root (‫المجرد‬ ‫:)الفعل‬ َ‫ل‬َ‫ع‬َ‫ف‬  Hamzated verb (‫,)مهموز‬ Assimilated verb (‫,)مثال‬ Hollow verb (‫,)أجوف‬ Weakened verb (‫,)ناقص‬ Geminated verb (‫ف‬َ‫ع‬‫.)مض‬  Verbs with augmented root (‫المزيد‬ ‫)الفعل‬  ‫ل‬ّ‫ع‬‫ف‬،‫فاعل‬،‫أفعل‬،‫ل‬ّ‫ع‬‫تف‬،‫تفاعل‬،‫افتعل‬،‫انفعل‬،‫استفعل‬  Nouns  Primitive nouns (‫الجامدة‬ ‫)األسماء‬ :  Nouns derived from verbals (‫المشتقة‬ ‫)األسماء‬  Numbers, Demonstrative pronouns, Relative pronouns, Personal pronouns, Function words  Particles
  • 17. Arabic :: Morphology • Arabic is a fusional language, considered as an intro-flexion language: • Consonants indicate the meaning • Vowels mark the flexion • Arabic language is very rich and based on the structure of patterns (about 500) and roots (about 7000). • Theoretically: • A single Arabic root can generate hundreds of words (noun, verb, ...) by applying patterns. • A single Arabic word can exist in about a hundred of forms by adding certain suffixes and prefixes 17
  • 18. Arabic :: Flexional Morphology 18 • Arabic uses for the conjugation of verbs and declension of nouns, some indications (Generally Affixes) of: • aspect, mood, time, person, gender, number, case. • These flexional marks can distinguish: • Mode of verbs: Perfective, Imperfective … • Function of nouns: Nominative, Accusative, Genitive
  • 19. Arabic :: Flexion 19 • Flexion of verbs (Conjugation) o Aspect o Mood  Doubted, Affirmed (Actual or Eventual) o Tense  Perfective (‫:)الماضي‬ ‫فعلت‬ ،َ‫فعلت‬ ،ُ‫فعلت‬  Imperfective (‫)المضارع‬  Imperative (‫)األمر‬
  • 20. Arabic :: Flexion :: Verbs 20
  • 21. Arabic :: Flexion :: Verbs 21 • Perfective (‫:)الماضي‬ • 1st person: ،ُ‫ت‬ْ‫ل‬َ‫ع‬َ‫ف‬‫َا‬‫ن‬ْ‫ل‬َ‫ع‬َ‫ف‬ • 2nd person: ْ‫ل‬َ‫ع‬َ‫ف‬ ،‫ا‬َ‫م‬ُ‫ت‬ْ‫ل‬َ‫ع‬َ‫ف‬ ،ِ‫ت‬ْ‫ل‬َ‫ع‬َ‫ف‬ ،َ‫ت‬ْ‫ل‬َ‫ع‬َ‫ف‬َُّ‫ت‬ْ‫ل‬َ‫ع‬َ‫ف‬ ،،ُ‫ت‬ • 3rd person: ، ْ‫ت‬َ‫ل‬َ‫ع‬َ‫ف‬ ،َ‫ل‬َ‫ع‬َ‫ف‬َ‫ف‬ ،‫ا‬َ‫ت‬َ‫ل‬َ‫ع‬َ‫ف‬ ، َ‫َل‬َ‫ع‬َ‫ف‬ََّْ‫ل‬َ‫ع‬َ‫ف‬ ،‫وا‬ُ‫ل‬َ‫ع‬ • Imperfective (‫)المضارع‬ • Nominative, • Accusative, • Jussive, • Imperative (‫)األمر‬
  • 22. Arabic :: Flexion :: Nouns 22 • Flexion of nouns (declension) o 3 cases:  Nominative (‫)الرفع‬  Accusative (‫)النصب‬  Genitive (‫)الكسر‬ o Depends on:  Number: Singular (‫,)المفرد‬ Dual (‫,)المثنى‬ Plural (‫)الجمع‬  Form: Triptote , Diptote , etc. -
  • 23. Arabic :: Flexion :: Nouns 23 o Declension of Singular nouns  Triptotes (‫المنصرفة‬ ‫:)األسماء‬ ‫ا‬ً‫ب‬‫كتا‬ ٍ‫ب‬‫كتا‬ ‫كتاب‬  Diptotes (‫الصرف‬ ‫من‬ ‫الممنوعة‬ ‫:)األسماء‬ ‫قاحلة‬ ‫صحراء‬  Five Nouns (‫الخمسة‬ ‫:)األسماء‬ ‫أخو‬‫أخا‬‫أخي‬  Deverbals with defective roots : ‫ماض‬ o Declension of dual nouns: ‫كتابان‬ ‫كتابين‬ o Declension of plural nouns  External masculine plural (‫السالم‬ ‫مذكر‬ ‫:)جمع‬ ‫كاتبين‬ ‫كاتبون‬ o Declension of function words  Invariables : ‫منذ‬  Variables: َّ‫ل‬‫ك‬
  • 24. Arabic :: Derivational morphology 24 o Deverbal noun (‫:)المصدر‬ ‫د‬‫و‬ , ‫د‬‫و‬ , ‫اد‬‫د‬ِ‫,و‬ ‫ة‬‫اد‬‫د‬ِ‫و‬ , ‫َّة‬‫د‬‫و‬‫م‬ o Active participle (‫فاعل‬ ‫:)اسم‬ ‫ب‬ ِ‫ار‬‫ض‬ (hitter) o Passive participle (‫مفعول‬ ‫:)اسم‬ ‫وب‬‫ْر‬‫ض‬‫م‬ (struck) o Nouns of time and place (‫والمكان‬ ‫الزمان‬ ‫ٔسماء‬‫ا‬): ‫ة‬‫س‬‫ْر‬‫د‬‫م‬ (school), ‫ب‬ ِ‫ر‬ْ‫غ‬‫م‬ (sunset) o The Nomen Vicis (‫المرة‬ ‫:)اسم‬ ‫ة‬‫ب‬ ْ‫ر‬‫ض‬ (a hit) o The Nomen Speciei ( ‫الهيئة‬ ‫:)اسم‬ ‫ت‬‫س‬‫ل‬‫ج‬_‫ة‬‫س‬ْ‫ل‬ ِ‫ج‬_‫ا‬‫ل‬‫ا‬‫ات‬‫ير‬ِ‫م‬ (she sat like princesses)
  • 25. Arabic :: Ambiguities :: Absence of Vocalization If text has the word (‫,)الملك‬ How should search engine understand the meaning? Is it ? 1. « ‫ك‬‫ل‬‫|الم‬ Angel », 2. « ‫لك‬‫|الم‬ Kingdom » 3. « ‫ك‬ِ‫ل‬‫|الم‬ King » 25
  • 26. For the word « ‫»وعد‬ , the letter wâw « ‫»واو‬ is : 1. A part of the word: ‫د‬‫ع‬‫و‬ (to promise) 2. Not a part of the word: ‫و‬َّ‫د‬‫ع‬ (and + to count) Arabic :: Ambiguities :: Prefixes 26
  • 27. For the word « ‫,»وله‬ the letter ha’ (‫)هاء‬ is : 1. A part of the word: ‫ه‬ِ‫ل‬‫و‬ (admire) 2. Not a part of the word: ِّ‫ل‬‫و‬ِ‫ه‬ (crown + him) ‫و‬‫ل‬‫ه‬ (and + he <-> has) Arabic :: Ambiguities :: Suffixes 27
  • 28. Quran :: Structure  The Qur’an consists of 114 surahs, the surahs are divided into ayahs.  the main fragmentation, specified by the prophet. 28 ‫القرآن‬ ‫سورة‬1 •‫آية‬ •‫آية‬ •‫آية‬ •‫آية‬ •... ‫سورة‬2 •‫آية‬ •‫آية‬ •‫آية‬ •‫آية‬ •... ... ‫سورة‬114 •‫آية‬ •‫آية‬ •‫آية‬ •‫آية‬ •...
  • 29. Quran :: Structure  There are many fragmentations:  Primary structure: surah, ayah, word and letter;  Special locations: First ayahs of Surah ( ‫السورة‬ ‫,)فواتح‬ Last ayahs of Surah ( ‫السورة‬ ،‫,)خواتي‬ Qur’anic comma ( ‫فاصلة‬‫قرا‬‫نية‬ ), Sajdah ( ‫,)سجدة‬ Waqf (‫)وقف‬  Other Structures: page, Juz’ (‫)جزء‬ , Hizb( ‫,)حزب‬ Nisf( ‫,)نصف‬ Rubu’( ‫,)ربع‬ Thumn( َّ‫)ثم‬ ‫القرآن‬ ‫أول‬ ‫جزء‬ ‫حزب‬ ‫نصف‬ ‫ربع‬ َّ‫ثم‬َّ‫ثم‬ ‫ربع‬ ‫نصف‬ ‫حزب‬ ... ‫جزء‬ ‫ثَلثون‬ 29 ‫القرآن‬ ‫سورة‬1 •‫آية‬ •‫آية‬ •‫آية‬ •‫آية‬ •... ‫سورة‬2 •‫آية‬ •‫آية‬ •‫آية‬ •‫آية‬ •... ... ‫سورة‬114 •‫آية‬ •‫آية‬ •‫آية‬ •‫آية‬ •...
  • 30. Quran :: Structure :: Stops (Waqfs)
  • 31. Quran :: Uthmani Script
      Standard | Uthmani | Position | Change
      سأريكم | سأوريكم | الأعراف:145 | addition of waw
      العالمين | العلمين | all occurrences in the Qur’an | omission of alif
      الغاوون | الغاون | الشعراء:94 and one other place | omission of waw
      النبيين | النبين | all occurrences in the Qur’an | omission of ya’
      الليل | اليل | all occurrences in the Qur’an | omission of lam
      ننجي | نجي | الأنبياء:88 | omission of nun
      وجيء | وجائ | الزمر:69 and one other place | addition of alif
  • 32. Quran :: Sciences  Specific to the Qur’an  Tafsīr (التفسير)  Knowledge of Makkan and Medinan ayahs  Knowledge of the causes of revelation  Knowledge of the beginnings of surahs  Science of allegorical ayahs (علم المتشابه)  Qur’anic parables (الأمثال القرآنية)
  • 33. Quran :: Sciences  Shared with other resources  Legislative study:  Fiqh (الفقه)  Abrogating and abrogated ayahs (الناسخ والمنسوخ)  General and particular (العامّ والخاصّ)  Linguistic study:  Orthography (علم مرسوم الخط)  Grammatical analysis of the Qur’an (إعراب ألفاظ القرآن)  Morphology (الصرف)  Rhetoric (البلاغة)  Lexicology (علم المعاجم)  Scientific study  Scientific miracles in the Qur’an  Numerical study of verses (ignoring the debate about it)
  • 34. Quran :: Indexes The indexes are categorized by purpose into 5 main categories: Syntactic, Semantic, Structural, Statistical, Thematic
  • 35. Quran :: Indexes :: Projects  Midād lbayān  Word morphology index  Zerrouki’s Indexes  Word morphology index  Topic index  Synonym index  Qur’anic Arabic Corpus  Word-by-word morphology index  Tanzil Project  Ayah index (Electronic Mushaf)  Structural index  Surah index  Boundary-Annotated Qur’an Corpus  Word-by-word Waqf index (+ mapping Uthmani-Standard)  Qurany Concepts Tool  Concept index
  • 36. Quran :: Ontologies + examples 36  Qur’anic Concepts Ontology  Henni’s Ontology
  • 37. Quran :: Indexes/Ontologies projects :: Overall criticisms  Not available or not open  Except Zerrouki’s indexes, Quranic Arabic Corpus, Tanzil  Discontinued development  Except Quranic Arabic Corpus, Tanzil
  • 38. Quranic Search Tools  Alawfa (الأوفى)  Al-Monaqeb-Alqurany (المنقب القرآني)  Quran complex search service  Quranic Researcher (الباحث القرآني)  Quranologie (علم القرآن)  Quranic Corpus Word-by-Word Search  Tanzil Quran Browser (تنزيل)  Zekr (ذكر)
  • 39. Quranic Search Tools :: Overall criticisms  They are not full-text search engines  except Tanzil’s and Zekr’s advanced search.  Basic search operations  Simple query system  Weak or unsupported linguistic operations  except Quranic Corpus word-by-word search  No semantic approach  Closed source  except Zekr  Implemented as interfaces, not as APIs or libraries.
  • 40. Objectives  Design a retrieval system that fully meets Qur’an search needs.  First, we should list and classify all the search features that are possible and helpful.  Then, we need to study how to implement each feature and what its requirements are.
  • 41. Proposed Search Features :: Advanced Query  Fielded search  سورة:الفاتحة  Logical relations  الصلاة و الزكاة  Phrase search  "الحمد لله"  Interval search  رقم_الآية:[1 إلى 5]  Full regular expressions  م[نا] to search for من or ما  Wildcards (jokers)  ب؟طة → بسطة, بصطة  *نبي* → نبي, الأنبياء, النبيين, ...
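The wildcard feature on this slide can be illustrated with a minimal sketch. It assumes a small in-memory vocabulary (the `VOCABULARY` list and the `expand_wildcard` name are hypothetical, not part of the project) and uses Python's standard `fnmatch` module rather than the query parser of the actual system.

```python
import fnmatch

# Illustrative vocabulary; a real system would read it from the word index.
VOCABULARY = ["بسطة", "بصطة", "نبي", "الأنبياء", "النبيين", "كتاب"]


def expand_wildcard(pattern, vocabulary=VOCABULARY):
    """Return the vocabulary words that match a wildcard pattern.

    '*' matches any run of characters and '?' a single character.
    The Arabic question mark (؟) is accepted as an alias of '?'.
    """
    pattern = pattern.replace("؟", "?")
    return [word for word in vocabulary if fnmatch.fnmatchcase(word, pattern)]


if __name__ == "__main__":
    print(expand_wildcard("ب؟طة"))
    print(expand_wildcard("*نبي*"))
```

The expanded word list could then be turned into an OR query over the ayah index.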
  • 42. Proposed Search Features :: Output Improvements  Pagination  Sorting  Relevance  Mushaf natural order  Revelation order  Numerical, alphabetical, or Abjad order  Keyword highlighting: ذرني ومن خلقت <style>وحيدا</style>
  • 43. Proposed Search Features :: Output Improvements (2)  Real-time output  Results grouping  by surahs  by topics  by tafsir dependency  by revelation events  by allegorical ayahs  by parables  Uthmani script with full diacritical marks
  • 44. Proposed Search Features :: Suggestion System  Spell corrections  ابراهام: إبراهيم  Semantically related words (ontology-based)  يعقوب: يوسف، إسحاق، إسرائيل، نبي ...
  • 45. Proposed Search Features :: Suggestion System (2)  Different vocalizations  الملك: المَلِك، المُلْك، المَلَك ...  Collocated words  سميع: سميع عليم، سميع بصير  الحمد: الحمد لله  Keyboard mapping  fsl: بسم (f → ب, s → س, l → م)  Different significations  رب: 1st meaning (god), 2nd meaning (master)
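The keyboard-mapping suggestion can be sketched as a character-by-character lookup. The table below contains only the three keys given on the slide; the names `QWERTY_TO_ARABIC` and `remap_keyboard` are hypothetical, and a complete layout table would have to be filled in for real use.

```python
# Partial Latin-key -> Arabic-letter table: only the keys shown on the slide.
QWERTY_TO_ARABIC = {
    "f": "ب",
    "s": "س",
    "l": "م",
}


def remap_keyboard(text, table=QWERTY_TO_ARABIC):
    """Replace each Latin key by the Arabic letter on the same key.

    Characters missing from the table are kept unchanged.
    """
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(remap_keyboard("fsl"))
```

A suggestion system would typically try this remapping only when the query contains no Arabic letters and the remapped word exists in the Qur'anic vocabulary.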
  • 46. Proposed Search Features :: Linguistic aspects  Romanization  خليفة : kalīfaẗ (ISO233), xalyfap (Buckwalter), _halyfaT (Arabtex).  Syntactic coloration  Partial vocalization search  مَلك to locate مَلِك, مَلَك … and ignore مُلْك  Multi-level derivation  (Word: أسقينا, level: lemma) to find وَأَسْقَيْنَاكُمْ, لَأَسْقَيْنَاهُمْ, فَأَسْقَيْنَاكُمُوهُ.  Specific derivations  Conjugation in the perfective of قال to find قال, قالت, قالوا, قلن ...
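As an illustration of the romanization feature, here is a minimal sketch of unvocalized Arabic-to-Buckwalter transliteration. The table follows the standard Buckwalter letter mapping; the function name `to_buckwalter` is hypothetical and the sketch does not claim to mirror the project's own implementation.

```python
# Standard Buckwalter mapping for the basic Arabic letters and short vowels.
BUCKWALTER = {
    "ء": "'", "آ": "|", "أ": ">", "ؤ": "&", "إ": "<", "ئ": "}",
    "ا": "A", "ب": "b", "ة": "p", "ت": "t", "ث": "v", "ج": "j",
    "ح": "H", "خ": "x", "د": "d", "ذ": "*", "ر": "r", "ز": "z",
    "س": "s", "ش": "$", "ص": "S", "ض": "D", "ط": "T", "ظ": "Z",
    "ع": "E", "غ": "g", "ف": "f", "ق": "q", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "ه": "h", "و": "w", "ى": "Y", "ي": "y",
    "\u064B": "F", "\u064C": "N", "\u064D": "K",  # tanwin
    "\u064E": "a", "\u064F": "u", "\u0650": "i",  # fatha, damma, kasra
    "\u0651": "~", "\u0652": "o",                 # shadda, sukun
}


def to_buckwalter(text):
    """Transliterate Arabic text; unknown characters pass through unchanged."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(to_buckwalter("خليفة"))
```

Inverting the dictionary gives the reverse direction, which is what a query such as a romanized keyword would need.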
  • 47. Proposed Search Features :: Linguistic aspects  Vocal Search  Word linguistic annotation …. 47
  • 48. Proposed Search Features :: Linguistic aspects  Word properties embedded query  { جذر:ملك نوع:اسم عدد:مفرد }  Numerical values search  309 replaced by ثلاثمائة وتسعة  Fuzzy string search  مءصدة may replace مؤصدة  Linguistic examples search  Rhetorical deletion (الحذف البلاغي)  Grammatical shift (الالتفات)  Uthmani writing way  بسطة may replace بصطة  نعمت may replace نعمة
  • 49. Proposed Search Features :: Quranic Options  Recitation marks retrieving  سجدة:نعم  Structural options  صفحة:1  جزء:عم  Divine Name Highlight
  • 50. Proposed Search Features :: Quranic Options (2)  Translation embedded query  { text: mercy lang: english author: shekir }  Repetitions and allegorical ayahs (التكرار والمتشابهات)  Repetition {55,13} == [فَبِأَيِّ آلَاءِ رَبِّكُمَا تُكَذِّبَانِ], 31 repetitions  Abrogators and abrogated ayahs search (الناسخ والمنسوخ)  Quranic parables (الأمثال)  parable (سورة:البقرة) [مَثَلُهُمْ كَمَثَلِ الَّذِي اسْتَوْقَدَ نَارًا فَلَمَّا أَضَاءَتْ مَا حَوْلَهُ ذَهَبَ اللَّـهُ بِنُورِهِمْ وَتَرَكَهُمْ فِي ظُلُمَاتٍ لَا يُبْصِرُونَ]
  • 51. Proposed Search Features :: Semantic Queries  Semantically related words  Syn(جنة) to find جنة, نعيم, فردوس …  Ant(جنة) to find جحيم, سعير, جهنم, سقر …  Is(جنة) to find فردوس, عدن  … (based on ontology)  Faceted Thematic Search
  • 52. Proposed Search Features :: Semantic Queries  Natural questions: كم؟ لم؟ متى؟ أين؟ ما؟ من؟
      o ما هي الحطمة؟ What is Al-hottamat? [نَارُ اللَّـهِ الْمُوقَدَةُ] - الهمزة 6 (It is the fire kindled by Allah)
      o من هم الأنبياء؟ Who are the prophets? [إِنَّا أَوْحَيْنَا إِلَيْكَ كَمَا أَوْحَيْنَا إِلَى نُوحٍ وَالنَّبِيِّينَ مِنْ بَعْدِهِ وَأَوْحَيْنَا إِلَى إِبْرَاهِيمَ وَإِسْمَاعِيلَ وَإِسْحَاقَ وَيَعْقُوبَ وَالْأَسْبَاطِ وَعِيسَى وَأَيُّوبَ وَيُونُسَ وَهَارُونَ وَسُلَيْمَانَ وَآتَيْنَا دَاوُودَ زَبُورًا] - النساء 163
      o أين غلبت/هزمت الروم؟ Where were the Romans defeated? [فِي أَدْنَى الْأَرْضِ وَهُمْ مِنْ بَعْدِ غَلَبِهِمْ سَيَغْلِبُونَ] - الروم 3
      o كم مكث أصحاب الكهف؟ How long did the People of the Cave stay? [وَلَبِثُوا فِي كَهْفِهِمْ ثَلَاثَ مِائَةٍ سِنِينَ وَازْدَادُوا تِسْعًا] - الكهف 25
      o متى يوم القيامة؟ When is the Day of Resurrection? [يَسْأَلُكَ النَّاسُ عَنِ السَّاعَةِ قُلْ إِنَّمَا عِلْمُهَا عِنْدَ اللَّـهِ وَمَا يُدْرِيكَ لَعَلَّ السَّاعَةَ تَكُونُ قَرِيبًا] - الأحزاب 63
      o كيف يتشكّل الجنين؟ How is the embryo formed? [ثُمَّ خَلَقْنَا النُّطْفَةَ عَلَقَةً فَخَلَقْنَا الْعَلَقَةَ مُضْغَةً فَخَلَقْنَا الْمُضْغَةَ عِظَامًا فَكَسَوْنَا الْعِظَامَ لَحْمًا ثُمَّ أَنْشَأْنَاهُ خَلْقًا آخَرَ فَتَبَارَكَ اللَّـهُ أَحْسَنُ الْخَالِقِينَ] - المؤمنون 14
  • 53. Proposed Search Features :: Semantic Queries (2)  Auto vocalization  رسول من الله → رَسُول مِنَ اللهِ  Entity extraction  ثلاث مائة سنين وازدادوا تسعا as (Time/number, 309)  ببكة as (place, Mekka)  كلمح البصر as (time unit, ??)  مثقال ذرة as (size unit, ??)  يا أيها النبي as (person, Mohammad)  Proper nouns search (co-reference resolution)  بنيامين؟ [إِذْ قَالُوا لَيُوسُفُ وَأَخُوهُ أَحَبُّ إِلَى أَبِينَا مِنَّا وَنَحْنُ عُصْبَةٌ إِنَّ أَبَانَا لَفِي ضَلَالٍ مُبِينٍ] - يوسف 8
  • 54. Proposed Search Features :: Statistical system  Frequencies of different units  How many times does the word « الله » occur in Surah “المجادلة”?  What are the ten most frequent words in the whole Qur’an?  How many times are the word Sea/بحر and its derivations mentioned in the whole Qur’an?  How many letters are there in Surah طه?  What is the longest ayah?  How many Sajdah marks are there in the whole Qur’an (according to the different rewayates)?
  • 55. Discussion of search features  To validate the usefulness, importance and clarity of each feature, we launched a survey to gather opinions.  We targeted a mixed audience to get high-quality feedback from:  Regular users,  Quran scholars,  Arabic morphology experts,  Natural Language Processing / Information Retrieval researchers,  philosophers working on the comparison of religious scriptures.
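The word-frequency questions above reduce to counting tokens over a selection of ayahs. The following is a minimal sketch over a three-ayah unvocalized sample; the `AYAHS` dictionary and the `word_frequencies` helper are hypothetical stand-ins for the real index.

```python
from collections import Counter

# Tiny unvocalized sample keyed by (surah, ayah); a real system reads the index.
AYAHS = {
    (1, 1): "بسم الله الرحمن الرحيم",
    (1, 2): "الحمد لله رب العالمين",
    (1, 3): "الرحمن الرحيم",
}


def word_frequencies(ayahs, surah=None):
    """Count word occurrences, optionally restricted to one surah."""
    counts = Counter()
    for (surah_no, _ayah_no), text in ayahs.items():
        if surah is None or surah_no == surah:
            counts.update(text.split())
    return counts


if __name__ == "__main__":
    freqs = word_frequencies(AYAHS, surah=1)
    print(freqs["الرحمن"], freqs["الله"])
```

Counting derivations of a word (for example بحر) would use the same loop over roots or lemmas instead of surface forms, which is why the vocalized word index matters for accurate statistics.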
  • 59. Conception 59  Previous Work:  the Engineer degree graduation project entitled “Development of a search and indexing engine for Qur’anic documents” [Dahmani2010]  Improvements:  Moving into a Full vocalized search engine  Customization of text processing phases, considering both uthmani and standard scripts  Adopting the Quranic word as a search unit 59
  • 60. Conception :: Full Vocalized Search Engine  Barriers:  Comparing vocalized, partially vocalized, and unvocalized texts  Distinguishing between original vowels and declension case markers  Lack of vocalized Arabic linguistic resources  Texts, ontologies, thesauri, corpora  Advantages:  Resolve the ambiguities caused by ignoring vocalization  Make search results, suggestions, and statistics more accurate.  Refine meaning detection  (a first step in the semantic approach)
  • 61. Conception :: Text processing  We consider both standard script and Uthmani script to resolve difficulties such as:  Searching with an Uthmani writing form of a word.  Calculating statistics based on the Uthmani writing.  Matching the same word-by-word structure of some Quranic linguistic resources
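The first barrier, comparing vocalized, partially vocalized and unvocalized words, can be sketched as follows. This is one possible matching rule (an assumption, not necessarily the one used in the project): base letters must be identical, and every diacritic the query specifies must agree with the indexed word, while unspecified positions match anything. The helper names are hypothetical.

```python
# Arabic diacritics: tanwin, short vowels, shadda, sukun (U+064B..U+0652).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_marks(word):
    """Split a word into [letter, marks] pairs, marks being its diacritics."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            pairs[-1][1] += ch
        else:
            pairs.append([ch, ""])
    return pairs


def partial_match(query, candidate):
    """True if the candidate is compatible with a partially vocalized query."""
    q_pairs, c_pairs = split_marks(query), split_marks(candidate)
    if len(q_pairs) != len(c_pairs):
        return False
    return all(
        q_letter == c_letter and (not q_marks or set(q_marks) == set(c_marks))
        for (q_letter, q_marks), (c_letter, c_marks) in zip(q_pairs, c_pairs)
    )


if __name__ == "__main__":
    FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
    query = "م" + FATHA + "لك"                       # only the first vowel given
    print(partial_match(query, "م" + FATHA + "ل" + KASRA + "ك"))  # malik
    print(partial_match(query, "م" + DAMMA + "ل" + SUKUN + "ك"))  # mulk
```

With this rule a fully unvocalized query matches every vocalization, and a fully vocalized query matches only itself, which corresponds to the partial vocalization search proposed earlier.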
  • 62. Conception :: Text processing :: Global schema
  • 63. Conception :: Text processing :: Substitution  New phase! Purpose?  Cases of substitution:  Romanization:  Guessing policy:  Nature of used characters  Arabic valid words  Word existence in Quran  Predefined priorities  Numbers as words:  Rules:  We don’t say صفر رجل, we say لا رجل  One is never mentioned as واحد but as أحد  Some numbers accept gender: اثنان, اثنتان  Other numbers take the opposite gender of the counted noun: سبع سماوات, سبعة أبحر  A hundred (مئة) has a special writing in the Quran: مِائَةُ  Some numbers are mentioned indirectly: أَلْفَ سَنَةٍ إِلَّا خَمْسِينَ عَامًا
  • 64. Conception :: Text processing :: Tokenization 64  Phases:  Phrases to words (tokens)  Words to their parts (Sub-tokens) 64
  • 65. Conception :: Text processing :: Tokenization
  • 66. Conception :: Text processing :: Normalization 66  Normalize Uthmani text into Standard text  Strip all recitation marks  keep the vowels except the declension case ending vowel 66
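The normalization steps above can be sketched as below. Two simplifying assumptions are made and should be read as such: recitation marks are taken to be the Unicode Qur'anic annotation signs U+06D6..U+06ED, and the case ending is approximated by the last short vowel or tanwin of each word, although (as noted earlier in the deck) a final vowel is not always a declension marker. Function names are hypothetical.

```python
import re

# Assumed range of recitation/annotation signs (small high marks, sajdah, ...).
RECITATION_MARKS = re.compile("[\u06D6-\u06ED]")
# Tanwin and short vowels, i.e. candidates for a case ending.
CASE_VOWELS = set("\u064B\u064C\u064D\u064E\u064F\u0650")
# Shadda and sukun are kept even when they sit at the end of the word.
KEPT_MARKS = set("\u0651\u0652")


def drop_case_ending(word):
    """Remove trailing short vowels/tanwin while keeping shadda and sukun."""
    end = len(word)
    while end > 0 and word[end - 1] in CASE_VOWELS | KEPT_MARKS:
        end -= 1
    tail = "".join(ch for ch in word[end:] if ch not in CASE_VOWELS)
    return word[:end] + tail


def normalize(text):
    """Strip recitation marks, then drop the final vowel of every word."""
    text = RECITATION_MARKS.sub("", text)
    return " ".join(drop_case_ending(word) for word in text.split())


if __name__ == "__main__":
    al_hamdu = "ال\u0652ح\u064Eم\u0652د\u064F"   # al-hamdu, fully vocalized
    print(normalize(al_hamdu + "\u06D6"))
```

Mapping Uthmani spellings to standard ones (the first bullet) is not covered here; it would need a word-level mapping table such as the one provided by the Boundary-Annotated Qur'an Corpus mentioned earlier.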
  • 67. Conception :: Text processing :: Filtering stop words  Stop-word selection strategy:  Chosen from the list of the most frequent words in the Qur’an,  Considering vocalization  Preferring:  Particles such as لكن  Pronouns such as أنتَ  Clitics such as فَـ
  • 68. Conception :: Text processing :: Stemming 68  We proposed stripping the affixes in tokenization  In Stemming, we bring the word back either to:  ROOT: Large set of words, different meaning  STEM: Smaller set of words, similar meaning 68
  • 69. Conception :: Quranic Word as Search Unit  Purpose: obtain a quick, efficient and stable method to retrieve specific Quranic words.  Requirements:  A corpus of Quranic words, enriched with linguistic annotations  Word occurrence as a unit  Word form as a unit  Information schema:  Identifiers: a global identifier, and a secondary identifier based on the order in the ayah, added to the ayah identifier and the surah identifier;  Different forms: Uthmani vocalized word (the main form), standard vocalized word, standard unvocalized word;  Transliterations: ISO233, Buckwalter, Arabtex;  Translations: English, other languages;  Different levels of stemming: lemma, stem, root;  Other properties: part of speech, type, state, case, mood, voice, number, gender, person.
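The information schema listed on this slide could be represented by a record like the one below. This is a sketch only: the class name, field names and the sample values are illustrative assumptions, not the schema actually stored in the project's database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QuranicWord:
    """One word occurrence, following the information schema of the slide."""
    gid: int                      # global identifier
    surah: int                    # surah identifier
    ayah: int                     # ayah identifier
    position: int                 # order of the word inside the ayah
    uthmani: str                  # Uthmani vocalized form (main form)
    standard: str                 # standard vocalized form
    unvocalized: str              # standard unvocalized form
    transliterations: Dict[str, str] = field(default_factory=dict)
    translations: Dict[str, str] = field(default_factory=dict)
    lemma: Optional[str] = None
    stem: Optional[str] = None
    root: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)  # POS, case, ...

    @property
    def location(self):
        """Secondary identifier in the form surah:ayah:position."""
        return f"{self.surah}:{self.ayah}:{self.position}"


if __name__ == "__main__":
    word = QuranicWord(
        gid=5, surah=1, ayah=2, position=1,       # illustrative identifiers
        uthmani="\u0671لْحَمْدُ", standard="الْحَمْدُ", unvocalized="الحمد",
        transliterations={"buckwalter": "AlHmd"},
        translations={"en": "praise"},
        root="حمد",
        properties={"pos": "noun", "case": "nominative"},
    )
    print(word.location, word.root)
```

Keeping the three written forms side by side is what allows a query in either script, vocalized or not, to resolve to the same occurrence.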
  • 70. Conception :: 2-steps search strategy 70  1st step: retrieving the best keywords set based on the user query by searching in:  A word-as-a-unit index  A Quranic words ontology  2nd step: retrieving the corresponding ayahs using the keywords set resulted from the first step 70
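The two-step strategy can be sketched with two small dictionaries standing in for the word-as-a-unit index and the ayah index. The data and function names below are hypothetical; the ontology lookup mentioned in the first step would plug in as another keyword expander.

```python
# Step-1 resource: word-as-a-unit index (word form -> linguistic properties).
WORD_INDEX = {
    "ملك": {"root": "ملك"},
    "مالك": {"root": "ملك"},
    "الملك": {"root": "ملك"},
    "يوم": {"root": "يوم"},
}

# Step-2 resource: ayah index (word form -> list of (surah, ayah)).
AYAH_INDEX = {
    "مالك": [(1, 4)],
    "يوم": [(1, 4)],
    "الملك": [(67, 1)],
    "ملك": [(114, 2)],
}


def expand_keywords(value, prop="root"):
    """Step 1: collect every word form whose property equals the value."""
    return {word for word, props in WORD_INDEX.items() if props.get(prop) == value}


def find_ayahs(keywords):
    """Step 2: retrieve the ayahs containing any of the keywords."""
    hits = set()
    for keyword in keywords:
        hits.update(AYAH_INDEX.get(keyword, []))
    return sorted(hits)


def two_step_search(value, prop="root"):
    return find_ayahs(expand_keywords(value, prop))


if __name__ == "__main__":
    print(two_step_search("ملك"))
```

Separating the steps means that new word-level operations (derivations, semantic relations, fuzzy matching) only need to produce a keyword set; the ayah retrieval stays unchanged.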
  • 71. Conception :: 2-steps search :: applications 71
  • 72. Conception :: Word Search :: Word properties  Objective: allow users to locate ayahs based on linguistic properties of words such as POS, type, state, case, mood, voice, number, gender, person.  Methods:  Fielded search:  A fielded search is an advanced query feature that lets users choose the document fields to which they wish to limit the query, then enter the required keywords within these fields.
  • 73. Conception :: Word Search :: Semantically Related Words 73  Objective: offer the related words of a keyword entered by the user.  Algorithm:  The user specifies:  The word  The semantic relation: Synonymy, Antonymy, Hypernymy, Hyponymy, Meronymy, Holonymy, Troponymy.  Inquiring the ontology for related words  Using those keywords to retrieve the corresponding ayahs. 73
  • 74. Conception :: Word Search :: Multi-level Derivations  Objective: get a set of words that share the same origin, such as a stem or root.  Algorithm:  The user specifies:  The keyword  The level of derivation.  Recovering the origin of the word at the specified derivation level  Retrieving the whole set of words that share this origin.
  • 75. Conception :: Word Search :: Specific Derivations  Objective: find the words that result from applying a specific derivation operation to the word given by the user.  Algorithm:  The user should:  Enter the keyword  Specify which derivation.  Generating the set of derived words either by:  fetching them from the word index  using linguistic tools such as verb conjugators; in this case the generated set should be filtered, as a second step, by intersection with the set of Quranic words.  The resulting set will be used to locate the corresponding ayahs.
  • 76. Conception :: Word Search :: Fuzzy Search  Objective: search using the set of words that are nearly similar to the input word in writing or pronunciation.  Methods:  Levenshtein distance (previously unknown text)  N-grams  Spell-checker  Soundex (phonetic)
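The edit-distance method listed above (Levenshtein distance) can be sketched with the classic dynamic-programming formulation. The `fuzzy_candidates` helper and its threshold are illustrative assumptions, not the project's settings.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_candidates(word, vocabulary, max_distance=2):
    """Vocabulary words within max_distance edits of the input word."""
    return [v for v in vocabulary if levenshtein(word, v) <= max_distance]


if __name__ == "__main__":
    print(levenshtein("زنبجيل", "زنجبيل"))
    print(fuzzy_candidates("زنبجيل", ["زنجبيل", "إرم", "الضحى"]))
```

Plain Levenshtein counts a swap of two adjacent letters as two edits; a variant that treats transposition as a single edit (Damerau-Levenshtein) would rank letter mis-orderings closer.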
  • 77. Conception :: Word Search :: Fuzzy Search  Arabic similarity specifics  مءصدة and مؤصدة  الحمد and الحَمْد  عشرِ and عشر  يضلّه and يضلله  Examples  Mis-ordering of letters: زنبجيل for زنجبيل  Phonetic similarity: هرم for إرم  Spelling similarity: الضحي for الضحى
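Some of the Arabic-specific similarities above can be handled before any distance is computed, by folding both words to a comparison key. The folding rules below (drop diacritics, unify hamza carriers, unify alef maqsura with ya') are an assumed, deliberately coarse choice for illustration; the name `similarity_key` is hypothetical.

```python
import unicodedata

# Fold hamza carriers to a bare hamza and alef maqsura to ya'.
FOLD = str.maketrans({
    "\u0624": "\u0621",  # waw with hamza  -> hamza
    "\u0626": "\u0621",  # ya' with hamza  -> hamza
    "\u0623": "\u0621",  # alef with hamza above -> hamza
    "\u0625": "\u0621",  # alef with hamza below -> hamza
    "\u0649": "\u064A",  # alef maqsura -> ya'
})


def similarity_key(word):
    """Return a key under which spelling variants of a word compare equal."""
    word = unicodedata.normalize("NFC", word)
    # Combining characters cover the Arabic diacritics (fatha, sukun, ...).
    bare = "".join(ch for ch in word if not unicodedata.combining(ch))
    return bare.translate(FOLD)


if __name__ == "__main__":
    print(similarity_key("مؤصدة") == similarity_key("مءصدة"))
    print(similarity_key("الح\u064Eم\u0652د") == similarity_key("الحمد"))
```

Variants that differ in letter count, such as يضلّه and يضلله, are not covered by this key and would still need an edit-distance or morphological step.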
  • 78. Open Source, but why? A number of advantages led us to open source; the following points cover the most important of them [Web-Oss-watch]:  Collaborative bug-fixing & fast detection of security vulnerabilities >Given enough eyeballs, all bugs are shallow< -- an open source slogan  Customization.  Translation & localization.  Protection against development discontinuation.  Being part of a community.  Low cost.
  • 79. Used Technologies :: Python Python is a powerful dynamic programming language, used widely.  Features:  powerful and fast  plays well with others  runs everywhere  friendly and easy to learn  Free Open 79
  • 80. Used Technologies :: Whoosh API  Whoosh is a full-text indexing and searching library implemented in Python  Features:  Pure Pythonic API  Fielded indexing and search  Fast indexing and retrieval  Powerful query language  Useful for circumstances such as:  Anywhere a pure-Python solution is desirable to avoid having to build/compile native libraries  As a research platform (Python is easier to read!)  When the search features are more important to us than the raw speed. 80
  • 81. Implementation :: Previous Code Base 81  Implemented on [Chelli&Dahmani2010]  Licensed under GPL*  (Server applications issue)  Based on Whoosh Indexing Library  Offering Many Search Operations  Results in HTML format  Raw format  Can be used in Python  Requires to write wrappers for other languages  A basic resource manager  Has a missing piece 81
  • 82. Implementation :: Our improvements 82  The code base:  has had 981 commits made  representing 15,243 lines of code  mostly written in Python  with a well-commented source code.  took an estimated 4 years of effort (COCOMO model) Reference: Ohloh Website. 82
  • 83. Implementation :: Our improvements New Output System 83  A New Output System:  JSON-Based ==> Simpler & more extensible  Centralized ==> Changes on one & only one place  Extended & Extensible Results Structure  Customizable Search Request using flags  Including a Statistic Calculating Unit  Offering Meta-Data for request 83
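To make the idea of a centralized, JSON-based output concrete, here is a minimal sketch of a response builder. Every key name in the structure (`search`, `meta`, `ayas`, and so on) is a hypothetical illustration and should not be read as the schema actually produced by the project's API.

```python
import json


def build_response(query, ayahs, page=1, per_page=10, flags=None):
    """Assemble one JSON document holding results, statistics and metadata.

    `ayahs` is a list of dicts such as {"surah": 1, "ayah": 2, "text": "..."}.
    """
    start = (page - 1) * per_page
    selected = ayahs[start:start + per_page]
    response = {
        "search": {
            "query": query,
            "flags": flags or {},            # request customization flags
            "total": len(ayahs),             # statistic computed centrally
            "page": page,
            "per_page": per_page,
            "ayas": selected,
        },
        "meta": {"format": "json", "version": "hypothetical-1"},
    }
    # ensure_ascii=False keeps the Arabic text readable in the output.
    return json.dumps(response, ensure_ascii=False)


if __name__ == "__main__":
    results = [{"surah": 1, "ayah": 2, "text": "الحمد لله رب العالمين"}]
    print(build_response("الحمد", results, flags={"highlight": "css"}))
```

Because every interface consumes the same document, a change in the result structure is made in one place, which is the point of the centralization listed on the slide.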
  • 84. Implementation :: Our improvements Multiple Search Units 84  Translation-as-unit:  Word-as-unit 84
  • 85. Implementation :: Our improvements Many new features 85  Fuzzy Search Feature  Retrieving the neighbors of each ayah 85
  • 86. Implementation :: Our improvements Many new features (2) 86  Manipulating different Quranic Scripts  More suggestion operations  Showing the linguistic annotations  Retrieving & Showing transliterated keywords (Buckwalter) 86
  • 87. Implementation :: Our improvements Resources Importing Manager 87  Resources Importing Manager:  Downloading original resources (Licensing issue)  Parsing & Importing the data to our intermediate database  Indexing the database  Updating auto-generated data files 87
  • 88. Implementation :: Our improvements Packaging System 88  Automating the API building  Packaging into:  Source Tarball  Binary Tarball  Python egg package  Debian deb package  Red-hat rpm package  Windows Installer  Mac OS (Perspective) 88
  • 89. Implementation :: Our improvements ->More<- 89  Coding Standardization  Following Python Conventions (PEP8)  Using Pylint (a source code bug and quality checker)  Documentation Covering  Enriching the code with Readme files  New Console interface 89
  • 90. Implementation :: Open Issues  Make the Query Parser modular: this is important to enable extensibility and to fix the problem of the different operations being mixed together during parsing.  Restrict anonymous requests to the API: restricting requests protects the API from flooding, whether intended or not. This can be done by:  Limiting the maximum number of simultaneous requests, globally and per IP.  Implementing an identification system that works with remote clients.  Move to the latest version of the Whoosh library: the stable release of Whoosh is approaching version 3.X, while we are still using an older version, 0.3. Moving to the latest version is strongly recommended in order to benefit from the improvements made. However, it will not be an easy operation, since our API is intertwined with the older version, especially in the Query Parser.
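The request-restriction idea (a cap on simultaneous requests, globally and per IP) could be sketched as a small in-process counter. The class name, limits and usage pattern are assumptions for illustration; a deployed web service would more likely enforce this in the web server or a proxy.

```python
import threading
from collections import Counter


class ConcurrencyLimiter:
    """Caps the number of in-flight requests globally and per client IP."""

    def __init__(self, max_global=100, max_per_ip=5):
        self.max_global = max_global
        self.max_per_ip = max_per_ip
        self._active = Counter()          # ip -> requests in flight
        self._lock = threading.Lock()

    def acquire(self, ip):
        """Register a request; return False if a limit would be exceeded."""
        with self._lock:
            if sum(self._active.values()) >= self.max_global:
                return False
            if self._active[ip] >= self.max_per_ip:
                return False
            self._active[ip] += 1
            return True

    def release(self, ip):
        """Mark a previously accepted request as finished."""
        with self._lock:
            if self._active[ip] > 0:
                self._active[ip] -= 1


if __name__ == "__main__":
    limiter = ConcurrencyLimiter(max_global=3, max_per_ip=2)
    print([limiter.acquire("10.0.0.1") for _ in range(3)])
```

A request handler would call `acquire` on entry, reject the request when it returns False, and call `release` in a `finally` block once the response is sent.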
  • 92. Implementation :: Open Issues 92  Complete the features implementation  Enrich the linguistic resources  Implement the modularity for the Query Parser  Restrict the anonymous requests to the API  Move to the main stream of Whoosh library  Maintain compatibility between Python versions  Cover with documentation  Optimize code and performance 92
  • 93. Implementation :: Open Issues  Enriching the linguistic resources: the resources currently used are poor compared to what we really need.  Integrate the Qurany project to enrich the current faceted thematic search.  Integrate the boundary annotations to enable the retrieval of boundaries in the Quran.  Propose a standard format for new linguistic and Quranic resources.  Convert the binary database to text so that changes can be logged and the project can benefit from revision control systems such as Git.
  • 94. Implementation :: Open Issues  Complete the features implementation
      Feature | Status
      Fielded search | YES
      Logical relations | YES
      Phrase search | YES
      Interval search | YES
      Full Regex | NO
      Wildcards | PARTIALLY
      Boosting keywords | YES
      Pagination | YES
      Scoring | YES
      Sorting | YES
      Keywords Highlight | YES
      Uthmani full marks | YES
      Real time output | NO
      Results grouping | NO
      Spell correction | PARTIALLY
      Related keywords | PARTIALLY
      Different vocalizations | YES
      Collocated words | NO
      Keyboard mapping | NO
      Different significations | NO
      Romanization | PARTIALLY
      Partial vocalization | PARTIALLY
      Multi-level derivation | YES
      Syntactic Coloration | NO
      Vocal Search | NO
      Specific-derivations | NO
      Linguistic annotations | PARTIALLY
      Fuzzy string | PARTIALLY
      Word properties | PARTIALLY
      Linguistic examples | NO
      Structural options | YES
      Translation search | YES
      Uthmani writing way | NO
      Recitation marks | PARTIALLY
      Divine Names Highlight | NO
      Repetitions & Allegoricals | NO
      Abrogators & Abrogated | NO
      Qur’anic Parables | NO
      Semantically related words | PARTIALLY
      Faceted Thematic Search | PARTIALLY
      Entity Extraction | NO
      Question Answering (QA) | NO
      Automatic vocalization | NO
      Co-reference resolution | NO
      Vocalized word frequency | YES
      Unvocalized word frequency | YES
      Other Qur’anic units frequency | NO
      Root/Stem/Lemma frequency | NO
  • 95. Implementation :: Open Issues  Move to Python 3.X: Python 2 is disappearing and sooner or later it will be fully replaced. Several tools offer automatic scripts to convert code from Python 2 to Python 3, but a large part of the work often has to be done manually.  Cover with documentation: documentation is important; it is expensive to write, but it encourages the community to get involved in the project. This can be done by:  Enriching the readme files;  Enriching the code with appropriate comments;  Creating a usage how-to and reinforcing it with many demos;  Writing the man page for the console interface.  Optimize code and performance: continue fixing the warnings reported by pylint code analysis and use a profiler to check the performance of each search feature in order to improve it.
  • 97. Implementation :: Interfaces :: API Strong points: 1. Free/Libre and open 2. A Python API 3. A solid base 4. Lots of features
  • 98. Implementation :: Interfaces :: API#Sample
  • 99. Implementation :: Interfaces :: JSON web service
  • 100. Implementation :: Interfaces :: Console
  • 101. Examples of use  As a desktop application  As a web interface  www.alfanous.org  As a smart phone app  iPhone, iPad  Windows phone 101
  • 102. Examples of use :: Alfanous.org 102  Remarkable Features:  Localizable  Awarded:  As the best-in-technicality website in Algeria Web Awards 2012
  • 103. Examples of use :: Alfanous.org (Responsive) 103  Remarkable Features:  User experience  Responsiveness  Simplicity  Awarded:  chosen as the best website categorized under the religious websites in Algeria Web Awards 2013 103
  • 104. Examples of use :: iPhone Application 104  Developed by:  iPhone-islam (objective-C)  Remarkable Features:  running on iPhone and iPad series 104
  • 105. Examples of use :: Windows phone APP 105  Developed by:  Moumen bou Abdellah (C#)  Remarkable Features:  Running on windows phone 105
  • 106. Examples of use :: Alfanous Desktop Interface 106  Remarkable Features:  Offline use 106
  • 107. Conferences 1. An Arabic paper in NITS 2011 KSA:  Title: An Application Programming Interface for indexing and search in Noble Quran  Authors: Assem Chelli, Merouane Dahmani, Amar Balla, Taha Zerrouki. 2. An English paper in a pre-conference workshop in LREC 2012 Turkey which is about ”LRE-Rel: Language Resource and Evaluation for Religious Texts”  Title: Advanced Search in Quran: Classification and Proposition of All Possible Features.  Authors: Assem Chelli, Amar Balla, Taha Zerrouki. 107
  • 108. Conclusion & Perspectives  We went through the implementation of many of the search features that we listed previously.  However, more improvements remain to be made and many issues remain to be resolved. We leave them as perspectives:  Achieving an accurate statistics gathering system;  Implementing a more adequate suggestion system;  Clearing the way toward a semantic search engine;  Completing the full conception of all search features;  Resolving all the open issues.
  • 109. Contacts: Email: assem.ch@gmail.com Github: @assem-ch Twitter: @assem_ch Project Links: Website: www.alfanous.org User feedback: feedback.alfanous.org Source-code: www.github.com/assem-ch/alfanous

Editor's Notes

  1. Assalamu Alaikom. Mr. President, ladies and gentlemen of the committee, my family, my friends, and everybody here: I am Assem Chelli, and welcome to the presentation defending my thesis entitled “Proposal of an Advanced Retrieval System for Noble Qur’an”.
  2. Here is the plan of the presentation, I start with an introduction to the idea and then I explain the source of problem.
  3. Like indexing, search is a multi-step process, as the figure shows.
  4. Each verb has its set of associated deverbal forms, with which it maintains morphological, syntactic and semantic relations. The number and nature of these forms vary depending on the status of the verb. We cite some deverbal forms:
  5. Waqf marks indicate where to pause during the recitation of the Qur’an. The marks differ according to the type of waqf: allowed, preferred, prohibited, etc. Note that the marking of waqfs differs between rewayates (الروايات). The figure shows a reference of waqfs based on the rewayate of Kaloun.
  6. Masahif (المصاحف) are written in the Uthmani script, which is quite different from the standard Arabic script. For example, the word “سأريكم” is written in Uthmani script as “سأوريكم”, with an additional letter waw.
  7. These sciences take the Qur’an as a subject of study in order to explain it and explore its secrets. Some books are found under different names such as Revelation Science (علم التنزيل) and Book Science (علم الكتاب).
  8. The indexes are categorized by purpose into 5 main categories: Syntactic, Semantic, Structural, Statistical, Thematic.
  9. Our proposal is to design a retrieval system that fully meets Qur’an search needs. To realize this objective, we must first list and classify all the search features that are possible and helpful. Then we need to study how to implement each feature and what its requirements are.
  10. Output improvements. Pagination: 10, 20, 50… results per page. Sorting: mushaf order, revelation order, relevance order, numerical and alphabetical order of fields. Improvements: check the true order of the Arabic symbols: order of hamza: ؤ ئ ء أ; order of ta’: ة ت; order of alef: ى ا. Highlighting: الحمد <style>لله </style>رب العالمين
  11. Real-time display. Grouping of results by similarity: by surahs, by topics, by phrases, by Qur’anic parables, by tafsir dependencies, by reasons of revelation, … Display in Uthmani script with full marks: [ ۞ لَّقَدْ كَانَ فِى يُوسُفَ وَإِخْوَتِهِۦٓ ءَايَٰتٌ لِّلسَّآئِلِينَ]
  12. Suggestion systems. Suggestion of alternative keywords: أبراهام: إبراهيم. Improvements: address the limitations of N-grams for vocalized words. Suggestion of related keywords based on an ontology: يعقوب : يوسف، الأسباط، نبي ... Suggestion of the different vocalizations of a word: الملك : المَلِك ، المُلْك، المَلَك ... Suggestion of collocated words: سميع : سميع عليم، سميع بصير; الحمد : الحمد لله
  14. The linguistic aspects
  15. Semantic queries. Natural questions: من؟ ما؟ أين؟ متى؟ لم؟ كم؟ من هم الأنبياء؟ [إِنَّا أَوْحَيْنَا إِلَيْكَ كَمَا أَوْحَيْنَا إِلَى نُوحٍ وَالنَّبِيِّينَ مِنْ بَعْدِهِ وَأَوْحَيْنَا إِلَى إِبْرَاهِيمَ وَإِسْمَاعِيلَ وَإِسْحَاقَ وَيَعْقُوبَ وَالْأَسْبَاطِ وَعِيسَى وَأَيُّوبَ وَيُونُسَ وَهَارُونَ وَسُلَيْمَانَ وَآتَيْنَا دَاوُودَ زَبُورًا] - النساء 163; ما هي الحطمة؟ [نَارُ اللَّـهِ الْمُوقَدَةُ] - الهمزة 6; أين غلبت/هزمت الروم؟ [فِي أَدْنَى الْأَرْضِ وَهُمْ مِنْ بَعْدِ غَلَبِهِمْ سَيَغْلِبُونَ] - الروم 3; كم مكث أصحاب الكهف؟ [وَلَبِثُوا فِي كَهْفِهِمْ ثَلَاثَ مِائَةٍ سِنِينَ وَازْدَادُوا تِسْعًا] - الكهف 25; متى يوم القيامة؟ [يَسْأَلُكَ النَّاسُ عَنِ السَّاعَةِ قُلْ إِنَّمَا عِلْمُهَا عِنْدَ اللَّـهِ وَمَا يُدْرِيكَ لَعَلَّ السَّاعَةَ تَكُونُ قَرِيبًا] - الأحزاب 63; كيف يتشكّل الجنين؟ [ثُمَّ خَلَقْنَا النُّطْفَةَ عَلَقَةً فَخَلَقْنَا الْعَلَقَةَ مُضْغَةً فَخَلَقْنَا الْمُضْغَةَ عِظَامًا فَكَسَوْنَا الْعِظَامَ لَحْمًا ثُمَّ أَنْشَأْنَاهُ خَلْقًا آخَرَ فَتَبَارَكَ اللَّـهُ أَحْسَنُ الْخَالِقِينَ] - المؤمنون 14. Auto-vocalization: رسول من الله → رَسُول مِنَ اللهِ. Resolving implicitly mentioned proper nouns: بنيامين؟ [إِذْ قَالُوا لَيُوسُفُ وَأَخُوهُ أَحَبُّ إِلَى أَبِينَا مِنَّا وَنَحْنُ عُصْبَةٌ إِنَّ أَبَانَا لَفِي ضَلَالٍ مُبِينٍ] - يوسف 8; أبو بكر/ الصديق؟ [ثَانِيَ اثْنَيْنِ إِذْ هُمَا فِي الْغَارِ إِذْ يَقُولُ لِصَاحِبِهِ لَا تَحْزَنْ إِنَّ اللَّـهَ مَعَنَا] - التوبة 40
  16. We ran the survey for 45 days and gathered about 37 respondents. The following pie charts describe the variation in the backgrounds of the participants, including age, gender, country and language.
  17. We’ve gathered the information about the experience of the participants in four axes: Quran, Arabic, Linguistics, Computing. This following chart describes this:
  18. We got very helpful results from launching the survey. The following figure describes the percentage of clarity, usefulness, and need of each search feature listed
  19. We started working on the idea of search in the Quran in the Engineer degree graduation project entitled “Development of a search and indexing engine for Qur’anic documents” [Dahmani2010].
  20. The Quranic text is usually written in the standard script, but because of the difficulties caused by its differences from the Uthmani script, we have to consider both scripts for indexing and search. Among those difficulties, we mention:
  21. As we have said, to resolve those difficulties we also consider the Uthmani text for text processing, along with the standard text. In addition, we propose many improvements to the text processing phases in order to support many of the search features.
  22. This is a new phase that we propose adding before tokenization. Its objective is to identify a list of pre-defined patterns and replace them, in preparation for tokenization.
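A minimal sketch of such a pre-tokenization phase follows. The slides do not list the actual patterns, so the pattern table here is hypothetical: one rule joins the vocative particle يا to the following word, and one collapses runs of whitespace. The function names are assumptions as well.

```python
import re

# Hypothetical pattern table: (compiled regex, replacement).
# The real list of pre-defined patterns is not given in the slides.
PATTERNS = [
    # Join the standalone particle "يا" to the word that follows it.
    (re.compile(r"\bيا\s+(?=\w)"), "يا"),
    # Collapse any run of whitespace into a single space.
    (re.compile(r"\s+"), " "),
]


def preprocess(text: str) -> str:
    """Apply every pattern replacement in order, before tokenization."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text.strip()


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer that runs after the pre-processing phase."""
    return preprocess(text).split()
```

Keeping the patterns in a data table rather than in code makes it easy to add or remove a rule without touching the tokenizer.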
  23. Previously, we considered the ayah as the search unit, and it remains a sound choice. However, to support many linguistic features, we need to consider a different search unit as well: the Quranic word (اللفظ القرآني).
  24. We use the word search to improve the ayah search by introducing a two-step search strategy.
  25. Building on the two-step strategy, we propose the following new operations to be implemented in the query parser:
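One way to read the two-step strategy is sketched below, under the assumption that step 1 queries a word-level index by linguistic criteria and step 2 feeds the matching word forms into the ayah-level index. All data is invented placeholder material (transliterated toy words and unit ids `a1` to `a3`), and the function names are hypothetical.

```python
from collections import defaultdict

# Toy word index, invented for illustration: word form -> linguistic attributes.
WORD_INDEX = {
    "kataba": {"root": "ktb"},
    "kitab": {"root": "ktb"},
    "qalam": {"root": "qlm"},
}

# Toy collection: unit id -> word forms it contains (placeholders, not real ayahs).
AYAHS = {
    "a1": ["kataba", "qalam"],
    "a2": ["kitab"],
    "a3": ["qalam"],
}


def build_ayah_index(ayahs):
    """Inverted index: word form -> set of unit ids containing it."""
    index = defaultdict(set)
    for ayah_id, words in ayahs.items():
        for word in words:
            index[word].add(ayah_id)
    return index


AYAH_INDEX = build_ayah_index(AYAHS)


def search_words(**criteria):
    """Step 1: word forms whose attributes match every criterion."""
    return sorted(
        word for word, attrs in WORD_INDEX.items()
        if all(attrs.get(key) == value for key, value in criteria.items())
    )


def search_ayahs(words):
    """Step 2: units containing at least one of the given word forms."""
    hits = set()
    for word in words:
        hits |= AYAH_INDEX.get(word, set())
    return sorted(hits)


def two_step_search(**criteria):
    """Chain the word search into the ayah search."""
    return search_ayahs(search_words(**criteria))
```

With this split, a linguistic criterion such as a root never has to be stored in the ayah index; it is resolved to concrete word forms first.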
  26. The project will be open source. But why? Open source offers a number of advantages over closed source, and this slide examines the most important ones: we can receive feedback from the open source community, which makes bug fixing easier and speeds up development. Open source also protects the project from abandonment and reduces the implementation cost.
  27. To achieve our objectives, we plan to use the following technologies for the implementation: Python is a remarkably powerful dynamic programming language that is used in a wide variety of application domains.
  28. Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python
  29. These are the main milestones that we went through:
  32. The implementation works as a library offering several interfaces. The main one is the application programming interface (API), which acts as the intermediary between the library and the other interfaces. Two low-level interfaces work with the API: 1. A console interface, intended for testing and for use by third-party desktop interfaces not written in Python. 2. A JSON web service, intended for web interfaces, smartphone apps, and social network apps.
  33. We have made many improvements, yet there is still a lot to be done. Here are the main milestones that remain ahead:
  36. An application programming interface (API) is a protocol intended to be used as an interface by software components to communicate with each other. An API is a library that may include specifications for routines, data structures, object classes, and variables. Our API is: 1. Free, libre, and open: anyone can use it and anyone can contribute to it, so it benefits from community involvement. 2. A Python API: anyone can independently build a web interface, a desktop interface, Android/iPhone/Windows Phone interfaces, Facebook/Twitter/G+ applications, and so on. 3. A solid base: the search process is fast and stable compared with other websites and applications. 4. Rich in features: the current API already has a large number of features and is prepared to accept more.
  38. To enable the use of our API over the web, we built a web service that wraps the input and output of the API. The request arguments are passed in the URL, and the output is generated and returned in JSON format. It can be used by web interfaces, smartphone apps, social network apps, and browser add-ons.
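The wrapping described here can be sketched as a single function that reads arguments from the URL, calls the library, and serializes the result as JSON. The `search` function, the `query` and `page` argument names, and the result shape are all assumptions made for illustration; the slides do not specify the real ones.

```python
import json
from urllib.parse import parse_qs, urlparse


def search(query: str, page: int = 1) -> dict:
    """Stand-in for the library API; returns an empty, fixed-shape result."""
    return {"query": query, "page": page, "results": []}


def handle_request(url: str) -> str:
    """Read the arguments from the URL, call the API, and return a JSON string."""
    params = parse_qs(urlparse(url).query)
    query = params.get("query", [""])[0]
    page = int(params.get("page", ["1"])[0])
    # ensure_ascii=False keeps Arabic text readable in the JSON output.
    return json.dumps(search(query, page), ensure_ascii=False)
```

A real deployment would put `handle_request` behind an HTTP server; keeping it a plain function makes the wrapper testable without one.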
  39. As a test interface, we built a console interface that works from the command line. It can also serve as a wrapper for desktop interfaces developed in a programming language other than Python. The request is passed as in-line arguments on the command line, and the output is generated and shown in JSON format. High-activity desktop interfaces running on a Linux-like platform can run this interface as a daemon service in the background.
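A console wrapper of this kind might look like the sketch below: in-line arguments in, JSON out. The `--query` and `--page` flags, the program name, and the stand-in `search` function are hypothetical; they only illustrate the shape of the interface.

```python
import argparse
import json


def search(query: str, page: int = 1) -> dict:
    """Stand-in for the library API; returns an empty, fixed-shape result."""
    return {"query": query, "page": page, "results": []}


def build_parser() -> argparse.ArgumentParser:
    """Declare the in-line arguments accepted by the console interface."""
    parser = argparse.ArgumentParser(
        prog="search-console",
        description="Pass a request on the command line, get JSON back.",
    )
    parser.add_argument("--query", required=True, help="search query text")
    parser.add_argument("--page", type=int, default=1, help="result page number")
    return parser


def run(argv: list[str]) -> str:
    """Parse the arguments, call the API, and return the JSON output."""
    args = build_parser().parse_args(argv)
    return json.dumps(search(args.query, args.page), ensure_ascii=False)
```

A script entry point would simply call `print(run(sys.argv[1:]))`, so a desktop front end in any language can spawn the process and parse its standard output.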
  40. Thank you for your attention. Any questions?