Proposal of an Advanced Retrieval System for Noble Qur’an
Assem CHELLI
The Noble Quran is unlike any other document we know. It is the sacred book of Muslims and contains knowledge about all aspects of life. Given this huge quantity of information, only a small part can be extracted manually, which is insufficient compared with the amount of knowledge the Quran contains. This raises the need for a method to extract that information, because currently there is no efficient method beyond printed lexicons and tools that perform simple sequential search with regular expressions. This limitation calls for new ways to interact with the Quran.
The goal of this work is to propose a system for advanced search across all the information contained in the Quran, taking into account the morphology of the Arabic language and the properties of the Qur’anic text. It should be based on modern information retrieval methods for good stability and high search speed. It would be very useful for researchers and could be generalized to cover all Arabic content.
2. Magister Thesis
Proposal of an Advanced Retrieval System for Noble Qur’an
3. Plan
Introduction
Problematic
State of Art
Search Engines
Arabic Language
Noble Quran
Objectives
Proposed search features
Conception
Implemented work
Published papers
Conclusion & Perspectives
4. Introduction
Qur’an, in Arabic, means “the reading” or “the recitation”.
Muslim scholars define it as:
« the words of Allah revealed to His Prophet Muhammad, written in the Mus’haf and transmitted by successive generations »
The Qur’an is the sacred book of all Muslims.
The Qur’an is also the primary reference for Islamic law.
For 14 centuries, Muslims have been:
Studying it,
Teaching it,
Writing books about it,
and, more recently, developing applications for it.
5. Problematic
The Qur’an is an important source of information about all aspects of life:
Scientific, Social, Historical, Political, Ethical, Juridical, etc.
It contains a huge amount of information.
Regular search tools have great difficulty extracting key information from the Qur’an, so we need other ways to query it.
The appropriate solution is an Advanced Retrieval System.
Why a Retrieval System?
Why advanced?
6. Indexing
Indexing consists of:
Analyzing each document in the collection to create a set of keywords.
Creating a representation of the documents in the system.
Supporting other tasks:
Auto-clustering of documents,
Related keyword suggestion,
Automatic document analysis,
Calculating collocated terms,
Auto-summarization,
etc.
7. Full-text search
A technology for finding documents that match a set of words.
Most web search engines, such as Google and Bing, use full-text search engines at the heart of their service.
The core of a full-text search engine is split into two main operations:
Indexing the information into an efficient format
Searching for the relevant information in this pre-computed index
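The two operations can be illustrated with a minimal inverted index. This is only a sketch: the function names `build_index` and `search`, the whitespace tokenization, and the three toy documents are assumptions made for the example, not part of the presented system.

```python
from collections import defaultdict

def build_index(documents):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Searching: return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Toy collection, invented for the example.
documents = {
    1: "full text search with an inverted index",
    2: "sequential search with regular expressions",
    3: "an index makes search fast",
}
index = build_index(documents)
print(sorted(search(index, "search index")))  # [1, 3]
```

The index is computed once; each query then only intersects small sets instead of scanning every document.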
8. Indexing :: Phases
Example: « Assem is >defending< his thesis!! »
Tokenization: Assem + is + >defending< + his + thesis!!
Normalization: assem + is + defending + his + thesis
Filtering stop words : assem + $ + defending + $ + thesis
Stemming: assem + $ + defend + $ + thesis
Resulting keywords: assem, defend, thesis
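The phases above can be sketched as a small pipeline applied to the slide's example sentence. The stop-word list and the single-suffix "stemmer" are toy assumptions chosen to reproduce the example; a real system would use proper linguistic resources.

```python
import re

STOP_WORDS = {"is", "his"}   # toy stop-word list for this example only
SUFFIXES = ("ing",)          # toy suffix list, not a real stemmer

def tokenize(text):
    """Tokenization: split the phrase on whitespace."""
    return text.split()

def normalize(token):
    """Normalization: lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", token.lower())

def stem(token):
    """Stemming: strip a known suffix when enough of the word remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def extract_keywords(text):
    tokens = [normalize(t) for t in tokenize(text)]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]  # stop-word filtering
    return [stem(t) for t in tokens]

print(extract_keywords("Assem is >defending< his thesis!!"))
# ['assem', 'defend', 'thesis']
```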
10. Querying (Search)
Querying is the phase of interaction between the system and the user.
Search takes a user query and returns the list of matching results, sorted by relevance.
Relevance: the degree of relationship between the document and the query.
12. Semantic Approach
Objective: improve search accuracy by understanding the searcher’s intent and the contextual meaning of terms, in order to generate more relevant results.
Semantic search does not just mean contextual search.
It is a smart search that considers several factors to provide the most relevant and useful search results.
13. Semantic Approach :: factors
Current trend
Location of search
Intent of the search
Variations of words
Synonyms
Generalized and specialized queries
Concept matching
Natural language queries
Change of meaning based on the group of words
14. Semantic Approach :: factors
Current trend
Who wins the Clasico? The latest one, of course.
Location of search
Weather temperature? Here in Algiers, preferably.
Intent of the search
Earthquake: checking whether one has happened, or looking for articles?
Variations of words
Man, Men, Man’s.
Synonyms
Biggest mountain, Highest mountain
Generalized and specialized queries
Health vs Diabetes
Concept matching
Half-life: the game or the physical constant?
Natural language queries
What time is it in Cairo?
Change of meaning based on the group of words
New egg health benefits
New egg health products
15. Arabic :: Orthography
A Semitic language
The language of Quran
A Right-to-Left language
16. Arabic :: Lexicography
Classical Arabic grammar has only three word classes:
Verbs
Verbs with a simple root (الفعل المجرد): فَعَلَ
Hamzated verb (مهموز), Assimilated verb (مثال), Hollow verb (أجوف), Weakened verb (ناقص), Geminated verb (مضعّف).
Verbs with an augmented root (الفعل المزيد)
فعّل، فاعل، أفعل، تفعّل، تفاعل، افتعل، انفعل، استفعل
Nouns
Primitive nouns (الأسماء الجامدة)
Nouns derived from verbs (الأسماء المشتقة)
Numbers, Demonstrative pronouns, Relative pronouns, Personal pronouns, Function words
Particles
17. Arabic :: Morphology
• Arabic is a fusional language, of the introflexive type:
• Consonants carry the meaning
• Vowels mark the inflection
• Arabic is very rich and is built on patterns (about 500) and roots (about 7000).
• Theoretically:
• A single Arabic root can generate hundreds of words (nouns, verbs, ...) by applying patterns.
• A single Arabic word can appear in about a hundred forms by adding certain suffixes and prefixes.
18. Arabic :: Flexional Morphology
• For the conjugation of verbs and the declension of nouns, Arabic uses markers (generally affixes) of:
• aspect, mood, tense, person, gender, number, case.
• These inflectional marks can distinguish:
• Mode of verbs: Perfective, Imperfective …
• Function of nouns: Nominative, Accusative, Genitive
19. Arabic :: Flexion
• Inflection of verbs (conjugation)
o Aspect
o Mood
Doubted, Affirmed (Actual or Eventual)
o Tense
Perfective (الماضي): فعلتُ، فعلتَ، فعلتِ
Imperfective (المضارع)
Imperative (الأمر)
22. Arabic :: Flexion :: Nouns
• Inflection of nouns (declension)
o 3 cases:
Nominative (الرفع)
Accusative (النصب)
Genitive (الكسر)
o Depends on:
Number: Singular (المفرد), Dual (المثنى), Plural (الجمع)
Form: Triptote, Diptote, etc.
23. Arabic :: Flexion :: Nouns
o Declension of singular nouns
Triptotes (الأسماء المنصرفة): كتابٌ، كتابًا، كتابٍ
Diptotes (الأسماء الممنوعة من الصرف): صحراء قاحلة
Five Nouns (الأسماء الخمسة): أخو، أخا، أخي
Deverbals with defective roots: ماضٍ
o Declension of dual nouns: كتابان، كتابين
o Declension of plural nouns
External masculine plural (جمع المذكر السالم): كاتبون، كاتبين
o Declension of function words
Invariables: منذ
Variables: لكن
24. Arabic :: Derivational morphology
o Deverbal noun (المصدر): ود، ود، وِداد، وِدادة، مودّة
o Active participle (اسم فاعل): ضارِب (hitter)
o Passive participle (اسم مفعول): مضْروب (struck)
o Nouns of time and place (أسماء الزمان والمكان): مدْرسة (school), مغْرِب (sunset)
o The Nomen Vicis (اسم المرة): ضرْبة (a hit)
o The Nomen Speciei (اسم الهيئة): جلست جِلْسة الأميرات (she sat like princesses)
25. Arabic :: Ambiguities :: Absence of Vocalization
If a text contains the word (الملك),
how should the search engine understand its meaning?
Is it:
1. « الْمَلَك » Angel,
2. « الْمُلْك » Kingdom,
3. « الْمَلِك » King?
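The ambiguity can be reproduced in a few lines: once the vocalization marks are stripped, the three words become the same string. The sketch below assumes that "vocalization" means the Unicode combining marks U+064B to U+0652 (tanween, short vowels, shadda, sukun); the helper name `strip_vocalization` is hypothetical.

```python
import re

FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
# Assumption: the marks to strip are the Arabic combining marks U+064B..U+0652.
HARAKAT = re.compile("[\u064B-\u0652]")

def strip_vocalization(word):
    """Remove vocalization marks, leaving only the letters."""
    return HARAKAT.sub("", word)

# The three readings from the slide, built letter by letter.
angel = "ال" + "م" + FATHA + "ل" + FATHA + "ك"
kingdom = "ال" + "م" + DAMMA + "ل" + SUKUN + "ك"
king = "ال" + "م" + FATHA + "ل" + KASRA + "ك"

unvocalized = {strip_vocalization(w) for w in (angel, kingdom, king)}
print(len(unvocalized))  # 1
```

An engine that indexes only the unvocalized form therefore cannot tell the three meanings apart.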
26. Arabic :: Ambiguities :: Prefixes
For the word « وعد », the letter wâw « واو » is:
1. A part of the word:
وَعَدَ (to promise)
2. Not a part of the word:
وَعَدَّ (and + to count)
27. Arabic :: Ambiguities :: Suffixes
For the word « وله », the letter ha’ (هاء) is:
1. A part of the word:
وَلِهَ (to admire)
2. Not a part of the word:
وَلِّهِ (crown + him)
وَلَهُ (and + he has)
28. Quran :: Structure
The Qur’an consists of 114 surahs; each surah is divided into ayahs.
This is the main division, specified by the Prophet.
[Diagram: القرآن → سورة 1 … سورة 114, each surah containing its آيات]
29. Quran :: Structure
There are several ways to divide the text:
Primary structure: surah, ayah, word and letter;
Special locations: First ayahs of a surah (فواتح السورة), Last ayahs of a surah (خواتيم السورة), Qur’anic comma (فاصلة قرآنية), Sajdah (سجدة), Waqf (وقف)
Other structures: page, Juz’ (جزء), Hizb (حزب), Nisf (نصف), Rubu’ (ربع), Thumn (ثمن)
[Diagram: القرآن → جزء أول … جزء ثلاثون; each جزء → حزب → نصف → ربع → ثمن]
32. Quran :: Sciences
Specific to the Quran
Tafsīr (التفسير)
Knowledge of Makkan and Medinan ayahs
Knowledge of the causes of revelation
Knowledge of the beginnings of surahs
Science of allegorical ayahs (علم المتشابه)
Qur’anic Parables (الأمثال القرآنية)
33. Quran :: Sciences
Shared with other resources
Legislative study:
Fiqh (الفقه)
Abrogating and Abrogated ayahs (الناسخ والمنسوخ)
General and Particular (العامّ والخاصّ)
Linguistic study:
Orthography (علم مرسوم الخط)
Grammatical analysis of the Qur’an (إعراب ألفاظ القرآن)
Morphology (الصرف)
Rhetoric (البلاغة)
Lexicology (علم المعاجم)
Scientific study:
Scientific Miracles in the Quran
Numerical study of verses (ignoring the debate about it)
35. Quran :: Indexes :: Projects
Midād lbayān
Word morphology index
Zerrouki’s Indexes
Word morphology index
Topic index
Synonym index
Qur’anic Arabic Corpus
Word-by-word morphology index
Tanzil Project
Ayah index (Electronic Mushaf)
Structural index
Surah index
Boundary-Annotated Qur’an Corpus
Word-by-word Waqf index (+ Uthmani-Standard mapping)
Qurany Concepts Tool
Concept index
39. Quranic Search Tools :: General Criticism
They are not full-text search engines,
except Tanzil’s and Zekr’s advanced search.
Basic search operations
Simple query system
Weak or unsupported linguistic operations,
except the Quranic Corpus word-by-word search
No semantic approach
Closed source,
except Zekr
Implemented as interfaces, not as APIs or libraries.
40. Objectives
Design a retrieval system that perfectly fits the Qur’an search needs.
First, we should list and classify all the search features that are possible and helpful.
Then, we need to study how to implement each feature and what its requirements are.
42. Proposed Search Features :: Output Improvements
Pagination
Sorting
Relevance
Mushaf natural order
Revelation order
Numerical, Alphabetical, or Abjad order
Keyword Highlight
ذرني ومن خلقت <style>وحيدا</style>
43. Proposed Search Features :: Output Improvements (2)
Real time output
Results grouping
by surahs
by topics
by tafsir dependency
by revelation events
by allegorical ayahs
by parables
Uthmani script with full diacritical marks
44. Proposed Search Features :: Suggestion System
Spelling corrections
ابراهام: إبراهيم
Semantically related words (ontology-based)
يعقوب: نبي، إسرائيل، إسحاق، يوسف ...
45. Proposed Search Features :: Suggestion System (2)
Different vocalizations
الملك: الْمَلِك، الْمُلْك، الْمَلَك ...
Collocated words
سميع: سميع عليم، سميع بصير
الحمد: الحمد لله
Keyboard mapping
fsl: بسم (f → ب, s → س, l → م)
Different significations
رب: 1st meaning (god), 2nd meaning (master)
46. Proposed Search Features :: Linguistic aspects
Romanization
خليفة: kalīfaẗ (ISO233), xalyfap (Buckwalter), _halyfaT (Arabtex).
Syntactic coloration
Partial vocalization search
مَلك to locate مَلِك, مَلَك … and ignore مُلْك
Multi-level derivation
(Word: أسقينا, level: lemma) to find وَأَسْقَيْنَاكُمْ، لَأَسْقَيْنَاهُمْ، فَأَسْقَيْنَاكُمُوهُ.
Specific derivations
Conjugation in the perfective of قال to find قال، قالت، قالوا، قلن ...
47. Proposed Search Features :: Linguistic aspects
Vocal Search
Word linguistic annotation
…
48. Proposed Search Features :: Linguistic aspects
Word properties embedded query
{ جذر:ملك نوع:اسم عدد:مفرد }
Numerical values search
309 replaced by ثلاثمائة وتسعة
Fuzzy string search
مءصدة may replace مؤصدة
Linguistic examples search
Rhetorical deletion (الحذف البلاغي)
Grammatical shift (الالتفات)
Uthmani spelling variants
بسطة may replace بصطة
نعمت may replace نعمة
49. Proposed Search Features :: Quranic Options
Recitation marks retrieval
سجدة:نعم
Structural options
صفحة:1
جزء:عم
Divine Name Highlight
51. Proposed Search Features :: Semantic Queries
Semantically related words
Syn(جنة) to find جنة، نعيم، فردوس …
Ant(جنة) to find جحيم، سعير، جهنم، سقر …
Is(جنة) to find فردوس، عدن
… (based on ontology)
Faceted thematic search
52. Proposed Search Features :: Semantic Queries
Natural questions: كم؟ لم؟ متى؟ أين؟ ما؟ من؟
ما هي الحطمة؟ What is Al-Hutamah?
[نار الله الموقدة] - الهمزة 6
It is the fire kindled by Allah
من هم الأنبياء؟ Who are the prophets?
[إنا أوحينا إليك كما أوحينا إلى نوح والنبيين من بعده وأوحينا إلى إبراهيم وإسماعيل وإسحاق ويعقوب والأسباط وعيسى وأيوب ويونس وهارون وسليمان وآتينا داوود زبورا] - النساء 163
أين غلبت/هزمت الروم؟ Where was Rome defeated?
[في أدنى الأرض وهم من بعد غلبهم سيغلبون] - الروم 3
كم مكث أصحاب الكهف؟ How long did the People of the Cave stay?
[ولبثوا في كهفهم ثلاث مائة سنين وازدادوا تسعا] - الكهف 25
متى يوم القيامة؟ When is the Day of Resurrection?
[يسألك الناس عن الساعة قل إنما علمها عند الله وما يدريك لعل الساعة تكون قريبا] - الأحزاب 63
كيف يتشكل الجنين؟ How is the embryo formed?
[ثم خلقنا النطفة علقة فخلقنا العلقة مضغة فخلقنا المضغة عظاما فكسونا العظام لحما ثم أنشأناه خلقا آخر فتبارك الله أحسن الخالقين] - المؤمنون 14
53. Proposed Search Features :: Semantic Queries (2)
Auto-vocalization
رسول من الله → رَسُولٌ مِنَ اللَّهِ
Entity extraction
ثلاث مائة سنين وازدادوا تسعا as (time/number, 309)
ببكة as (place, Makkah)
كلمح البصر as (time unit, ??)
مثقال ذرة as (size unit, ??)
يا أيها النبي as (person, Muhammad)
Proper nouns search (co-reference resolution)
بنيامين؟
[إذ قالوا ليوسف وأخوه أحب إلى أبينا منا ونحن عصبة إن أبانا لفي ضلال مبين] - يوسف 8
54. Proposed Search Features :: Statistical system
Frequencies of different units
How many occurrences of « الله » are there in Surah المجادلة?
What are the ten most frequent words in the whole Qur’an?
How many times are the word Sea (بحر) and its derivations mentioned in the whole Qur’an?
How many letters are there in Surah طه?
What is the longest ayah?
How many Sajdah marks are there in the whole Qur’an? (according to the different riwayat)
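Questions of this kind reduce to counting over an indexed corpus. The sketch below uses a toy corpus (the first three ayahs of surah 1 in unvocalized standard script) and hypothetical helper names; it assumes "length" is measured in letters of the unvocalized text.

```python
from collections import Counter

# Toy corpus keyed by (surah, ayah); a real system would load the full text.
CORPUS = {
    (1, 1): ["بسم", "الله", "الرحمن", "الرحيم"],
    (1, 2): ["الحمد", "لله", "رب", "العالمين"],
    (1, 3): ["الرحمن", "الرحيم"],
}

def word_frequencies(corpus, surah=None):
    """Count word occurrences, optionally restricted to one surah."""
    counts = Counter()
    for (s, _ayah), words in corpus.items():
        if surah is None or s == surah:
            counts.update(words)
    return counts

def letter_count(corpus, surah):
    """Number of letters in a surah (each code point of unvocalized text is a letter)."""
    return sum(len(w) for (s, _a), words in corpus.items() if s == surah for w in words)

def longest_ayah(corpus):
    """Reference of the ayah with the most letters."""
    return max(corpus, key=lambda ref: sum(len(w) for w in corpus[ref]))

freq = word_frequencies(CORPUS, surah=1)
print(freq["الرحمن"])        # 2
print(freq.most_common(2))   # the most frequent words with their counts
print(letter_count(CORPUS, 1))
print(longest_ayah(CORPUS))
```

Counting derivations of a word (the "Sea" question) would additionally need the root or lemma of each word, which this toy corpus does not carry.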
55. Discussion of search features
To validate the usefulness, importance and clarity of each feature, we launched a survey to gather opinions.
We mixed the target audience to get high-quality feedback from:
Regular users,
Quran scholars,
Arabic morphology experts,
Natural Language Processing / Information Retrieval researchers,
Philosophers working on the comparison of religious scriptures.
59. Conception
Previous work:
the Engineer degree graduation project entitled “Development of a search and indexing engine for Qur’anic documents” [Dahmani2010]
Improvements:
Moving to a fully vocalized search engine
Customization of the text processing phases, considering both Uthmani and standard scripts
Adopting the Quranic word as a search unit
60. Conception :: Full Vocalized Search Engine
Barriers:
Comparing vocalized, partially vocalized, and unvocalized texts
Distinguishing between original vowels and declension case markers
Lack of vocalized Arabic linguistic resources:
texts, ontologies, thesauri, corpora
Advantages:
Resolving the ambiguities caused by ignoring vocalization
Making search results, suggestions, and statistics more accurate
Refining meaning detection (a first step in the semantic approach)
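One possible way to compare a partially vocalized query with a fully vocalized word is to compare letter by letter and to check diacritics only where the query provides them. This is a sketch of that idea, not the method of the presented system; the helper names and the mark range U+064B to U+0652 are assumptions.

```python
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
MARKS = {chr(c) for c in range(0x064B, 0x0653)}  # tanween, short vowels, shadda, sukun

def split_marks(word):
    """Return (letter, marks) pairs, where marks is the set of diacritics on that letter."""
    pairs = []
    for ch in word:
        if ch in MARKS and pairs:
            pairs[-1][1].add(ch)
        else:
            pairs.append((ch, set()))
    return pairs

def partial_match(query, word):
    """True if the letters agree and every diacritic given in the query agrees too."""
    q, w = split_marks(query), split_marks(word)
    if len(q) != len(w):
        return False
    return all(
        ql == wl and (not qm or qm == wm)
        for (ql, qm), (wl, wm) in zip(q, w)
    )

query = "م" + FATHA + "لك"                # only the first letter is vocalized
malik = "م" + FATHA + "ل" + KASRA + "ك"
malak = "م" + FATHA + "ل" + FATHA + "ك"
mulk = "م" + DAMMA + "ل" + SUKUN + "ك"

print([partial_match(query, w) for w in (malik, malak, mulk)])  # [True, True, False]
```

A fully unvocalized query matches all three words, while each added diacritic narrows the result set.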
61. Conception :: Text processing
We consider both the standard script and the Uthmani script to resolve difficulties such as:
Searching with the Uthmani written form of a word.
Calculating statistics based on the Uthmani writing.
Matching the word-by-word structure of some Quranic linguistic resources.
63. Conception :: Text processing :: Substitution
New phase! Purpose?
Cases of substitution:
Romanization:
Guessing policy:
Nature of the characters used
Valid Arabic words
Word existence in the Quran
Predefined priorities
Numbers as words:
Rules:
We don’t say صفر رجل, we say لا رجل
One is never mentioned as واحد but as أحدا
Some numbers take gender: اثنان، اثنتان
Other numbers take the opposite gender of the counted noun: سبع سماوات، سبعة أبحر
A hundred (مئة) has a special spelling in the Quran: مِائَة
Some numbers are mentioned indirectly: أَلْفَ سَنَةٍ إِلَّا خَمْسِينَ عَامًا
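The romanization case of the substitution phase might look like the sketch below. It applies two of the guessing criteria listed above (nature of the characters, word existence in the vocabulary) to a query typed in Buckwalter transliteration. The consonant table follows the standard Buckwalter scheme; the function names and the one-word vocabulary are assumptions for illustration.

```python
# Buckwalter transliteration, consonants only (vowel marks omitted for brevity).
BUCKWALTER = {
    "'": "ء", "|": "آ", ">": "أ", "&": "ؤ", "<": "إ", "}": "ئ", "A": "ا",
    "b": "ب", "p": "ة", "t": "ت", "v": "ث", "j": "ج", "H": "ح", "x": "خ",
    "d": "د", "*": "ذ", "r": "ر", "z": "ز", "s": "س", "$": "ش", "S": "ص",
    "D": "ض", "T": "ط", "Z": "ظ", "E": "ع", "g": "غ", "f": "ف", "q": "ق",
    "k": "ك", "l": "ل", "m": "م", "n": "ن", "h": "ه", "w": "و", "Y": "ى", "y": "ي",
}

def is_arabic(text):
    """Criterion 1: nature of the characters used."""
    return any("\u0600" <= ch <= "\u06FF" for ch in text)

def from_buckwalter(text):
    """Return the Arabic string, or None if a character is outside the table."""
    if not all(ch in BUCKWALTER for ch in text):
        return None
    return "".join(BUCKWALTER[ch] for ch in text)

def substitute(query, vocabulary):
    """Replace a romanized query by its Arabic form when that form is a known word."""
    if is_arabic(query):
        return query
    candidate = from_buckwalter(query)
    if candidate is not None and candidate in vocabulary:  # criterion: word existence
        return candidate
    return query

vocabulary = {"خليفة"}  # toy stand-in for the set of Quranic words
print(substitute("xlyfp", vocabulary))
print(substitute("hello", vocabulary))  # hello
```

Other romanization schemes or a keyboard-layout mapping could be added as further candidate generators, ranked by the predefined priorities.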
64. Conception :: Text processing :: Tokenization
Phases:
Phrases to words (tokens)
Words to their parts (sub-tokens)
66. Conception :: Text processing :: Normalization
Normalize the Uthmani text into Standard text
Strip all recitation marks
Keep the vowels, except the vowel of the declension case ending
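The last two steps can be sketched with regular expressions. This is a rough approximation under stated assumptions: recitation marks are taken to be the Quranic annotation signs U+06D6 to U+06ED, and the "case ending" is approximated by any trailing short vowel, tanween or sukun. The Uthmani-to-Standard conversion itself needs a mapping resource and is not shown.

```python
import re

KASRA, FATHA, DAMMA = "\u0650", "\u064E", "\u064F"
# Assumption: recitation marks are the Quranic annotation signs U+06D6..U+06ED.
RECITATION_MARKS = re.compile("[\u06D6-\u06ED]")
# Assumption: a word-final tanween, short vowel or sukun is an ending marker.
ENDING_MARKS = "\u064B\u064C\u064D\u064E\u064F\u0650\u0652"

def normalize_word(word):
    """Strip recitation marks, then drop the word-final ending marker."""
    word = RECITATION_MARKS.sub("", word)
    return word.rstrip(ENDING_MARKS)

def normalize_text(text):
    """Normalize every word and drop tokens that were only recitation marks."""
    words = (normalize_word(w) for w in text.split())
    return " ".join(w for w in words if w)

kitabu = "ك" + KASRA + "ت" + FATHA + "اب" + DAMMA   # vocalized word with a final damma
print(normalize_word(kitabu) == "ك" + KASRA + "ت" + FATHA + "اب")  # True
```

Stripping every final vowel also removes vowels that belong to the word itself (for example in pronouns), which is exactly the barrier of distinguishing original vowels from case markers; a real implementation would need morphological information to decide.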
67. Conception :: Text processing :: Filtering stop words
Stop-word selection strategy:
Chosen from the list of the most frequent words in the Qur’an,
Considering vocalization
Preferring:
Particles such as لكن
Pronouns such as أنت
Clitics such as فَـ
68. Conception :: Text processing :: Stemming
We proposed stripping the affixes during tokenization.
In stemming, we reduce the word to either:
ROOT: larger set of words, different meanings
STEM: smaller set of words, similar meanings
69. Conception :: Quranic Word as Search Unit
Purpose: obtain a quick, efficient and stable method to retrieve specific Quranic words.
Requirements:
A corpus of Quranic words, enriched with linguistic annotations
Word occurrence as a unit
Word form as a unit
Information schema:
Identifiers: a global identifier, and a secondary identifier composed of the surah identifier, the ayah identifier and the position of the word in the ayah;
Different forms: Uthmani vocalized word (the main form), Standard vocalized word, Standard unvocalized word;
Transliterations: ISO233, Buckwalter, Arabtex;
Translations: English, other languages;
Different levels of stemming: Lemma, Stem, Root;
Other properties: Part of Speech, type, state, case, mood, voice, number, gender, person.
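The information schema could be represented as a record type along these lines. The class name, field names and the sample values are hypothetical; only the list of fields comes from the schema above.

```python
from dataclasses import dataclass, field

@dataclass
class QuranicWord:
    # Identifiers
    global_id: int
    surah: int
    ayah: int
    position: int                     # order of the word inside the ayah
    # Different forms
    uthmani: str                      # Uthmani vocalized word (the main form)
    standard_vocalized: str
    standard_unvocalized: str
    # Transliterations and translations, keyed by scheme or language
    transliterations: dict = field(default_factory=dict)
    translations: dict = field(default_factory=dict)
    # Levels of stemming
    lemma: str = ""
    stem: str = ""
    root: str = ""
    # Other properties: POS, type, state, case, mood, voice, number, gender, person
    properties: dict = field(default_factory=dict)

    @property
    def secondary_id(self):
        """Secondary identifier: (surah, ayah, position in the ayah)."""
        return (self.surah, self.ayah, self.position)

# Illustrative record; the values are examples, not taken from an annotated corpus.
word = QuranicWord(
    global_id=5, surah=1, ayah=2, position=1,
    uthmani="ٱلْحَمْدُ", standard_vocalized="الْحَمْدُ", standard_unvocalized="الحمد",
    transliterations={"buckwalter": "AlHmd"},
    root="حمد",
    properties={"pos": "noun", "case": "nominative"},
)
print(word.secondary_id)  # (1, 2, 1)
```

Keeping one record per word occurrence supports "word occurrence as a unit"; grouping records by `standard_unvocalized` or `uthmani` gives "word form as a unit".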
70. Conception :: 2-steps search strategy
1st step: retrieve the best keyword set for the user query by searching in:
A word-as-a-unit index
A Quranic words ontology
2nd step: retrieve the corresponding ayahs using the keyword set resulting from the first step
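A minimal sketch of the two steps, using the word index only (the ontology lookup is left out). The index contents, the ayah references and the function names are toy assumptions. The same first step also illustrates multi-level derivation, since the expansion level (lemma or root) is a parameter.

```python
# Toy word-as-a-unit index: one entry per word form, invented for the example.
WORD_INDEX = [
    {"form": "قال", "lemma": "قال", "root": "قول"},
    {"form": "قالوا", "lemma": "قال", "root": "قول"},
    {"form": "قول", "lemma": "قول", "root": "قول"},
    {"form": "كتاب", "lemma": "كتاب", "root": "كتب"},
]
# Toy ayah index: word form -> references of ayahs that contain it.
AYAH_INDEX = {
    "قال": {"2:30", "2:33"},
    "قالوا": {"2:11", "2:30"},
    "قول": {"2:59"},
    "كتاب": {"2:2"},
}

def expand_keywords(word, level):
    """Step 1: find all forms that share the word's origin at the given level."""
    origin = next((w[level] for w in WORD_INDEX if w["form"] == word), None)
    if origin is None:
        return {word}
    return {w["form"] for w in WORD_INDEX if w[level] == origin}

def find_ayahs(keywords):
    """Step 2: collect the ayahs that contain any of the keywords."""
    ayahs = set()
    for keyword in keywords:
        ayahs |= AYAH_INDEX.get(keyword, set())
    return ayahs

def two_step_search(word, level="lemma"):
    return find_ayahs(expand_keywords(word, level))

print(sorted(two_step_search("قالوا", level="lemma")))  # ['2:11', '2:30', '2:33']
print(sorted(two_step_search("قالوا", level="root")))   # ['2:11', '2:30', '2:33', '2:59']
```

Separating the steps keeps the linguistic logic in the small word index, so the ayah index only has to answer exact-form lookups.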
72. Conception :: Word Search :: Word properties
72
Objective: allow the users to locate ayahs based on
linguistic properties of words such as POS, type,
state, case, mood, voice, number, gender, person.
Methods:
Fielded search:
A fielded search is an advanced query feature that lets
users select the document fields to
which they wish to limit the query, then use the required
keywords within these fields.
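As an illustration of fielded search over word properties, the sketch below parses a query of the form `field:value field:value` and filters a list of word records. The query syntax, the function names and the sample records are assumptions for the example, not the project's actual query language.

```python
def parse_fielded_query(query):
    """Turn 'pos:noun gender:feminine' into {'pos': 'noun', 'gender': 'feminine'}."""
    criteria = {}
    for term in query.split():
        field_name, _, value = term.partition(":")
        criteria[field_name] = value
    return criteria


def fielded_search(query, words):
    """Return the word records whose fields match every criterion."""
    criteria = parse_fielded_query(query)
    return [
        w for w in words
        if all(w.get(name) == value for name, value in criteria.items())
    ]


# Hypothetical word records.
WORDS = [
    {"word": "w1", "pos": "noun", "gender": "feminine", "number": "singular"},
    {"word": "w2", "pos": "noun", "gender": "masculine", "number": "plural"},
    {"word": "w3", "pos": "verb", "gender": "feminine", "number": "singular"},
]
```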
73. Conception :: Word Search :: Semantically Related Words
Objective: offer the related words of a keyword
entered by the user.
Algorithm:
The user specifies:
The word
The semantic relation: Synonymy, Antonymy, Hypernymy,
Hyponymy, Meronymy, Holonymy, Troponymy.
Querying the ontology for related words
Using those keywords to retrieve the corresponding
ayahs.
74. Conception :: Word Search :: Multi-level Derivations
Objective: get a set of words that
share the same origin such as
stem and root.
Algorithm:
The user specifies:
The keyword
The level of derivation.
Recovering the origin of the word in
the specified derivation level
Retrieving all the set of words that
share this origin.
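The algorithm above can be sketched as a lookup followed by a scan. The table of word forms and their origins is toy data in Latin transliteration, and the function name is hypothetical.

```python
# Each word form maps to its origin at every derivation level (toy data).
WORD_ORIGINS = {
    "kitAb": {"lemma": "kitAb", "stem": "kitAb", "root": "ktb"},
    "kutub": {"lemma": "kitAb", "stem": "kutub", "root": "ktb"},
    "kAtib": {"lemma": "kAtib", "stem": "kAtib", "root": "ktb"},
    "qalam": {"lemma": "qalam", "stem": "qalam", "root": "qlm"},
}


def derivations(keyword, level, word_origins):
    """Return all word forms sharing the keyword's origin at the given level."""
    # Recover the origin of the keyword at the requested level.
    origin = word_origins[keyword][level]
    # Retrieve every form that shares this origin.
    return sorted(
        form for form, origins in word_origins.items()
        if origins[level] == origin
    )
```

Choosing the root level gives the largest set of words, and the lemma level the smallest.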
75. Conception :: Word Search :: Specific Derivations
Objective: find the words that result from
applying a specific derivation operation to
the user-given word.
Algorithm:
The user should:
Enter the keyword
Specify which derivation.
Generate the set of derived words either by:
fetching from the word index
using linguistic tools such as verb conjugators; the generated words must then be
filtered as a second step by intersection with the
set of Quranic words.
The resulting set will be used to locate the
corresponding ayahs.
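The generate-then-filter step can be sketched as a set intersection. The generator below is a deliberately trivial English stand-in for a real linguistic tool such as a verb conjugator. Every name here is hypothetical.

```python
def toy_plural(word):
    # Hypothetical stand-in for a linguistic tool: it over-generates on purpose.
    return {word + "s", word + "es"}


GENERATORS = {"plural": toy_plural}


def specific_derivations(keyword, derivation, quranic_words):
    """Generate candidate forms, then keep only those found in the corpus."""
    candidates = GENERATORS[derivation](keyword)
    # Second step: intersect the candidates with the set of Quranic words.
    return sorted(candidates & set(quranic_words))
```

The intersection discards generated forms that never occur in the text, so an over-generating tool is acceptable.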
76. Conception :: Word Search :: Fuzzy Search
Objective: search using the set of words that are nearly similar to the
input word in writing or pronunciation.
Methods:
Levenshtein distance (previously unknown text)
N-grams
Spell-checker
Soundex (phonetic)
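As an illustration of the first method, here is a minimal sketch of edit-distance matching against a vocabulary. The function names and the distance threshold are assumptions. A production index would avoid comparing the input against every word.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(word, vocabulary, max_distance=1):
    """Return the vocabulary words within max_distance edits of word."""
    return sorted(v for v in vocabulary if levenshtein(word, v) <= max_distance)
```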
78. Open Source but WHY?
A number of advantages led us to open source. The following
points examine the most important of them [Web-Oss-watch]:
Collaborative bug-fixing & Fast security vulnerabilities detection
>Given enough eyeballs, all bugs are shallow<
-- an open source slogan
Customization.
Translation & Localization.
Protection against development discontinuation.
Being part of a community.
Low cost.
79. Used Technologies :: Python
Python is a powerful dynamic programming language, used widely.
Features:
powerful and fast
plays well with others
runs everywhere
friendly and easy to learn
free and open source
80. Used Technologies :: Whoosh API
Whoosh is a full-text indexing and searching library
implemented in Python
Features:
Pure Pythonic API
Fielded indexing and search
Fast indexing and retrieval
Powerful query language
Useful for circumstances such as:
Anywhere a pure-Python solution is desirable to avoid having to
build/compile native libraries
As a research platform (Python is easier to read!)
When the search features are more important to us than the raw speed.
81. Implementation :: Previous Code Base
Implemented in [Chelli&Dahmani2010]
Licensed under GPL*
(Server applications issue)
Based on Whoosh Indexing Library
Offering Many Search Operations
Results in HTML format
Raw format
Can be used in Python
Requires writing wrappers for other languages
A basic resource manager
Has a missing piece
82. Implementation :: Our improvements
The code base:
has had 981 commits made
representing 15,243 lines of code
mostly written in Python
with a well-commented source code.
took an estimated 4 years of effort (COCOMO model)
Reference: Ohloh Website.
83. Implementation :: Our improvements
New Output System
A New Output System:
JSON-Based ==> Simpler & more extensible
Centralized ==> Changes in one & only one place
Extended & Extensible Results Structure
Customizable Search Request using flags
Including a Statistic Calculating Unit
Offering Meta-Data for request
85. Implementation :: Our improvements
Many new features
Fuzzy Search Feature
Retrieving the neighbors of each ayah
86. Implementation :: Our improvements
Many new features (2)
Manipulating different Quranic Scripts
More suggestion operations
Showing the linguistic annotations
Retrieving & Showing transliterated keywords (Buckwalter)
87. Implementation :: Our improvements
Resources Importing Manager
Resources Importing Manager:
Downloading original resources (Licensing issue)
Parsing & Importing the data to our intermediate database
Indexing the database
Updating auto-generated data files
88. Implementation :: Our improvements
Packaging System
Automating the API building
Packaging into:
Source Tarball
Binary Tarball
Python egg package
Debian deb package
Red-hat rpm package
Windows Installer
Mac OS (planned)
89. Implementation :: Our improvements
->More<-
Coding Standardization
Following Python Conventions (PEP8)
Using Pylint (a source code bug and quality checker)
Documentation Covering
Enriching the code with Readme files
New Console interface
90. Implementation :: Open Issues
Implement modularity for the Query Parser: this is
important to enable extensibility and to fix the problem of
mixing the different operations performed during
parsing.
Restrict anonymous requests to the API: restricting
requests protects the API from flooding, whether intended or not. This
can be done by:
Limiting the maximum number of simultaneous requests globally and per IP.
Implementing an identification system that works with remote clients.
Move to the latest version of the Whoosh library: Whoosh is
almost at version 3.X in its stable release, while we are still using an
older version, 0.3. Moving to the latest version is highly
recommended to benefit from the improvements made. However, it will
not be an easy operation, since our API is intertwined with the older
version, especially in the Query Parser.
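The request restriction described above (a cap on simultaneous requests, globally and per IP) could look like the sketch below. The class name, method names and limits are hypothetical. This is not part of the existing API, and a deployed service would more likely delegate this to the web server or a proxy.

```python
import threading
from collections import Counter


class RequestLimiter:
    """Cap the number of simultaneous requests globally and per IP."""

    def __init__(self, max_global, max_per_ip):
        self.max_global = max_global
        self.max_per_ip = max_per_ip
        self._active = Counter()       # running requests per IP
        self._lock = threading.Lock()

    def acquire(self, ip):
        """Return True if the request may start, False if it must be refused."""
        with self._lock:
            if sum(self._active.values()) >= self.max_global:
                return False
            if self._active[ip] >= self.max_per_ip:
                return False
            self._active[ip] += 1
            return True

    def release(self, ip):
        """Mark one request from this IP as finished."""
        with self._lock:
            if self._active[ip] > 0:
                self._active[ip] -= 1
```

A request handler would call `acquire` before searching and `release` in a `finally` block, so a failed search does not leak a slot.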
92. Implementation :: Open Issues
Complete the features implementation
Enrich the linguistic resources
Implement the modularity for the Query Parser
Restrict the anonymous requests to the API
Move to the main stream of Whoosh library
Maintain compatibility between Python versions
Cover with documentation
Optimize code and performance
93. Implementation :: Open Issues
Enriching the linguistic resources: the resources currently
in use are poor compared to what we really
need.
Integrate the Qurany project to enrich the current faceted thematic
search.
Integrate the boundary annotations to enable the retrieval of
boundaries in the Quran.
Propose a standard format for new linguistic and Quranic
resources.
Convert the binary database to text to make it possible to log
changes and to benefit from revision control systems
such as Git.
94. Implementation :: Open Issues
Complete the features implementation
Fielded search YES
Logical relations YES
Phrase search YES
Interval search YES
Full Regex NO
Wildcards PARTIALLY
Boosting keywords YES
Pagination YES
Scoring YES
Sorting YES
Keywords Highlight YES
Uthmani full marks YES
Real time output NO
Results grouping NO
Spell correction PARTIALLY
Related keywords PARTIALLY
Different vocalizations YES
Collocated words NO
Keyboard mapping NO
Different significations NO
Romanization PARTIALLY
Partial vocalization PARTIALLY
Multi-level derivation YES
Syntactic Coloration NO
Vocal Search NO
Specific-derivations NO
Linguistic annotations PARTIALLY
Fuzzy string PARTIALLY
Word properties PARTIALLY
Linguistic examples NO
Structural options YES
Translation search YES
Uthmani writing way NO
Recitation marks PARTIALLY
Divine Names Highlight NO
Repetitions&Allegoricals NO
Abrogators&Abrogated NO
Qur’anic Parables NO
Semantically related words PARTIALLY
Faceted Thematic Search PARTIALLY
Entity Extraction NO
Questions Answering (QA) NO
Automatic vocalization NO
Co-reference resolution NO
Vocalized word frequency YES
Unvocalized word frequency YES
Other Qur’anic units frequency NO
Root/Stem/Lemma frequency NO
95. Implementation :: Open Issues
Move to Python 3.X: Python 2 is disappearing and sooner
or later it will be fully replaced. Many tools offer
automatic scripts to convert code from 2 to 3, although a
large part often has to be done manually.
Cover with documentation: documentation is very
important; it is expensive, but it encourages the community to
get involved in the project. This can be done by:
Enriching the readme files;
Enriching the code with appropriate comments;
Creating a usage how-to and strengthening it with many demos;
Writing the man page for the console interface.
Optimize code and performance: continue fixing
pylint code analysis warnings and use a profiler to check the
performance of each search feature in order to improve it.
101. Examples of use
As a desktop application
As a web interface
www.alfanous.org
As a smart phone app
iPhone, iPad
Windows phone
102. Examples of use :: Alfanous.org
Remarkable Features:
Localizable
Awarded:
As the best-in-technicality website
in Algeria Web Awards 2012
103. Examples of use :: Alfanous.org (Responsive)
Remarkable Features:
User experience
Responsiveness
Simplicity
Awarded:
chosen as the best website categorized
under the religious websites in Algeria Web
Awards 2013
104. Examples of use :: iPhone Application
Developed by:
iPhone-islam (objective-C)
Remarkable Features:
running on iPhone and iPad series
105. Examples of use :: Windows phone APP
Developed by:
Moumen bou Abdellah (C#)
Remarkable Features:
Running on windows phone
106. Examples of use :: Alfanous Desktop Interface
Remarkable Features:
Offline use
107. Conferences
1. An Arabic paper in NITS 2011 KSA:
Title: An Application Programming Interface for indexing and
search in Noble Quran
Authors: Assem Chelli, Merouane Dahmani, Amar Balla, Taha
Zerrouki.
2. An English paper in a pre-conference workshop of
LREC 2012, Turkey, “LRE-Rel:
Language Resource and Evaluation for Religious
Texts”
Title: Advanced Search in Quran: Classification and Proposition
of All Possible Features.
Authors: Assem Chelli, Amar Balla, Taha Zerrouki.
108. Conclusion & Perspectives
We went through the implementation of many of the search
features that we previously listed.
However, there are more improvements to be made
and many issues to be resolved. We leave them as
perspectives:
Achieving an accurate statistics gathering system;
Implementing a more adequate suggestion system;
Clearing the way toward a semantic search engine;
Completing the full design of all search features;
Completing the implementation of all open issues.
Assalamu Alaikom. Mr. President, ladies and gentlemen of the committee, my family, my friends, and everybody here: I am Assem Chelli, and welcome to my presentation defending my thesis entitled “Proposal of an Advanced Retrieval System for Noble Qur’an”.
Here is the plan of the presentation.
I start with an introduction to the idea, and then I explain the source of the problem.
Like indexing, search is a multi-step process, as the figure shows.
Each verb has a set of associated deverbal forms with which it maintains morphological, syntactic and semantic relations. The number and nature of these forms vary depending on the status of the verb. We cite some deverbal forms:
Waqf marks indicate where to pause during the recitation of
the Qur’an. The marks differ in order to distinguish the type of waqf, which can be
allowed, preferred, prohibited, etc. Note that the marking of waqfs differs between
rewayates ( الروايات ). The figure shows a reference of waqfs based on the rewayate of Kaloun.
المصاحف are written in the Uthmani script, which is quite different from the standard Arabic script. For example, the word "سأريكم" is written in Uthmani as "سأوريكم", with an additional letter waw.
These sciences take the Qur’an as a subject of study to illustrate and explore its
secrets. Some books are found under different names, such as Revelation Science ( علم التنزيل )
and Book Science ( علم الكتاب ).
The indexes are categorized by purpose into 5 main categories:
Syntactic, Semantic, Structural, Statistical, Thematic
Our proposal is to design a retrieval system that fits the Qur’an search needs.
To realize this objective, we must first list and classify all the search features that are possible and helpful.
Then we need to study how to implement each feature and what its requirements are.
Output improvements
Pagination
10, 20, 50… results per page
Sorting
Mushaf order
Revelation order
Relevance order
Numerical and alphabetical order of fields
Improvements:
Check the true order of Arabic symbols:
Order of Hamza: ؤ ئ ء أ
Order of Ta’: ة ت
Order of Alef: ى ا
Highlighting
الحمد <style>لله </style>رب العالمين
Real-time display
Grouping of results
by similarity
by surah
by topic
by phrase
by Quranic examples
by dependencies in tafsir
by reasons of revelation
…
Display in Uthmani script with full marks
[ ۞ لَّقَدْ كَانَ فِى يُوسُفَ وَإِخْوَتِهِۦٓ ءَايَٰتٌ لِّلسَّآئِلِينَ]
Suggestion systems:
Suggestion of alternative keywords
أبراهام: إبراهيم
Improvements:
Fix the limitations of N-grams for vocalized words
Suggestion of related keywords based on an ontology
يعقوب : يوسف، الأسباط، نبي ...
Suggestion of different vocalizations of a word
الملك : المَلِك ، المُلْك، المَلَك ...
Suggestion of collocated words
سميع : سميع عليم، سميع بصير
الحمد : الحمد لله
We ran the survey for 45 days and gathered about 37 respondents. The following pie
charts describe the variation in the participants' backgrounds, including
age, gender, country, and language.
We gathered information about the participants' experience along four
axes: Quran, Arabic, linguistics, and computing. The following chart describes it:
The survey gave us very helpful results. The following figure shows
the percentage of clarity, usefulness, and need for each search feature listed.
We started working on the idea of search in the Quran in the engineering degree graduation
project entitled “Development of a search and indexing engine for Qur’anic
documents” [Dahmani2010].
The Quranic text is written in the standard script, but because of the difficulties caused by its differences from the Uthmani script, we have to consider both scripts for indexing and search. Among those difficulties, we mention:
As we have said, to resolve those difficulties we also consider the Uthmani text for text
processing, along with the standard text. In addition, we propose many improvements to
the text processing phases to achieve many of the search features.
This is a new phase that we propose to be added before tokenization. Its objective is to
identify a list of pre-defined patterns and replace them as a preparation for tokenization.
Previously, we considered the ayah as the search unit. The ayah is still the right choice of unit. However, to support many linguistic features, we need to consider a different search unit: the Quranic word ( اللفظ القرآني ).
We use the word search to improve the ayah search by introducing a 2-steps search
strategy.
Building on the 2-steps strategy, we propose these new operations to be implemented
in the query parser:
The project will be open source, but why? Open source offers a number of advantages over closed source; this slide examines the most important of them:
We can receive feedback from the open source community, which makes bug fixing easier and speeds up
development. Open source also protects the project from abandonment and
reduces the cost of implementation.
To reach our objectives, we plan to use certain technologies for the implementation:
Python is a remarkably powerful dynamic programming language that is used in a wide
variety of application domains.
Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python
These are the main milestones that we went through:
The implementation works as a library offering many interfaces. The main one is the
Application Programming Interface, or API. It works as the intermediary between the
library and the other interfaces. Two low-level interfaces work with the
API:
1. Console interface, intended for testing and for use by third-party non-Python
desktop interfaces.
2. JSON web service, intended for use by web interfaces, smartphone apps, and
social network apps.
We went through many improvements. However, there are still a lot of things to be
done. We will go through the main milestones that remain:
An application programming interface (API) is a protocol intended to be used as
an interface by software components to communicate with each other. An API is
a library that may include specifications for routines, data structures, object classes,
and variables.
1. Free, libre, and open: anyone can use it and anyone can contribute to it. That means
it takes advantage of community involvement.
2. A Python API: it allows anyone to independently create a web interface, a
desktop interface, Android/iPhone/Windows Phone interfaces, Facebook/Twitter/G+
applications, and so on.
3. A solid base: the search process is faster and more stable than in other
websites/applications.
4. Lots of features: the current API has a significant number of features and is
prepared to accept more.
To enable the use of our API over the web, we made a web service that wraps the
input/output of the API. The request arguments are passed in the URL, and the
output is generated and returned in JSON format. This can be used by web interfaces, smartphone apps, social network apps, and browser add-ons.
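A client of such a web service only needs to build a URL and decode JSON. The sketch below shows both sides of that exchange. The base URL, the parameter names (`query`, `page`) and the response layout are placeholders chosen for illustration, not the service's documented interface.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the real service address.
BASE_URL = "http://example.org/json"


def build_request_url(query, page=1, **flags):
    """Encode the request arguments into the URL query string."""
    params = {"query": query, "page": page}
    params.update(flags)
    return BASE_URL + "?" + urlencode(params)


def parse_response(body):
    """Decode the JSON text returned by the service into Python objects."""
    return json.loads(body)


# A made-up response body, only to show the decoding step.
SAMPLE_BODY = '{"search": {"total": 1, "ayas": ["..."]}, "error": null}'
```

Since `urlencode` percent-encodes non-ASCII text as UTF-8, Arabic keywords can be passed as they are.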
As a test interface, we made a console interface that works on the command line. This
interface can also be used as a wrapper to build desktop interfaces that are developed
in a programming language other than Python. The request is passed
as inline arguments on the command line, and the output is generated and shown in
JSON format. High-activity desktop interfaces running on a Linux-like platform can
run this interface as a daemon service in the background.
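The shape of such a console wrapper can be sketched with `argparse` and `json`. The program name, the option names and the output structure are assumptions for the example, and the search itself is replaced by an empty result list.

```python
import argparse
import json


def build_parser():
    """Command-line options; the names here are hypothetical."""
    parser = argparse.ArgumentParser(prog="search-console")
    parser.add_argument("--query", required=True, help="search keywords")
    parser.add_argument("--page", type=int, default=1, help="results page")
    return parser


def run(argv):
    """Parse inline arguments and return the output as a JSON string."""
    args = build_parser().parse_args(argv)
    output = {
        "request": {"query": args.query, "page": args.page},
        "results": [],  # placeholder: a real run would call the search API
    }
    # ensure_ascii=False keeps Arabic text readable in the terminal.
    return json.dumps(output, ensure_ascii=False)
```

A desktop interface written in another language would spawn this program, pass the request as arguments, and parse its standard output as JSON.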