By/
Quantitative Linguistics
PHD Course
2017
Introduction
Aim of the work
Method
Results
Conclusion
Quantitative linguistics is the comparative study of the frequency and distribution of words and syntactic structures in different texts.
The aim of qualitative analysis is a complete, detailed description of the linguistic data, in order to describe the linguistic features and phenomena identified in the data.
In quantitative analysis, those linguistic features are classified and counted, and more complex statistical models may even be constructed in an attempt to explain what is observed.
Quantitative findings can be generalized to a larger population, and direct comparisons can be made between two corpora.
Thus, quantitative analysis allows us to discover which phenomena are likely to be genuine reflections of the behavior of a language.
The aim of this work is to make a comparative quantitative linguistic analysis of the most common Egyptian aphorisms in two corpora, using the R statistical language.
The comparison is made between:
• a corpus in Arabic
• a corpus in English
The linguistic data collected for the two corpora are sentences taken from the online database of Egyptian aphorisms (total = 100):
• 50 sentences of Egyptian aphorisms in Arabic
• 50 sentences of Egyptian aphorisms in English
The two corpora are analyzed in two steps:
First, automatically:
• The online Stanford Parser is used to assign a part-of-speech tag (which refers to a syntactic function) to each word of the two corpora, and to measure the number of tokens and the time taken to tag the words of each aphorism.
Second, manually:
• The description and inflection of each part-of-speech tag are added with the aid of the list of parts of speech encoded in the annotation system of the Penn Treebank Project.
• The animacy (animate / inanimate) and the gender (masculine / feminine) of each annotated word in the two corpora of Egyptian aphorisms are also analyzed.
All the analyzed data are then converted and tabulated in an Excel sheet and saved as CSV files so that they can be read by R.
CSV: a1, a2, e1, e2
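A minimal sketch of this import step in R. The file name e1.csv, the column names (Tokens, Length) and the two sample rows are hypothetical stand-ins, written to a temporary file only so that the example runs on its own; they are not the study's data, and how the deck's files map to the data frames used later is an assumption.

```r
# Hypothetical sentence-level table: one row per aphorism.
# Column names are assumptions chosen to match the plotting code in this deck.
sample_rows <- data.frame(
  Tokens = c(5L, 9L),
  Length = c(24L, 47L)
)

# Write the table to a temporary CSV file (stands in for an exported Excel sheet).
csv_path <- file.path(tempdir(), "e1.csv")
write.csv(sample_rows, csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back; stating UTF-8 explicitly matters for the Arabic tables.
e <- read.csv(csv_path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Inspect column names and types before running any statistics.
str(e)
```

Checking the structure with str() right after import is a cheap way to catch columns that were read as text instead of numbers.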
The prepared data are then run in R, and statistical measurements are made to find the linguistic features of both corpora (Arabic and English) and to investigate the quantitative linguistic (lexical) characteristics of Egyptian aphorisms in Arabic and English.
In order to extract the quantitative linguistic characteristics of the Arabic and English corpora of Egyptian aphorisms and to compare them, this section is divided into two subsections:
1) Statistical measurements by R:
• contains the statistical measurements computed with R.
2) Visualizing the output by R:
• contains the visualization of the R results using R graphics.
To visualize word length in the English and Arabic corpora, the following R code is used:
• barplot(xtabs(~ee$Length), xlab= "word length in English corpus", col= "grey")
• barplot(xtabs(~aa$Length), xlab= "word length in Arabic corpus", col= "grey")
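The xtabs() plus barplot() pattern used on these slides can be tried on a small table. This is only a sketch: the data frame below is hypothetical and its Length values are invented for illustration, not taken from either corpus.

```r
# Hypothetical word-level table; the Length values are illustrative only.
ee <- data.frame(Length = c(3, 2, 5, 3, 4, 3, 7, 2))

# xtabs() with a one-sided formula counts how often each value occurs.
length_counts <- xtabs(~ Length, data = ee)
print(length_counts)

# barplot() draws one bar per distinct word length; bar height is the count.
barplot(length_counts,
        xlab = "word length (hypothetical data)",
        ylab = "number of words",
        col  = "grey")
```

Because xtabs() only tabulates values that actually occur, a length with no words (6 in this toy table) gets no bar; converting the column to a factor with all levels listed would keep such empty categories visible.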
To visualize the number of tokens in each query sentence in the English and Arabic corpora, the following R code is used:
• barplot(xtabs(~e$Tokens), xlab= "English corpus tokens numbers", col= "grey")
• barplot(xtabs(~a$Tokens), xlab= "Arabic corpus tokens numbers", col= "grey")
To visualize the query length in the English and Arabic corpora, the following R code is used:
• barplot(xtabs(~e$Length), xlab= "English query length", col= "grey")
• barplot(xtabs(~a$Length), xlab= "Arabic query length", col= "grey")
To visualize the animacy of each word of the query in the English and Arabic corpora, the following R code is used:
• barplot(xtabs(~ee$Animacy), xlab= "Animacy in English corpus", col= "grey")
• barplot(xtabs(~aa$Animacy), xlab= "Animacy in Arabic corpus", col= "grey")
To visualize the gender of each word of the query in the English and Arabic corpora, the following R code is used:
• barplot(xtabs(~ee$Gender), xlab= "Gender in English corpus", col= "grey")
• barplot(xtabs(~aa$Gender), xlab= "Gender in Arabic corpus", col= "grey")
To visualize the distribution of the number of tokens in the English and Arabic corpora, the following R code is used (truehist() is provided by the MASS package, which must be loaded first):
• library(MASS)
• truehist(e$Tokens, col="lightblue", xlab="English tokens numbers")
• truehist(a$Tokens, col="lightblue", xlab="Arabic tokens numbers")
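As a hedged illustration of what truehist() draws, the sketch below plots a hypothetical vector of token counts (invented values, not the study's data) and falls back to base R's hist() if the MASS package is not installed.

```r
# Hypothetical token counts per aphorism (not the study's data).
tokens <- c(4, 6, 7, 9, 9, 10, 10, 11, 12, 14, 15, 21)

if (requireNamespace("MASS", quietly = TRUE)) {
  # truehist() scales the bars so that their total area is 1 (a density).
  MASS::truehist(tokens, col = "lightblue", xlab = "tokens (hypothetical)")
} else {
  # Base-R fallback: freq = FALSE also plots densities instead of raw counts.
  hist(tokens, freq = FALSE, col = "lightblue", xlab = "tokens (hypothetical)")
}
```

Unlike the bar plots of xtabs() counts, a density histogram groups neighbouring values into bins, which makes the overall shape of the distribution easier to compare between two corpora of the same size.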
The mean length of an Egyptian aphorism in the Arabic corpus (48.18 letters) is less than that of its counterpart in the English corpus (50.66 letters), which means that an Egyptian aphorism in English is, on average, longer than its Arabic counterpart.
The number of tokens used to express an Egyptian aphorism in Arabic is greater than the number used to express its English counterpart. Likewise, the mean number of tokens in an Arabic Egyptian aphorism (10.24) is greater than in an English Egyptian aphorism (9.9).
The minimum and maximum numbers of tokens are nearly the same in both corpora: the minimum is 4 tokens in both corpora, and the maximum is 22 for the Arabic corpus and 21 for the English corpus.
The mean word length in the Arabic corpus (3.794922) is less than in the English corpus (4.179798). Word length ranges from 1 to 8 letters in the Arabic corpus and from 1 to 14 letters in the English corpus. The median word length, however, is the same in both corpora (4 letters).
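Summary figures of this kind (mean, median, range) can be obtained with a few base-R calls. The sketch below is an assumption about how such a comparison might be set up; the helper name describe_length and both small vectors of word lengths are hypothetical and do not reproduce the values reported above.

```r
# Hypothetical word lengths for two small corpora (illustrative only).
aa <- data.frame(Length = c(1, 2, 3, 4, 4, 5, 8))
ee <- data.frame(Length = c(1, 3, 4, 4, 5, 6, 14))

# Hypothetical helper: the four descriptive statistics quoted on the slides.
describe_length <- function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x))
}

# One column per corpus, one row per statistic.
stats <- sapply(list(Arabic = aa$Length, English = ee$Length), describe_length)
print(round(stats, 2))
```

Putting both corpora in one matrix makes it easy to see cases like the one reported here, where the means differ while the medians coincide, which usually points to a few long words pulling one mean upward.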
As for the most frequent words in the two corpora of Egyptian aphorisms, in the Arabic corpus the words (من, ال, و, ان; respectively from right to left) are the most frequent, whereas in the English corpus the words (the, is, you, of; respectively from left to right) are the most frequent.
Regarding the tag set of the Egyptian aphorisms, NN (noun) is the most frequent tag in both corpora, followed by DT (determiner), then by verbs and prepositions.
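Rankings of the most frequent words and tags can be produced with table() and sort(). The sketch below assumes a word-level data frame with Word and Tag columns; these column names and the nine toy rows are hypothetical and are not drawn from the corpora.

```r
# Hypothetical word and tag columns (illustrative only, not the study's data).
ee <- data.frame(
  Word = c("the", "road", "is", "long", "the", "heart", "is", "the", "guide"),
  Tag  = c("DT",  "NN",   "VBZ", "JJ",  "DT",  "NN",    "VBZ", "DT", "NN")
)

# table() counts each distinct value; sort(decreasing = TRUE) ranks the counts.
word_freq <- sort(table(ee$Word), decreasing = TRUE)
tag_freq  <- sort(table(ee$Tag),  decreasing = TRUE)

print(head(word_freq, 4))  # the four most frequent words
print(head(tag_freq, 4))   # the four most frequent tags
```

Running the same two lines on the Arabic table would give the counterpart ranking, so the two lists can be compared side by side.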
Final quantitative analysis of egyptian aphorisms by using r
