Arabic Syntactic parsing
By Amena Helmy
Introduction to parsing
POS tagging
• Gives information about the individual words
قرأ: verb (فعل)
محمد: proper noun (اسم علم)
كتابا: noun (اسم)
كبيرا: adjective (صفة)
Syntactic parsing
• Looks at the whole sentence
• Recovers the overall structure of each sentence (a tree)
• Captures the way the words relate to each other
Tagged example: قرأ/VBD محمد/NNP كتابا/NN كبيرا/JJ
Relation labels: SBJ, OBJ, amod
Constituency vs. dependency
Constituency parsing
• Words are organized into constituents.
• Constituents are groups of words that can act as single units.
Tagged example: قرأ/VBD محمد/NNP كتابا/NN كبيرا/JJ
[Tree diagram: the tagged words grouped under NP, NP, VP and S nodes]
Constituency tests
• Substitution
  • لا أعلم شيئا عن الرجل غريب الأطوار ذو النظارة السوداء ("I know nothing about the eccentric man with the black glasses")
  • لا أعلم عنه شيئا ("I know nothing about him")
• Conjunction
  • لا أعلم شيئا عن الرجل غريب الأطوار ذو النظارة السوداء
  • لا أعلم شيئا عن الرجل غريب الأطوار ذو النظارة السوداء وصديقه ("... and his friend")
Head of constituents (phrases)
Constituents are usually named as phrases after the word that heads them (the most important word).
• قد يكون أمرا هاما: VP, headed by يكون
• الرجل الذي اشترى البيت: NP, headed by الرجل
• على ضفتي النهر: PP, headed by على
Dependency structure
• All nodes are words.
• Relations between words are shown through directed arcs that go from the head to the dependent.
• The type of relation is shown through arc labels.
Tagged example: قرأ/VBD محمد/NNP كتابا/NN كبيرا/JJ
Arc labels: SBJ, OBJ, amod
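The head-to-dependent arcs described above can be written down as a small data structure. This is a minimal sketch: the `Arc` class and the helper names are hypothetical, and the attachment of each label to a word pair is an assumption read off the example sentence (verb governs subject and object, noun governs its adjective).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: str        # governing word
    dependent: str   # governed word
    label: str       # relation type written on the arc


# Assumed attachments for the example sentence (not stated explicitly in the slides).
ARCS = [
    Arc("قرأ", "محمد", "SBJ"),
    Arc("قرأ", "كتابا", "OBJ"),
    Arc("كتابا", "كبيرا", "amod"),
]


def root(arcs):
    """Return the only word that is a head but never a dependent."""
    heads = {a.head for a in arcs}
    dependents = {a.dependent for a in arcs}
    (root_word,) = heads - dependents
    return root_word


def dependents_of(word, arcs):
    """List (label, dependent) pairs for all arcs leaving `word`."""
    return [(a.label, a.dependent) for a in arcs if a.head == word]
```

Every node is a word and every arc carries exactly one label, which is why a flat list of arcs is enough to describe the whole structure.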
Parsing importance
• Parsing is the task of uncovering the syntactic structure of language. It is often viewed as an important prerequisite for building systems capable of understanding language.
• Syntactic structure is necessary as a first step towards some NLP tasks.
[Figure: example labeled "English" and "Chinese"]
Why parsing is challenging
1. Real sentences are long
• و خلص الى ان جدول اعمال اللقاء الحالي يتضمن : العمل على تنظيم الري في شكل مدروس يهدف الي الافادة الى اقصى حد من الثروة المائية التي يوفرها نهر الدامور لري كل الاراضي الزراعية التي تنتفع بالري من مياهه , علما ان الطول الاجمالي للمجاري الدائمة للنهر هو حوالى 76 كيلومترا و متوسط صبيبها نحو 4.71 امتار مكعبة بالثانية و أن عدد ايام الهطول التقريبي هو 70 يوما في السنة و بالتالي فإن عملية حسابية بسيطة تظهر ان متوسط كمية المياه التي يوفرها النهر هي 28 مليون متر مكعب من المياه سنويا , الامر الذي يظهر اهمية هذا النهر و مقدار الطاقة المائية التي يمكن الافادة منها , اضافة الى حل مشكلة المؤسسات السياحية التي انشئت من دون رخص قانونية و إعداد لوائح بالتعديات و العمل على إزالتها و منع استخدام الكهرباء بهدف الصيد في مياه النهر محافظة على الاسماك التي تتوالد و تتكاثر فيه و منع اقامة شبكات صرف صحي تلوث مجرى النهر , العمل على معالجة مياه المصانع على اختلافها قبل ان تصب في النهر ، و توعية المزارعين على موضوع المبيدات الزراعية و معالجة مشكلة البناء العشوائي للحد منه على ضفتي النهر و وضع تخطيط عمراني جديد للحوض بالتنسيق مع مديرية التنظيم المدني ووزارة البيئة , وأخيرا اقامة اتفاق حسن جوار بين كل البلديات التي يشملها حوض نهر الدامور تكون بمثابة لجنة متابعة دائمة بإشراف سعادة القائمقام بغية العمل معا على توفير الحماية اللازمة لهذا الشريان المائي المهم و المحافظة على حقوق المنتفعين بمياهه " .
2. Ambiguous sentences
• I saw the man with binoculars (either I used the binoculars to see him, or the man had the binoculars)
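The ambiguity can be made concrete by writing the two readings as two different trees over the same words. The bracketings below are illustrative assumptions, not taken from the slides, and the `leaves` helper is hypothetical.

```python
# Reading 1: the PP attaches to the verb phrase (seeing was done with binoculars).
VP_ATTACH = (
    "S",
    ("NP", "I"),
    ("VP", ("V", "saw"), ("NP", "the", "man"), ("PP", "with", "binoculars")),
)

# Reading 2: the PP attaches to the noun phrase (the man has the binoculars).
NP_ATTACH = (
    "S",
    ("NP", "I"),
    ("VP", ("V", "saw"),
           ("NP", ("NP", "the", "man"), ("PP", "with", "binoculars"))),
)


def leaves(tree):
    """Return the words of a tree, left to right (tree[0] is the node label)."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]
```

Both trees yield exactly the same word sequence, so the parser has to choose between them using information that is not in the string itself.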
Arabic syntactic challenges
Arabic complex morphology
• A single Arabic word can carry a whole clause: وستشاهدونها = "and you will watch it".
• Analyzing such input morphologically is not easy, but it has to be done correctly before moving on to the next step, which is processing the input text syntactically.
Arabic is a pro-drop language
• The subject of a verb may be implicitly encoded in the verb morphology.
Free word order
Language   Order   Example
English    SVO     The boy ate the food.
Arabic     SVO     الولد أكل الطعام
Arabic     VSO     أكل الولد الطعام
Parsing process
The parsing process
[Diagram: sentences and a grammar go into the PARSER, which produces a constituency or dependency structure]
Constituency grammar
Phrase structure grammar
Context free grammar (CFG)
Context-free grammar
• A context-free grammar consists of a set of rules or productions, each expressing the ways the symbols of the language can be grouped together, and a lexicon of words.
CFG components
1. Σ, a set of terminal symbols (words): يذهب، صباحا، إلى، المدرسة، الطالب
2. N, a set of non-terminal symbols
3. S, a designated start symbol
4. R, a set of rules or productions, each of the form A → β
Contextual rules
S → VP
VP → V NP PP NP
NP → DN
NP → N
PP → P NP
Lexical rules
N → صباحا
DN → الطالب
DN → المدرسة
V → يذهب
P → إلى
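To see how these components work together, the sketch below encodes the toy grammar and runs a small top-down parser over it. It is an illustration under assumptions: the example sentence is assembled from the slide's five words, the preposition is given the pre-terminal P (so that PP → P NP can apply), and the function names are hypothetical.

```python
# Contextual rules: non-terminal -> list of possible right-hand sides.
GRAMMAR = {
    "S": [["VP"]],
    "VP": [["V", "NP", "PP", "NP"]],
    "NP": [["DN"], ["N"]],
    "PP": [["P", "NP"]],
}

# Lexical rules: word -> pre-terminal (P for the preposition is an assumption).
LEXICON = {
    "يذهب": "V",
    "الطالب": "DN",
    "المدرسة": "DN",
    "إلى": "P",
    "صباحا": "N",
}

# Assumed example sentence built from the five words of the lexicon.
SENTENCE = " ".join(["يذهب", "الطالب", "إلى", "المدرسة", "صباحا"])


def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` can start at `start`."""
    if symbol not in GRAMMAR:  # pre-terminal: must match the next word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])


def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [child])


def parse_sentence(sentence):
    """Return every tree rooted in S that covers the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


trees = parse_sentence(SENTENCE)
```

A plain recursive search like this only terminates because the toy grammar has no left-recursive rules; realistic grammars need chart-based methods instead.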
History of CFG parsing
Classical NLP parsing (pre-1990 era)
• CFG grammars were built for languages
• Parsers had poor coverage
• Even quite simple sentences had many possible analyses
Solutions
• High coverage for rules
• We need mechanisms that allow us to find the most likely parse(s)
Use annotated data
• Treebanks appear
• Building a treebank seems a lot slower and less useful than building a grammar
Statistical parsing
• P is the set of probabilities P(A → β) associated with the rules.
• Contextual rules
S → VP [0.4]
S → VP NP [0.6]
VP → V NP PP NP [0.3]
VP → V NP [0.7]
NP → DN [0.5]
NP → N [0.5]
PP → P NP [1.0]
• Lexical rules
N → صباحا [0.4]
DN → الطالب [0.3]
DN → المدرسة [0.3]
V → يذهب [1.0]
P → إلى [1.0]
• We need mechanisms that allow us to find the most likely parse(s).
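One property worth checking in such a grammar is that the probabilities of all rules sharing a left-hand side sum to 1. The sketch below does this for the contextual rules only. The pairing of each probability with a rule is an assumption (the bracketed values are read in the order the rules are listed), and the function name is hypothetical.

```python
from collections import defaultdict

# (left-hand side, right-hand side) -> probability, assumed pairing by list order.
CONTEXTUAL_RULES = {
    ("S", ("VP",)): 0.4,
    ("S", ("VP", "NP")): 0.6,
    ("VP", ("V", "NP", "PP", "NP")): 0.3,
    ("VP", ("V", "NP")): 0.7,
    ("NP", ("DN",)): 0.5,
    ("NP", ("N",)): 0.5,
    ("PP", ("P", "NP")): 1.0,
}


def lhs_totals(rules):
    """Sum the probabilities of all rules that share a left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), probability in rules.items():
        totals[lhs] += probability
    return dict(totals)


def is_normalized(rules, tolerance=1e-9):
    """True if every left-hand side has rule probabilities summing to 1."""
    return all(abs(t - 1.0) <= tolerance for t in lhs_totals(rules).values())
```

The lexical rules are left out on purpose: only a few words per category are shown, so their listed probabilities need not add up to 1.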
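How to calculate the probability of rules
• $P(X \mid Y) = \dfrac{\mathrm{Count}(X, Y)}{\mathrm{Count}(Y)}$
Example counts: VP → V occurs 113 times and VP → V NP occurs 70 times, so Count(VP) = 183.
$P(VP \to V) = \dfrac{\mathrm{Count}(VP \to V)}{\mathrm{Count}(VP)} = \dfrac{113}{183} \approx 0.6$
$P(VP \to V\ NP) = \dfrac{\mathrm{Count}(VP \to V\ NP)}{\mathrm{Count}(VP)} = \dfrac{70}{183} \approx 0.4$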
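The same relative-frequency estimate can be sketched in a few lines of Python. The two counts are the ones given above; the function name and the dictionary layout are hypothetical.

```python
from collections import Counter

# Rule counts from the example: (left-hand side, right-hand side) -> count.
rule_counts = Counter({
    ("VP", ("V",)): 113,
    ("VP", ("V", "NP")): 70,
})


def estimate_rule_probabilities(counts):
    """Divide each rule count by the total count of its left-hand side."""
    lhs_totals = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


probs = estimate_rule_probabilities(rule_counts)
for (lhs, rhs), p in probs.items():
    print(f"{lhs} -> {' '.join(rhs)}: {p:.3f}")
```

Because each count is divided by the total for its own left-hand side, the estimated probabilities for VP sum to 1 by construction.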
Probabilistic CFG (PCFG)
• A CFG becomes a PCFG by adding P, the set of probabilities P(A → β) associated with the rules.
Ambiguous sentences
• لعب عمر وابن جارهم المؤدب (the adjective المؤدب, "the polite", can attach either to ابن, "son", or to جار, "neighbor")
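How to calculate the probability of a tree
• The probability of an entire tree is the product of the probabilities of the individual rule choices.
[Figure: parse tree T1 with a probability on each rule]
$$
\begin{aligned}
P(T_1) ={}& P(S \to VP\ NP) \times P(VP \to V) \times P(V \to \text{لعب}) \\
&\times P(NP \to NP\ CONJ\ NP) \times P(NP \to DET{+}NN) \\
&\times P(DET{+}NN \to \text{الطفل}) \times P(CONJ \to \text{و}) \\
&\times P(NP \to NN\ ADJP) \times P(NN \to \text{ابن}) \\
&\times P(ADJP \to NP\ DET{+}ADJ) \times P(NP \to NN\ PRON) \\
&\times P(NN \to \text{جار}) \times P(PRON \to \text{هم}) \times P(DET{+}ADJ \to \text{المؤدب}) \\
={}& 1.0 \times 1.0 \times 1.0 \times 0.1 \times 0.3 \times 1.0 \times 1.0 \times 0.1 \times 0.6 \times 0.6 \times 0.1 \times 0.4 \times 1.0 \times 1.0 \\
={}& 0.0000432
\end{aligned}
$$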
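How to calculate the probability of a tree (cont.)
[Figure: parse tree T2 with a probability on each rule]
$$
\begin{aligned}
P(T_2) ={}& P(S \to VP\ NP) \times P(VP \to V) \times P(V \to \text{لعب}) \\
&\times P(NP \to NP\ CONJ\ NP) \times P(NP \to DET{+}NN) \\
&\times P(DET{+}NN \to \text{الطفل}) \times P(CONJ \to \text{و}) \\
&\times P(NP \to NN\ ADJP) \times P(NN \to \text{ابن}) \\
&\times P(ADJP \to NP\ DET{+}ADJ) \times P(NP \to NN\ PRON) \\
&\times P(NN \to \text{جار}) \times P(PRON \to \text{هم}) \times P(DET{+}ADJ \to \text{المؤدب}) \\
={}& 1.0 \times 1.0 \times 1.0 \times 0.3 \times 0.3 \times 1.0 \times 1.0 \times 0.6 \times 0.1 \times 0.6 \times 0.1 \times 0.4 \times 1.0 \times 1.0 \\
={}& 0.0001296
\end{aligned}
$$
P(T1) = 0.0000432 < P(T2) = 0.0001296, so T2 is the more likely parse.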
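Both products can be checked numerically. The sketch below multiplies the fourteen factors listed for each tree and picks the larger result; the variable names are hypothetical, and the log-probability sum is an added convenience (commonly used to avoid underflow on long sentences), not something the slides describe.

```python
import math

# The fourteen rule probabilities listed for each tree, in order.
T1_FACTORS = [1.0, 1.0, 1.0, 0.1, 0.3, 1.0, 1.0, 0.1, 0.6, 0.6, 0.1, 0.4, 1.0, 1.0]
T2_FACTORS = [1.0, 1.0, 1.0, 0.3, 0.3, 1.0, 1.0, 0.6, 0.1, 0.6, 0.1, 0.4, 1.0, 1.0]

# Tree probability = product of the probabilities of all rules used in the tree.
p_t1 = math.prod(T1_FACTORS)
p_t2 = math.prod(T2_FACTORS)

# Equivalent computation in log space: sums replace products.
log_p_t1 = sum(math.log(f) for f in T1_FACTORS)
log_p_t2 = sum(math.log(f) for f in T2_FACTORS)

# The parser prefers the tree with the higher probability.
best = "T2" if p_t2 > p_t1 else "T1"
print(f"P(T1) = {p_t1:.7f}, P(T2) = {p_t2:.7f}, preferred: {best}")
```

Comparing in log space gives the same ranking, since the logarithm is monotonic.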
Probabilistic context-free grammar (PCFG), or stochastic context-free grammar (SCFG)
• A context-free grammar G is defined by four parameters (N, Σ, R, S). A probabilistic context-free grammar keeps the same four parameters and augments each rule in R with a probability:
• N, a set of non-terminal symbols
• Σ, a set of terminal symbols
• R, a set of rules or productions, each of the form A → β
• S, a designated start symbol
• P, the set of probabilities P(A → β) associated with the rules
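The definition can be mirrored in a small data structure, with R and P stored together as a mapping from rules to probabilities. This is a sketch under assumptions: the class and function names are hypothetical, the rule probabilities reuse the toy grammar from the statistical parsing slide paired in listing order, the preposition is tagged P, and the example tree is one assumed analysis of the five-word sentence.

```python
from dataclasses import dataclass

GOES, STUDENT, TO, SCHOOL, MORNING = "يذهب", "الطالب", "إلى", "المدرسة", "صباحا"


@dataclass(frozen=True)
class PCFG:
    nonterminals: frozenset   # N
    terminals: frozenset      # Σ
    rules: dict               # R with P: (lhs, rhs tuple) -> probability
    start: str                # S


TOY = PCFG(
    nonterminals=frozenset({"S", "VP", "NP", "PP", "V", "N", "DN", "P"}),
    terminals=frozenset({GOES, STUDENT, TO, SCHOOL, MORNING}),
    rules={
        ("S", ("VP",)): 0.4, ("S", ("VP", "NP")): 0.6,
        ("VP", ("V", "NP", "PP", "NP")): 0.3, ("VP", ("V", "NP")): 0.7,
        ("NP", ("DN",)): 0.5, ("NP", ("N",)): 0.5,
        ("PP", ("P", "NP")): 1.0,
        ("N", (MORNING,)): 0.4, ("DN", (STUDENT,)): 0.3,
        ("DN", (SCHOOL,)): 0.3, ("V", (GOES,)): 1.0, ("P", (TO,)): 1.0,
    },
    start="S",
)


def tree_probability(tree, grammar):
    """Multiply the probabilities of every rule used in `tree`.

    A tree is (label, child, ...); a child is a subtree or a word string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    probability = grammar.rules[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            probability *= tree_probability(child, grammar)
    return probability


# Assumed analysis: verb, subject NP, PP, and a final adverbial NP under one VP.
TREE = ("S", ("VP", ("V", GOES), ("NP", ("DN", STUDENT)),
              ("PP", ("P", TO), ("NP", ("DN", SCHOOL))),
              ("NP", ("N", MORNING))))

p_tree = tree_probability(TREE, TOY)
```

Storing the probability next to each rule keeps the "slight augmentation" explicit: removing the numbers leaves an ordinary CFG.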
