Arabic Syntactic parsing
By Amena Helmy
Introduction to parsing
POS Tagging
قرأ/فعل (verb)  محمد/اسم علم (proper noun)  كتابا/اسم (noun)  كبيرا/صفة (adjective)
("Mohamed read a big book.")
• POS tagging gives information about the individual words.
Syntactic parsing
قرأ/VBD  محمد/NNP  كتابا/NN  كبيرا/JJ
• Works on the whole sentence
• Gives the overall structure of each sentence (a tree)
• Shows the way the words relate to each other (for example, through the relations SBJ, OBJ, and amod)
Constituency vs. Dependency
Constituency Parsing
Words are organized into constituents.
Constituents are groups of words that can act as single units.
[Figure: constituency tree over قرأ/VBD محمد/NNP كتابا/NN كبيرا/JJ, with the phrase nodes S, VP, NP, and NP]
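The idea of words grouping into constituents can be sketched as a nested data structure. The bracketing below is an assumption: the original tree figure is not recoverable from the transcript, so this is only one plausible arrangement of the S, VP, NP, and NP nodes it mentions. The helper names `leaves` and `constituents` are hypothetical.

```python
# One plausible (assumed) constituency tree for "قرأ محمد كتابا كبيرا".
# Each node is a tuple: (label, child, child, ...); a POS node holds one word.
tree = (
    "S",
    (
        "VP",
        ("VBD", "قرأ"),
        ("NP", ("NNP", "محمد")),
        ("NP", ("NN", "كتابا"), ("JJ", "كبيرا")),
    ),
)


def leaves(node):
    """Return the words covered by a node, in sentence order."""
    _, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # POS tag directly over a word
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def constituents(node):
    """Yield (label, words) for every phrase-level node in the tree."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return  # skip POS nodes, they are not phrases
    yield label, leaves(node)
    for child in children:
        yield from constituents(child)


for label, words in constituents(tree):
    print(label, " ".join(words))
```

Each printed line is one constituent: a label together with the group of words that acts as a single unit under it.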
Constituency Test
• Substitution
  • لا أعلم شيئا عن الرجل غريب الأطوار ذو النظارة السوداء
    ("I know nothing about the eccentric man with the black glasses.")
  • لا أعلم عنه شيئا
    ("I know nothing about him.")
• Conjunction
  • لا أعلم شيئا عن الرجل غريب الأطوار ذو النظارة السوداء
  • لا أعلم شيئا عن الرجل غريب الأطوار ذو النظارة السوداء وصديقه
    ("I know nothing about the eccentric man with the black glasses and his friend.")
Head of constituents (phrases)
Phrase                                                      Type   Head
قد يكون أمرا هاما ("it may be an important matter")          VP     يكون
الرجل الذي اشترى البيت ("the man who bought the house")      NP     الرجل
على ضفتي النهر ("on the two banks of the river")             PP     على
Constituents are usually named as phrases after the word that heads them (the most important word).
Dependency structure
All nodes are words.
Relations between words are shown through directed arcs that go from the head to the dependent.
The type of relation is shown by the arc label.
قرأ/VBD  محمد/NNP  كتابا/NN  كبيرا/JJ
Arc labels in the example: SBJ, OBJ, amod
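A dependency structure of this kind can be sketched as a list of labeled arcs. The specific arcs below are an assumption, inferred from the labels SBJ, OBJ, and amod and from the example sentence; the function names are hypothetical. The toy code identifies words by their form, so it assumes no word occurs twice in the sentence.

```python
# Assumed arcs for "قرأ محمد كتابا كبيرا": (head, label, dependent).
words = ["قرأ", "محمد", "كتابا", "كبيرا"]
arcs = [
    ("قرأ", "SBJ", "محمد"),     # the verb heads its subject
    ("قرأ", "OBJ", "كتابا"),    # the verb heads its object
    ("كتابا", "amod", "كبيرا"),  # the noun heads its adjective
]


def head_of(arcs):
    """Map each dependent to its (head, label) pair."""
    return {dependent: (head, label) for head, label, dependent in arcs}


def find_root(words, arcs):
    """Return the single word that has no head; raise if the arcs do not form a tree."""
    heads = head_of(arcs)
    if len(heads) != len(arcs):
        raise ValueError("some word has more than one head")
    roots = [w for w in words if w not in heads]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]


print(find_root(words, arcs))
```

Every word except the root appears exactly once as a dependent, which is what makes the set of arcs a tree over the words themselves.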
Parsing importance
Parsing is the task of uncovering the syntactic structure of language. It is often viewed as an important prerequisite for building systems capable of understanding language.
Syntactic structure is necessary as a first step towards some NLP tasks.
[Figure: English and Chinese example]
Why parsing is challenging
1. Real sentences are long
• The following is a single sentence (about managing the Damour River) of more than 200 tokens:
و خلص الى ان جدول اعمال اللقاء الحالي يتضمن : العمل على تنظيم الري في شكل مدروس يهدف الي الافادة الى اقصى حد من الثروة المائية التي يوفرها نهر الدامور لري كل الاراضي الزراعية التي تنتفع بالري من مياهه ، علما ان الطول الاجمالي للمجاري الدائمة للنهر هو حوالى 76 كيلومترا و متوسط صبيبها نحو 4.71 امتار مكعبة بالثانية و أن عدد ايام الهطول التقريبي هو 70 يوما في السنة و بالتالي فإن عملية حسابية بسيطة تظهر ان متوسط كمية المياه التي يوفرها النهر هي 28 مليون متر مكعب من المياه سنويا ، الامر الذي يظهر اهمية هذا النهر و مقدار الطاقة المائية التي يمكن الافادة منها ، اضافة الى حل مشكلة المؤسسات السياحية التي انشئت من دون رخص قانونية و إعداد لوائح بالتعديات و العمل على إزالتها و منع استخدام الكهرباء بهدف الصيد في مياه النهر محافظة على الاسماك التي تتوالد و تتكاثر فيه و منع اقامة شبكات صرف صحي تلوث مجرى النهر ، العمل على معالجة مياه المصانع على اختلافها قبل ان تصب في النهر ، و توعية المزارعين على موضوع المبيدات الزراعية و معالجة مشكلة البناء العشوائي للحد منه على ضفتي النهر و وضع تخطيط عمراني جديد للحوض بالتنسيق مع مديرية التنظيم المدني ووزارة البيئة ، وأخيرا اقامة اتفاق حسن جوار بين كل البلديات التي يشملها حوض نهر الدامور تكون بمثابة لجنة متابعة دائمة بإشراف سعادة القائمقام بغية العمل معا على توفير الحماية اللازمة لهذا الشريان المائي المهم و المحافظة على حقوق المنتفعين بمياهه .
2. Ambiguous sentences
"I saw the man with binoculars": either the speaker used the binoculars to see the man, or the man had the binoculars.
Arabic syntactic challenges
Arabic Complex Morphology
A single Arabic word can correspond to a whole English clause:
وستشاهدونها = "and you will watch it"
Analyzing such input morphologically is not easy, but it has to be done correctly before moving on to the next step, which is processing the input text syntactically.
Arabic is a pro-drop language
The subject of a verb may be implicitly encoded in the verb morphology instead of appearing as a separate word.
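The word وستشاهدونها illustrates both points: it packs several syntactic units into one token, and its subject ("you") is expressed only by the verb's inflection. The segmentation below is written by hand for illustration; it is not the output of any particular morphological analyzer, and the feature descriptions are informal.

```python
# Hand-written segmentation of "وستشاهدونها" (an illustration, not analyzer output).
# Each entry: (surface segment, informal description, English gloss).
segments = [
    ("و", "conjunction", "and"),
    ("س", "future marker", "will"),
    ("تشاهدون", "verb, 2nd person masculine plural (subject encoded in the verb)", "you watch"),
    ("ها", "object pronoun, 3rd person feminine singular", "it"),
]

# Concatenating the segments gives back the original token.
word = "".join(surface for surface, _, _ in segments)
gloss = " + ".join(english for _, _, english in segments)

print(word)
print(gloss)
```

A parser that treated the whole token as one word would have no separate nodes for the conjunction, the verb, and the object pronoun, which is why segmentation has to come first.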
Free word order
Language   Order   Example
English    SVO     The boy ate the food.
Arabic     SVO     الولد أكل الطعام
Arabic     VSO     أكل الولد الطعام
The parsing process
[Figure: sentences and a grammar are given to the parser, which produces a constituency or a dependency analysis]
Constituency grammar
Phrase structure grammar
Context-free grammar (CFG)
Context-Free Grammar
• A context-free grammar consists of a set of rules or productions, each expressing the ways the symbols of the language can be grouped together, and a lexicon of words.
CFG Components
1. Σ: a set of terminal symbols (words)
   يذهب ، صباحا ، إلى ، المدرسة ، الطالب
2. N: a set of non-terminal symbols
3. S: a designated start symbol
4. R: a set of rules or productions, each of the form A → β

Contextual rules
S → VP
VP → V NP PP NP
NP → DN
NP → N
PP → P NP

Lexical rules
N → صباحا
DN → الطالب
DN → المدرسة
V → يذهب
P → إلى
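To see how these four components work together, here is a minimal sketch of a top-down parser for the toy grammar. It is not an algorithm from the slides: the function names are hypothetical, the preposition إلى is assumed to be tagged P (which the rule PP → P NP requires), and the approach only works because this grammar has no left-recursive rules.

```python
# Toy CFG from the slide: contextual rules and a lexicon (P for the preposition is assumed).
GRAMMAR = {
    "S": [["VP"]],
    "VP": [["V", "NP", "PP", "NP"]],
    "NP": [["DN"], ["N"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "N": {"صباحا"},
    "DN": {"الطالب", "المدرسة"},
    "V": {"يذهب"},
    "P": {"إلى"},
}
START = "S"


def parse(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` can cover words[i:next_index]."""
    if symbol in LEXICON:
        # Lexical rule: the symbol rewrites directly to the next word.
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR.get(symbol, []):
        # Expand the right-hand side left to right, keeping all partial analyses.
        partial = [([], i)]
        for child in rhs:
            partial = [
                (kids + [subtree], k)
                for kids, j in partial
                for subtree, k in parse(child, words, j)
            ]
        for kids, j in partial:
            yield (symbol, *kids), j


def parse_sentence(words):
    """Return all trees rooted in the start symbol that cover the whole sentence."""
    return [tree for tree, end in parse(START, words, 0) if end == len(words)]


sentence = "يذهب الطالب إلى المدرسة صباحا".split()
for tree in parse_sentence(sentence):
    print(tree)
```

The grammar accepts the sentence يذهب الطالب إلى المدرسة صباحا ("the student goes to school in the morning") with exactly one tree, and rejects word sequences that no rule combination can produce.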
History of CFG Parsing
Classical NLP parsing (pre-1990 era)
• CFG grammars were built by hand for each language
• Parsers had poor coverage
• Even quite simple sentences had many possible analyses
Solutions
• High coverage for rules
• Mechanisms that allow us to find the most likely parse(s)
Use annotated data
• The appearance of treebanks
• At first, building a treebank seems a lot slower and less useful than building a grammar
Statistical parsing
• Contextual rules
S → VP              [0.4]
S → VP NP           [0.6]
VP → V NP PP NP     [0.3]
VP → V NP           [0.7]
NP → DN             [0.5]
NP → N              [0.5]
PP → P NP           [1.0]
• Lexical rules
N → صباحا           [0.4]
DN → الطالب         [0.3]
DN → المدرسة        [0.3]
V → يذهب            [1.0]
P → إلى             [1.0]
P is the set of probabilities associated with the rules, P(A → β).
We need mechanisms that allow us to find the most likely parse(s).
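A rule table like this can be stored as a mapping from rules to probabilities. The sketch below makes two assumptions: the probabilities are paired with the contextual rules in the order they are listed, and the lexical rules are left out because the slide shows only part of the lexicon. It then checks the usual PCFG condition that the rules sharing a left-hand side sum to 1. The names `lhs_totals` and `is_normalized` are hypothetical.

```python
import math
from collections import defaultdict

# Contextual rules with their (assumed, order-based) probabilities.
# Key: (left-hand side, right-hand side tuple); value: rule probability.
PCFG_RULES = {
    ("S", ("VP",)): 0.4,
    ("S", ("VP", "NP")): 0.6,
    ("VP", ("V", "NP", "PP", "NP")): 0.3,
    ("VP", ("V", "NP")): 0.7,
    ("NP", ("DN",)): 0.5,
    ("NP", ("N",)): 0.5,
    ("PP", ("P", "NP")): 1.0,
}


def lhs_totals(rules):
    """Sum the probabilities of all rules that share a left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), probability in rules.items():
        totals[lhs] += probability
    return dict(totals)


def is_normalized(rules):
    """True if, for every left-hand side, the rule probabilities sum to 1."""
    return all(math.isclose(total, 1.0) for total in lhs_totals(rules).values())


print(lhs_totals(PCFG_RULES))
print(is_normalized(PCFG_RULES))
```

The check uses `math.isclose` rather than `==` because sums of floating-point probabilities are not always exactly 1.0.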
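How to calculate the probability of rules
• Rule probabilities are estimated from counts:

$$P(X \mid Y) = \frac{\mathrm{Count}(X, Y)}{\mathrm{Count}(Y)}$$

Rule counts:
VP → V       113
VP → V NP     70

$$P(VP \to V) = \frac{\mathrm{Count}(VP \to V)}{\mathrm{Count}(VP)} = \frac{113}{183} \approx 0.6$$

$$P(VP \to V\ NP) = \frac{\mathrm{Count}(VP \to V\ NP)}{\mathrm{Count}(VP)} = \frac{70}{183} \approx 0.4$$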
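The same relative-frequency estimate can be sketched in a few lines. The counts are the two VP rule counts from the slide; the function name `estimate_rule_probabilities` is hypothetical, and the sketch assumes these two rules are the only VP expansions observed.

```python
from collections import Counter


def estimate_rule_probabilities(rule_counts):
    """Relative-frequency estimate: Count(A -> beta) / Count(A) for every rule."""
    lhs_counts = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {
        (lhs, rhs): count / lhs_counts[lhs]
        for (lhs, rhs), count in rule_counts.items()
    }


# Counts from the slide: VP -> V occurs 113 times, VP -> V NP occurs 70 times.
counts = {
    ("VP", ("V",)): 113,
    ("VP", ("V", "NP")): 70,
}
probabilities = estimate_rule_probabilities(counts)

for (lhs, rhs), p in probabilities.items():
    print(f"{lhs} -> {' '.join(rhs)}: {p:.4f}")
```

In a real setting the counts would be read off the trees of a treebank, one count per rule occurrence.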
Probabilistic CFG
P is the set of probabilities associated with the rules, P(A → β).
Adding P to a CFG turns it into a PCFG.
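Ambiguous sentences
لعب عمر وابن جارهم المؤدب
("Omar and their neighbor's son played", where المؤدب, "the polite one", can describe either the son or the neighbor.)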
How to calculate the probability of a tree
• The probability of an entire tree is the product of the probabilities of the individual rule choices used to build it.
P(T1) = P(S → VP NP) * P(VP → V) * P(V → لعب) *
        P(NP → NP CONJ NP) * P(NP → DET+NN) * P(DET+NN → الطفل) * P(CONJ → و) *
        P(NP → NN ADJP) * P(NN → ابن) * P(ADJP → NP DET+ADJ) * P(NP → NN PRON) *
        P(NN → جار) * P(PRON → هم) * P(DET+ADJ → المؤدب)
P(T1) = 1.0 * 1.0 * 1.0 * 0.1 * 0.3 * 1.0 * 1.0 * 0.1 * 0.6 * 0.6 * 0.1 * 0.4 * 1.0 * 1.0 = 0.0000432
Cont.
The second tree, T2, is scored the same way, as the product of the probabilities of the rules it uses:
P(T2) = 1.0 * 1.0 * 1.0 * 0.3 * 0.3 * 1.0 * 1.0 * 0.6 * 0.1 * 0.6 * 0.1 * 0.4 * 1.0 * 1.0 = 0.0001296
Since P(T2) = 0.0001296 is greater than P(T1) = 0.0000432, T2 is the more likely parse.
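The two products can be checked with a short sketch. The factor lists are the rule probabilities given for the two trees; the function names are hypothetical. Summing log-probabilities is included as a common alternative to multiplying many small numbers, not as something the slides prescribe.

```python
import math

# Rule probabilities used by each tree, in the order given on the slides.
T1_FACTORS = [1.0, 1.0, 1.0, 0.1, 0.3, 1.0, 1.0, 0.1, 0.6, 0.6, 0.1, 0.4, 1.0, 1.0]
T2_FACTORS = [1.0, 1.0, 1.0, 0.3, 0.3, 1.0, 1.0, 0.6, 0.1, 0.6, 0.1, 0.4, 1.0, 1.0]


def tree_probability(factors):
    """Probability of a tree: the product of its rule probabilities."""
    return math.prod(factors)


def tree_log_probability(factors):
    """Same quantity in log space, which avoids underflow for large trees."""
    return sum(math.log(p) for p in factors)


candidates = {"T1": T1_FACTORS, "T2": T2_FACTORS}
best = max(candidates, key=lambda name: tree_probability(candidates[name]))

for name, factors in candidates.items():
    print(name, f"{tree_probability(factors):.7f}")
print("most likely parse:", best)
```

Choosing the tree with the highest product is the disambiguation step: both analyses are grammatical, and the probabilities decide between them.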
The Probabilistic Context-Free Grammar (PCFG), or Stochastic Context-Free Grammar (SCFG)
• A context-free grammar G is defined by four parameters (N, Σ, R, S). A probabilistic context-free grammar is also defined by four parameters, with a slight augmentation to each of the rules in R:
• N: a set of non-terminal symbols
• Σ: a set of terminal symbols
• R: a set of rules or productions, each of the form A → β [p]
• S: a designated start symbol
• p is the probability associated with the rule, P(A → β); for each non-terminal A, the probabilities of all rules with A on the left-hand side sum to 1.