1. Having an Arabic corpus: problems and challenges
Arabic corpora: design, construction and annotation
1. Availability
2. Forms and providers
3. Ability to be target-tailored
4. Most famous providers (Linguistic Data Consortium, Arabic Treebank, Latifa Al-Sulaiti, European Language Resources Association)
The use of corpora in Arabic language research
Areas of research are:
1-Lexis
2-Lexicography
3-Syntax
4-Collocation
5-NLP systems
6-Analysis tools
7-Stylistics, and
8-Discourse analysis
2. Nafs Corpus (under construction)
• 1- Selection of texts
• 2- Putting the texts in the right format for processing
• 3- Cleaning of the texts
• 4- Transliteration and its problems
• 5- MADA
• 6- Noun lists – Dictionary
• 7- Proposed algorithm based on Mitkov’s knowledge-poor approach
• 8- Problems due to the nature of the language itself
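
Steps 3 and 4 above (cleaning and transliteration) could be sketched roughly as follows. This is a minimal illustration, not the Nafs Corpus pipeline itself: it assumes a Buckwalter-style transliteration scheme (the outline does not name one), assumes that cleaning means stripping short-vowel diacritics and the tatweel character, and the function names `clean` and `transliterate` are hypothetical.

```python
import re

# Buckwalter-style mapping (assumed scheme), built from two contiguous
# Unicode ranges of the Arabic block.
_LETTERS = "'|>&<}AbptvjHxd*rzs$SDTZEg"   # U+0621 .. U+063A
_REST = "_fqklmnhwYyFNKaui~o"             # U+0640 .. U+0652

BUCKWALTER = {chr(0x0621 + i): c for i, c in enumerate(_LETTERS)}
BUCKWALTER.update({chr(0x0640 + i): c for i, c in enumerate(_REST)})

# Diacritics (U+064B .. U+0652) and tatweel (U+0640).
_NOISE = re.compile("[\u064B-\u0652\u0640]")


def clean(text: str) -> str:
    """Remove diacritics and tatweel, then collapse runs of whitespace."""
    text = _NOISE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def transliterate(text: str) -> str:
    """Map Arabic characters to Latin symbols; leave everything else as-is."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = "\u0643\u0650\u062A\u064E\u0627\u0628"  # "book", with diacritics
    print(transliterate(word))          # kitaAb
    print(transliterate(clean(word)))   # ktAb
```

One problem this makes visible: once diacritics are stripped, distinct vocalised words collapse to the same consonant skeleton, so the transliterated form is ambiguous without further morphological analysis.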