Dr. Sultan bin Nasser bin Abdullah Almujaiwel: PhD (corpus linguistics and applied linguistics, University of Exeter, UK, 1434 AH), MA (language and grammar, specialization: sociolinguistics and terminology, King Saud University, 1427 AH), Higher Diploma (applied linguistics, King Saud University, 1426 AH), and BA (Arabic language, King Saud University, 1424 AH).
P03- MANDIAC: A Web-based Annotation System For Manual Arabic Diacritization iwan_rg
By:
Ossama Obeid, Houda Bouamor, Wajdi Zaghouani, Mahmoud Ghoneim, Abdelati Hawwari, Mona Diab and Kemal Oflazer
Abstract
In this paper, we introduce MANDIAC, a web-based annotation system designed for rapid manual diacritization of Standard Arabic text. To expedite the annotation process, the system provides annotators with a choice of automatically generated diacritization possibilities for each word. Our framework provides intuitive interfaces for annotating text and managing the diacritization annotation process. In this paper we describe the annotation and the administration interfaces as well as the back-end engine. Finally, we demonstrate that our system doubles the annotation speed compared to using a regular text editor.
Sketch Engine is a web-based tool for analyzing corpora. It allows users to generate word sketches, view concordances, find similar words using the thesaurus, and compare the behavior of words. Key functions include the concordancer, word lists, word sketches, thesaurus, and word sketch difference. Users can analyze pre-loaded corpora or upload their own texts for tokenization, lemmatization, and POS tagging to build custom corpora.
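To make the idea of a concordance concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not Sketch Engine's implementation or API; the function name `kwic`, the `width` parameter, and the simple regex tokenizer are all assumptions made for illustration.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, match, right) context tuples for each hit of keyword.

    Hypothetical helper: tokenization is a plain \\w+ split on lowercased
    text, which is far cruder than a real corpus tool's tokenizer.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    sample = "The corpus is large. A corpus can be small."
    for left, match, right in kwic(sample, "corpus", width=2):
        print(f"{left:>12} [{match}] {right}")
```

A real concordancer would also work over lemmas and POS tags, which this sketch ignores.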
P04- Toward an Arabic Punctuated Corpus: Annotation Guidelines and Evaluation iwan_rg
This document discusses the development of guidelines and annotation for an Arabic punctuated corpus as part of the QALB project. The goals were to advance Arabic punctuation correction research and create a large annotated corpus. Guidelines focused on punctuation error correction and addition based on standard Arabic rules. Annotators used a web interface to annotate texts consistently according to the guidelines. Evaluation showed good inter-annotator agreement, though agreement was higher for machine translation texts which contained fewer optional commas.
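The summary above does not say which agreement measure was used. As one common option, the following sketch computes Cohen's kappa for two annotators; the function name and the toy punctuation labels are assumptions for illustration only, not the project's evaluation code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    annotator_1 = [",", ".", ",", "."]
    annotator_2 = [",", ".", ".", "."]
    print(cohens_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for chance, which matters when one label (for example, "no punctuation") dominates the data.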
P02- Towards a New Arabic Corpus of Dyslexic Texts iwan_rg
By:
Maha Alamri and William John Teahan
Abstract
This paper presents a detailed account of the preliminary work for the creation of a new Arabic corpus of dyslexic text. The analysis of errors found in the corpus revealed that there are four types of spelling errors made as a result of dyslexia in addition to four common spelling errors. The subsequent aim was to develop a spellchecker capable of automatically correcting the spelling mistakes of dyslexic writers in Arabic texts using statistical techniques. The purpose was to provide a tool to assist Arabic dyslexic writers. Some initial success was achieved in the automatic correction of dyslexic errors in Arabic text.
P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis iwan_rg
By:
Muhammad Abdul-Mageed, Hassan Alhuzali, Dua'a Abu-Elhij'a and Mona Diab
Abstract
Although there has been a surge of research on sentiment analysis, less work has been done on the related task of emotion detection. Especially for the Arabic language, there is no literature that we know of for the computational treatment of emotion. This situation is due partially to lack of labelled data, a bottleneck that we seek to ease. In this work, we report efforts to acquire and annotate a multi-dialect dataset for Arabic emotion analysis.
P01- Toward a rich Arabic Speech Parallel Corpus for Algerian sub-Dialects iwan_rg
By:
Soumia Bougrine, Hadda Cherroun, Djelloul Ziadi, Abdallah Lakhdari and Aicha Chorana
Abstract
Speech datasets and corpora are crucial for both developing and evaluating accurate Natural Language Processing systems. While Modern Standard Arabic has received more attention, dialects are drastically underestimated, even though they are the most used in daily life and, more recently, on social media. In this paper, we present the methodology for building an Arabic speech corpus for Algerian dialects, and a preliminary version of that dataset of dialectal Arabic speech uttered by Algerian native speakers selected from different departments of Algeria. By recording speakers directly, we have taken into account numerous aspects that foster the richness of the corpus and that provide a representation of the phonetic, prosodic and orthographic varieties of Algerian dialects. Among these considerations, we have designed rich speech topics and content. The annotations provided include useful information about the speakers and a time-aligned orthographic word transcription. Many potential uses can be considered, such as speaker/dialect identification and computational linguistics for Algerian sub-dialects. In its preliminary version, our corpus encompasses 17 sub-dialects with 109 speakers and more than 6K utterances.
Keynote - Computational Processing of Arabic Dialects: Challenges, Advances a... iwan_rg
By:
Nizar Habash
Abstract
The Arabic language consists of a number of variants among which Modern Standard Arabic (MSA) has a special status as the formal, mostly written, standard of the media, culture and education across the Arab World. The other variants are informal, mostly spoken, dialects that are the languages of communication of daily life. Most of the natural language processing resources and research in Arabic have focused on MSA. However, recently, more and more research is targeting Arabic dialects. In this talk, we present the main challenges of processing Arabic dialects, and discuss common solution paradigms, current advances, and future directions.
Although language corpora and the computational tools that facilitate their use in linguistic study are not new, purely Arab efforts to build corpora and the tools to process them are still in their early stages. This lecture aims to give an overview of the topic and can be summarized in three main themes. The first theme briefly reviews corpus design criteria, so that a corpus is balanced and representative of the purpose for which it was created, along with the basic information that must be clearly available about its texts. The second theme concerns the design and construction of the King Abdulaziz City for Science and Technology Arabic Corpus (the Arabic Corpus) and the features that distinguish it from other existing Arabic corpora, with a quick review of the tools currently available on the website and those that will be available on the new website, God willing. The third and final theme concerns some of the programs and tools that were either developed entirely at King Abdulaziz City for Science and Technology or made easier for non-specialists to use, so as to form as complete a suite as possible for processing Arabic corpora according to the user's needs, with the main focus on the most important of these programs, the "Ghawwas" system.
Al-Kamil on the Consensus of the Companions and the Imams on the Khimar and the Prohibition of a Woman Showing Any Part of Her Body ... MaymonSalim
Al-Kamil series / Book no. (166) / (Al-Kamil on the Consensus of the Companions and the Imams on the Khimar and the Prohibition of a Woman Showing Any Part of Her Body Other than the Face and Hands at Most, with Mention of (100) Companions and Imams among Them, and Exposing the Ignorance of Inexperienced Novices), by Dr. Amer al-Husseini
Each of us prepares, equips, and devotes everything he has to pass the important exams of his life.
God has provided everything we may need to pass the exam of worldly life and cross through it to Paradise.
Let us review this together.
Answering Atheists' Questions about the Purpose of Creation ربيع أحمد
Answering atheists' questions about the purpose of creation. This article can be downloaded from either of these two links:
http://www.alukah.net/sharia/0/82630/
http://www.ahlalhdeeth.com/vb/showthread.php?t=346848
Automatic text simplification evaluation aspects iwan_rg
The document discusses automatic text simplification (ATS) and methods for evaluating ATS systems. It provides an overview of common evaluation metrics like BLEU, SARI, FKGL, and SAMSA and compares their abilities to measure simplicity, meaning preservation, and grammaticality. The document also outlines a proposed project to build a corpus and develop a graded reading scale to guide the simplification of Arabic fiction works for educational purposes.
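Of the metrics listed, FKGL (Flesch-Kincaid Grade Level) is simple enough to show directly. The sketch below applies the standard formula to counts that are assumed to be already available; the function name and argument names are illustrative, and syllable counting (the hard part in practice) is deliberately left out. Note that the formula was calibrated on English, so applying it to Arabic text would be an assumption of its own.

```python
def fkgl(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid Grade Level from pre-computed counts.

    FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    Lower scores indicate text that is easier to read.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Hypothetical counts for an original text and its simplification.
    print(fkgl(total_words=100, total_sentences=5, total_syllables=150))
    print(fkgl(total_words=100, total_sentences=10, total_syllables=130))
```

Because FKGL looks only at sentence length and word length, it says nothing about meaning preservation or grammaticality, which is why it is usually reported alongside other metrics.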
Building theoretical models using structured equation modeling iwan_rg
This document provides an overview of structural equation modeling (SEM) and how it can be used to build and assess theoretical models. It discusses what SEM is, how to develop a theoretical model based on existing literature, and the methodology for validating a model using SEM. This involves defining the research problem, collecting data, performing exploratory factor analysis and confirmatory factor analysis to validate measurements, developing a path diagram as the structural model, and assessing overall model fit. An example is provided of using SEM to test a theoretical model of factors influencing the diffusion and use of social mobile games.
Workshop on Word Embeddings in Deep Learning (Word embeddings workshop) iwan_rg
This document provides an introduction to word embeddings in deep learning. It defines word embeddings as vectors of real numbers that represent words, where similar words have similar vector representations. Word embeddings are needed because they allow words to be treated as numeric inputs for machine learning algorithms. The document outlines different types of word embeddings, including frequency-based methods like count vectors and co-occurrence matrices, and prediction-based methods like CBOW and skip-gram models from Word2Vec. It also discusses tools for generating word embeddings like Word2Vec, GloVe, and fastText. Finally, it provides a tutorial on implementing Word2Vec in Python using Gensim.
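As a small illustration of the frequency-based family mentioned above, the following sketch builds co-occurrence count vectors from a toy corpus and compares words by cosine similarity. It is not taken from the workshop and does not use Word2Vec or Gensim; the function names, the window size, and the toy sentences are assumptions.

```python
import math


def cooccurrence_vectors(sentences, window=1):
    """Build a count vector per word from neighbours within `window` tokens."""
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = {word: [0] * len(vocab) for word in vocab}
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][index[sentence[j]]] += 1
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    corpus = [["cats", "drink", "milk"], ["dogs", "drink", "water"]]
    vocab, vectors = cooccurrence_vectors(corpus, window=1)
    # "cats" and "dogs" share the same neighbour, so their vectors align.
    print(cosine(vectors["cats"], vectors["dogs"]))
```

Prediction-based models such as CBOW and skip-gram learn dense vectors instead of raw counts, but the intuition is the same: words that appear in similar contexts end up with similar vectors.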
Introduction to Arabic natural language processing (Infographics) iwan_rg
This document provides an overview of Arabic natural language processing (NLP). It discusses the key aspects of Arabic including its alphabet, script, phonology, orthography, morphology, syntax, and semantics. For each of these areas, it describes some of the core linguistic concepts and challenges for Arabic NLP. It also lists several common NLP tasks for each area such as morphological analysis, syntactic parsing, and information extraction.
CHOOSING RESEARCH TOPICS AND WRITING RESEARCH PAPERS iwan_rg
This document provides guidance on choosing research topics and writing research papers. It discusses identifying research problems, choosing a topic, the types of research papers (survey, standard, letters), and the anatomy of a research paper. It also covers the peer review process, choosing publication venues, and reputable journals. The overall document serves as a guide for students and researchers on conducting research and publishing their work.