Sketch Engine is a web-based tool for analyzing corpora. It allows users to generate word sketches, view concordances, find similar words using the thesaurus, and compare the behavior of words. Key functions include the concordancer, word lists, word sketches, thesaurus, and word sketch difference. Users can analyze pre-loaded corpora or upload their own texts for tokenization, lemmatization, and POS tagging to build custom corpora.
6. What is Sketch Engine?
It is a corpus query tool which takes as input a corpus of any language and corresponding grammar patterns, and which generates, amongst other things, word sketches for the words of that language.
The Sketch Engine is designed for anyone wanting to research how words behave.
[Diagram: Corpus → SkE → Word Sketches]
7. What is Sketch Engine?
• Upload your own corpus
• Access to public corpora
• Advanced search options
8. Sketch Engine Features
1. Web-based tool – no installation
2. Support for Arabic corpora
3. The Concordancer with advanced options
4. The Word Sketches
9. Sketch Engine Features
5. The Thesaurus (find similar words)
6. Support for parallel corpora, virtual sub- and super-corpora
7. Full regular-expression searching using CQL
8. Corpus Architect: user corpora, uploaded by users or created by WebBootCaT
10. Who Uses Sketch Engine?
• Language learners
• Writers
• Linguists
• Researchers
15. Steps to create a Corpus in SKE
[Diagram: Raw text → Tokenization → Lemmatization → POS tagging → Sketch Grammar → SKE features (Word Sketches, Sketch Diff, Thesaurus)]
16. 1- Upload your text:
- Sketch Engine accepts file types such as .xml, .doc, .docx, .htm, .html, .pdf, .txt, …
17. 2- Tokenization:
- The process of splitting the text into words (tokens) and adding structure tags (<doc>, <p>, <s>).
- The output is a vertical file (one token per line).
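For illustration, a tokenized vertical file might look roughly like this (a minimal sketch; the exact structure tags and attributes depend on the corpus configuration):

<doc id="1">
<p>
<s>
The
Sketch
Engine
is
a
corpus
query
tool
.
</s>
</p>
</doc>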
19. 4- POS tagging (mandatory for word sketches):
- The process of attaching to each word its part-of-speech tag (e.g. V, PN, N).
- An SKE Arabic tagger is not available.
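After lemmatization and POS tagging, each line of the vertical file typically carries the word form together with its tag and lemma as tab-separated columns, roughly like this (an illustrative sketch; the tagset and column order depend on how the corpus was processed):

<s>
The	DET	the
two	NUM	two
cats	N	cat
sat	V	sit
</s>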
20. 5- Uploading a Sketch Grammar:
- A file describing the grammatical relations in a language.
Example: 1:"V" "(DET|NUM|ADJ|ADV|N)"* 2:"N"
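To make this concrete, a minimal sketch grammar fragment might look like the following (an illustrative sketch assuming the usual Sketch Engine conventions: *DUAL declares paired relations, =name/name names a grammatical relation and its inverse, and the labels 1: and 2: mark the headword and its collocate):

*DUAL
=object/object_of
1:"V" "(DET|NUM|ADJ|ADV|N)"* 2:"N"
=modifier/modifies
2:"ADJ" 1:"N"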
26. Concordance
What is a concordancer?
A concordancer looks through the whole corpus, finds every example of a particular word or phrase, and displays it with its immediate context.
33. Concordance – Query Types
Word: will match any word form exactly.
+ You can select the PoS (not for the Arabic corpus)
+ You can select "match case" (not for the Arabic corpus)
36. Concordance – Corpus Query Language (Basics)
The general form is: [attr="value"]
"Match any character" operator: *
Or / And operators: | , &
37. Concordance – Corpus Query Language (Basics)
"Match any token" operator: []
Specifying the number of tokens operator: {} (e.g. a range of 0–3 tokens)
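A few illustrative CQL queries combining these operators (illustrative examples only; the attribute names word, lemma and tag depend on how the corpus was annotated):

[word="book"]                              matches the exact word form "book"
[lemma="write"]                            matches every form of the lemma "write"
[tag="N.*"]                                matches any token whose tag starts with N
[lemma="make"] []{0,3} [word="decision"]   "make" followed by up to three arbitrary tokens and then "decision"
[word="colou?r" & tag="N.*"]               conditions on one token combined with & (and) or | (or)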
45. Concordance – Text Types
Here you can:
- Select a sub-corpus, or
- Create a new sub-corpus from a subset of the current corpus.
You can also select constraints on the text types for documents that will be searched for your query.
51. Word List
What is the Word List?
Word List: for obtaining word lists ranked by frequency for an entire corpus, or a specified sub-corpus.
It can be useful for investigating whether a word is used most frequently in its verb or its noun form, for instance.
52. Word List
Input: an RE (regular expression) pattern or any attribute (word, tag, lemma, …)
Output: a filtered list of lemmas and/or words with their frequencies
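For example (illustrative patterns only), a regular expression typed into the RE pattern box restricts the list:

.*ing      list only items ending in -ing
[0-9]+     list only tokens made up of digits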
55. Choose lemma as the Search attribute.
Type the lemma (e.g. حار) into the RE pattern box.
Tick the box that says change output attribute(s).
In the first two levels, select "lemma" and "Tag".
61. Word Sketch
What is a Word Sketch?
Word Sketch: this allows you to explore the grammatical and collocational behaviour of a word.
The Word Sketch function doesn't just tell you which words are commonly found in the company of your search word; it also tells you what their grammatical relationship to the search word is.
67. Thesaurus
What is the Thesaurus?
Thesaurus: this allows you to find other words that have similar grammatical and collocational behaviour to a given word.
Note that this thesaurus is produced automatically from statistics on word co-occurrences.
It is not a manually constructed thesaurus, and for each entry it will list words which are distributionally related but not necessarily synonyms.
73. Sketch-Diff
What is Word Sketch Difference?
Sketch-Diff: this allows you to compare the behaviour of two words.
This function is also very useful for comparing and deciding between two possible translations of an item.
74. Sketch-Diff
Input: two words or lemmas.
Output: the common and the differing collocations of the two lemmas.
It is a corpus query tool which takes as input a corpus of any language (with an appropriate level of linguistic mark-up) and corresponding grammar patterns, and which generates, amongst other things, word sketches for the words of that language. Those other things include a corpus-based thesaurus and 'sketch differences', which specify, for two semantically related words, what behaviour they share and how they differ. We anticipate that sketch differences will be particularly useful for lexicographers interested in near-synonym differentiation. Word sketches were first used in the production of the Macmillan English Dictionary (Rundell 2002) and were presented at Euralex 2002 (Kilgarriff and Rundell 2002). Following that presentation, the most-asked question was "can I have them for my language?" In response, we have now developed the Sketch Engine.
The Sketch Engine has a number of language-analysis functions, the core ones being:
The Concordancer: a program which displays all occurrences from the corpus for a given query. The program is very powerful, with a wide variety of query types and many different ways of displaying and organising the results (concordancing, sorting, sampling, wordlists, collocation lists).
The Word Sketch program: this program provides a corpus-based summary of a word's grammatical and collocational behaviour.
With Corpus Architect, you can build your own corpora from documents in various formats: TXT, PDF, PS, DOC, HTML, VERT. Once processed, you can search and query them within Sketch Engine.
Concordance: for querying a corpus and obtaining concordances which you can then further refine, filter and use for generating frequency information and collocation lists.
Word List: for obtaining word lists for an entire corpus, or a specified subcorpus.
Word Sketch: this allows you to explore the grammatical and collocational behaviour of a word.
Thesaurus: this allows you to find other words that have similar grammatical and collocational behaviour to a given word. Note that this thesaurus is produced automatically from statistics on word co-occurrences. It is not a manually constructed thesaurus and will list words for each entry which are distributionally related but not necessarily synonyms.
Sketch-Diff: this allows you to compare the behaviour of two words.
Main Sketch Engine Links: https://www.sketchengine.co.uk/documentation/wiki/SkE/Help/MainLinkHelp
Concordance Query: https://www.sketchengine.co.uk/documentation/wiki/SkE/Help/PageSpecificHelp/ConcordanceQuery
Query Types: Using Query Type, you can refine the type of query you wish to make in the main panel.
Context: if Context is selected in the LHS menu, on the main panel you can specify criteria on the context for your query. You can choose to specify the context in terms of surrounding lemma(s) and/or PoS tag(s).
Text Types: here you can select a subcorpus or create a new subcorpus from a subset of the current corpus. You can also stipulate constraints on the text types for documents that will be searched for your query.
Ex1: Lemma filter – Window: right, 1 token; Lemma(s): عن; none
Concordance Menu options: https://www.sketchengine.co.uk/documentation/wiki/SkE/Help/PageSpecificHelp/Concordance
Note that the options in the left-hand-side panel are all available when you are viewing the concordance. Some of the options will not be shown if you have already selected from this menu. If so, you can click View Concordance to get back to the concordance.
View Options: clicking on View Options will allow you to alter how the concordance looks. With this you can select which attributes of the words in the concordance you see.
KWIC/Sentence: toggle between the KWIC mode, where the queried text (node) is in a central column and context is displayed on either side, and Sentence mode, where the queried text (node) is provided in the context of the sentence in which it occurs.
Save: click on this to see options for saving the concordance in the main panel (or the frequency list or collocation candidates).
Sort: click on this to see complex sorting options. If the concordance is sorted based on the context, an option to "Jump to" a page with context starting with a certain letter appears. Alternatively, you can click on:
Left (Right): to sort by the text left (right) of the node
Node: to sort by the text in the central column (referred to as the node or KWIC)
References: to sort by the document references at the left-hand side of the concordance
Shuffle: the concordance will be jumbled to avoid bias from a user only looking at the first portion
Sample: click this to select a random sample of the concordance lines.
Filter: click this to further specify contextual features to filter the concordance, for example by words to the left or right of the node word, or by text type.
Frequency: click on this to see a variety of complex methods for obtaining frequency lists. Alternatively, you can click on:
Node tags: to get a frequency list over the part-of-speech tags of the node word(s) in the central column
Node forms: to get a frequency list over the node word forms in the central column
Doc IDs: to get a frequency list over the Doc IDs for the node word(s) in the central column
Text Types: to get a frequency list over all the text types of the node word(s) in the central column
Collocations: click on this to specify criteria and build collocation lists for the node word(s) in the central column.
ConcDesc: you can see the query in detail (for technical people) and you can go back in the history if the query consists of several subsequent actions.
Visualize: this link will show you the distributional graph of the concordance within the corpus. On the x-axis there are concordance positions (by default 100 columns for 100 slices of the corpus; you may change the granularity with the slider and click on the Redraw button); on the y-axis there is the relative frequency of the query hits within a concordance part (= column). Columns are clickable: by clicking on a column, you will filter the concordance and see only the appropriate concordance part.
Word List Options:
Left-hand-side options:
Select All words to generate a list of words in the corpus ranked by frequency.
Select All lemmas to generate a list of lemmas in the corpus ranked by frequency. A lemma is the base (stem) form of a word.
In the main panel of the interface you have further options:
Subcorpus: where you can specify a subcorpus for the source data, or create a new one.
Search Attribute: you can specify word, lemma, tag (part-of-speech tag) etc., depending on the attributes defined for the corpus, or you can specify one of the text types defined for the corpus. The default attribute is word.
Filter Options: you can either do this for all words (or lemmas, or whichever attribute you specify) or you can filter the list.
Output Options: you can select different types of the produced list.
Choose a corpus and click on Word List in the left-hand-side menu.
Choose lemma at Search attribute.
Type the lemma (e.g. حار) into the RE pattern box.
Tick the box that says change output attribute(s).
In the first two levels, select "lemma" and "Tag".
Click on Make Word List.