This document provides an introduction to regular expressions and regular languages. It defines the key operations used in regular expressions: union, concatenation, and Kleene star. It explains how regular expressions can be converted into finite state automata and vice versa. Examples of regular expressions are provided. The document also defines regular languages as those languages that can be accepted by a deterministic finite automaton. It introduces the pumping lemma as a way to determine if a language is not regular. Finally, it includes some practical activities for readers to practice converting regular expressions to automata and writing regular expressions.
Lecture: Regular Expressions and Regular Languages
1. Regular Expressions
& Regular Languages
slideshare: http://www.slideshare.net/marinasantini1/regular-expressions-and-regular-languages
Mathematics for Language Technology
http://stp.lingfil.uu.se/~matsd/uv/uv15/mfst/
Last Updated 6 March 2015
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2015
6. Regular Expressions and Text Searching
Everybody does it
Emacs, vi, perl, grep, etc.
Regular expressions are a compact
textual representation of a set of strings
representing a language.
8. Errors
The process we just went through was
based on fixing two kinds of errors:
Matching strings that we should not have
matched (there, then, other)
• False positives (Type I)
Not matching things that we should have
matched (The)
• False negatives (Type II)
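A minimal Python sketch of the two error types, assuming the goal is to find the word "the". The sample sentence and the names naive and fixed are illustrative assumptions, not taken from the slides.

```python
import re

# Illustrative sentence (an assumption, not from the lecture).
text = "The other one is there, then the cat left."

# Naive pattern: case-sensitive substring search.
naive = re.compile(r"the")
# Refined pattern: whole word only, any capitalisation.
fixed = re.compile(r"\bthe\b", re.IGNORECASE)

naive_hits = naive.findall(text)
fixed_hits = [m.group(0) for m in fixed.finditer(text)]

# The naive pattern hits inside "other", "there" and "then" (false positives)
# and misses the capitalised "The" (a false negative).
print(naive_hits)
print(fixed_hits)
```

The word boundaries remove the false positives, and the case-insensitive flag removes the false negative.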
9. Errors
Reducing the error rate for an application
often involves two antagonistic efforts:
Increasing accuracy, or precision (minimizing
false positives)
Increasing coverage, or recall (minimizing
false negatives).
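As a rough sketch, precision and recall can be computed from counts of true positives, false positives and false negatives. The function names and the counts used below are hypothetical, chosen only to illustrate the trade-off.

```python
def precision(true_pos, false_pos):
    """Share of reported matches that were wanted."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of wanted items that were actually matched."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts: 1 correct match, 3 unwanted matches, 1 missed item.
p = precision(true_pos=1, false_pos=3)
r = recall(true_pos=1, false_neg=1)
print(p, r)
```

Tightening a pattern usually raises precision at the risk of lowering recall, and loosening it does the opposite.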
10. REs: What are they?
Regular expressions describe
languages by an algebra.
18. Union ∪ (aka: disjunction, OR, |, +)
The union of languages is the usual
thing, since languages are sets.
Example: {01,111,10}∪{00, 01} =
{01,111,10,00}.
Note: 01 happens to be in both
sets, so it appears only once in the
union.
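Since languages are sets of strings, the union example can be reproduced with Python sets. This is only an illustration; the variable names are assumptions.

```python
# The two example languages, written as Python sets of strings.
L = {"01", "111", "10"}
M = {"00", "01"}

# Set union: the shared string "01" is kept only once.
union = L | M
print(sorted(union))
```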
19. Concatenation: represented by juxtaposition (no punctuation)
or middle dot ( · )
The concatenation of languages
L and M is denoted LM.
It contains every string wx such
that w is in L and x is in M.
Example: {01,111,10}{00, 01}
= {0100, 0101, 11100, 11101,
1000, 1001}.
In the example, we take 01 from the first language,
and we concatenate it with 00 in the second language.
That gives us 0100.
We then take 01 from the first language again, and we
concatenate it with 01 in the second language, and that
gives us 0101.
Then we take 111 from the first language and we
concatenate it with 00 in the second language, and
this gives us 11100
…. and so on.
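The pairing procedure described above can be sketched as a set comprehension. The helper name concat is an assumption made for this illustration.

```python
def concat(L, M):
    """Concatenation of two languages: every w from L followed by every x from M."""
    return {w + x for w in L for x in M}


LM = concat({"01", "111", "10"}, {"00", "01"})
print(sorted(LM))
```

With three strings in the first language and two in the second, and no two pairs producing the same result, the concatenation has 3 × 2 = 6 strings.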
20. Kleene Star: represented by an asterisk
aka star (*)
If L is a language, then L*, the Kleene
star or just “star,” is the set of strings
formed by concatenating zero or more
strings from L, in any order.
L* = {ε} ∪ L ∪ LL ∪ LLL ∪ …
Example: {0,10}* = {ε, 0, 10, 00, 010,
100, 1010,…}
If you take no strings from L, that would give you the empty string.
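L* is infinite whenever L contains a non-empty string, so a program can only list it up to some length bound. The sketch below does that; the function name star_up_to and the bound are assumptions made for this illustration.

```python
def star_up_to(L, max_len):
    """All strings of L* whose length is at most max_len."""
    result = {""}        # taking zero strings from L gives the empty string
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more string from L.
        frontier = {
            w + x for w in frontier for x in L if len(w + x) <= max_len
        } - result
        result |= frontier
    return result


short_strings = star_up_to({"0", "10"}, 3)
print(sorted(short_strings, key=lambda s: (len(s), s)))
```

In the output the empty Python string "" plays the role of ε.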
21. IMPORTANT!
FROM NOW ON, LET’S STICK TO THE
FOLLOWING CONVENTIONS (OTHERWISE WE
WILL BE CONFUSED):
Union ∪ (aka: disjunction, OR) represented by: | or +
Concatenation: represented by juxtaposition (= no
punctuation) or middle dot ( · )
Kleene Star: represented by *
22. Precedence of Operators
Parentheses may be used wherever
needed to influence the grouping of
operators.
Order of precedence is * (highest), then
concatenation, then + (lowest).
Remember: + = union/disjunction
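The precedence rules can be checked with Python's re module, which follows the same order. Note that Python writes union as | rather than +. The helper name language and the candidate lists are assumptions made for this sketch.

```python
import re


def language(pattern, candidates):
    """The candidates that the pattern matches in full."""
    return {s for s in candidates if re.fullmatch(pattern, s)}


candidates = ["", "0", "1", "00", "01", "10", "11"]

# Concatenation binds tighter than union: 01|0 means (01)|0.
without_parens = language(r"01|0", candidates)
# Parentheses force the union to group first.
with_parens = language(r"0(1|0)", candidates)
# Star binds tighter than concatenation: 01* means 0(1*), not (01)*.
star_first = language(r"01*", ["0", "01", "011", "0101"])

print(without_parens, with_parens, star_first)
```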
23. Examples: REs
1. L(01) = {01}.
2. L(01+0) = {01, 0}.
3. L(0(1+0)) = {01, 00}.
Note order of precedence of
operators.
4. L(0*) = {ε, 0, 00, 000,… }.
5. L((0+10)*(ε+1)) = all strings
of 0s and 1s without two
consecutive 1s.
1) The regular expression 01 represents the
concatenation of the language consisting of one
string, 0 and the language consisting of one string, 1.
The result is the language containing the one string
01.
2) The language of 01+0 is the union of the language
containing only string 01 and the language containing
only string 0.
3) The language of 0 concatenated with 1+0 is the
two strings 01 and 00. Notice that we need
parentheses to force the + to group first. Without
them, since concatenation takes precedence over +,
we get the interpretation in the second example.
4) The language of 0* is the star of the language
containing only the string 0. This is all strings of 0’s,
including the empty string.
5) This example denotes the language with all strings
of 0s and 1s without two consecutive 1s. To see why
this works, in every such string, each 1 is either
followed immediately by a 0, or it comes at the end of
the string. (0+10)* denotes all strings in which every
1 is followed by a 0. These strings are surely in the
language we want. But we also want these strings
followed by a final 1. Thus, we concatenate the
language of (0+10)* with ε+1. This
concatenation gives us all the strings where 1s are
followed by 0s, plus all those strings with an
additional 1 at the end.
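The claim in example 5 can be checked by brute force over all short binary strings. This is a sanity check up to a chosen length, not a proof. The pattern is written in Python syntax (| for union, 1? for ε+1), and the helper name and the bound of 8 are assumptions.

```python
import re
from itertools import product

# (0+10)*(ε+1) in Python notation.
PATTERN = re.compile(r"(?:0|10)*1?")


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


# Strings on which the RE and the description "no two consecutive 1s" disagree.
mismatches = [
    s for s in binary_strings(8)
    if (PATTERN.fullmatch(s) is not None) != ("11" not in s)
]
print(mismatches)
```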
24. Equivalence of REs and Finite
Automata
For every RE, there is a finite automaton
that accepts the same language.
And we need to show that for every finite
automaton, there is an RE defining its
language.
28. Regular Languages
A language L is regular if it is the
language accepted by some DFA.
Note: the DFA must accept only the strings
in L, no others.
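To make "accepted by some DFA" concrete, here is a small sketch of a DFA simulator. As an example it uses a three-state automaton for the strings of 0s and 1s without two consecutive 1s. The state names and the table layout are assumptions made for this illustration.

```python
def accepts(transitions, start, finals, string):
    """Run a DFA: follow one transition per symbol, then check the final state."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in finals


# q0: last symbol was not 1; q1: last symbol was 1; dead: "11" has been seen.
NO_11 = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "dead",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}

ok = accepts(NO_11, "q0", {"q0", "q1"}, "0101")
bad = accepts(NO_11, "q0", {"q0", "q1"}, "0110")
empty = accepts(NO_11, "q0", {"q0", "q1"}, "")
print(ok, bad, empty)
```

The dead state is what keeps the automaton from accepting strings outside the language.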
Some languages are not regular.
30. Regular languages derive their name from the fact that the
strings they recognize are (in a formal computer science sense)
“regular.”
This implies that there are certain kinds of strings that it will be
very hard, if not impossible, to recognize with regular
expressions, especially nested syntactic structures in natural
language.
31. Formal languages vs regular
languages
A formal language is a set of strings,
each string composed of symbols from
a finite set called an alphabet.
Ex: {a, b, !}
Formal languages are not the same as
regular languages….
32. But Many Languages are Regular
They appear in many contexts and have
many useful properties.
33. How to tell if a language is not regular
The most common way to prove that a
language is regular is to build a regular
expression for the language.
35. Practical Activity 1
The language L contains all strings over the alphabet {a,b}
that begin with a and end with b, ie:
Write a regular expression that defines
the language L.
37. Your Solutions
In between the concatenation of a
and b there must be 0 or more
unions (disjunctions) of a and b.
Reference: slides 17-22
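One way to write the description above as a pattern is a(a|b)*b, shown here in Python syntax with | for union. This is a sketch of one possible solution, not necessarily the one shown on the original slide. The helper name in_L is an assumption.

```python
import re

# a, then zero or more symbols from {a, b}, then b.
L_PATTERN = re.compile(r"a(a|b)*b")


def in_L(s):
    """True if s begins with a, ends with b, and uses only a and b."""
    return L_PATTERN.fullmatch(s) is not None


checks = {s: in_L(s) for s in ["ab", "aabab", "a", "ba", "abba", ""]}
print(checks)
```

The shortest string in L is ab, because the starting a and the final b must be separate symbols.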
38. Practical Activity 2
Draw a deterministic finite-state automaton
that accepts the following regular expression:
( (ab) | c)*
Alternative notation style:
ie: 0 or more occurrences of
the disjunction ab | c
Test the
automaton with
these strings
(not all of them are
in the language):
0
abc
a
ab
cccabc
cbacccabababccc
….
39. Practical Activity 2:
Possible Correct Solution
Having the initial state as a final state gives us the empty string as an element in the language.
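The slide's diagram is not reproduced in this transcript. The sketch below encodes one possible automaton for ( (ab) | c)* and cross-checks it against Python's re module. The state names, the implicit dead state for missing transitions, and the use of "" for the empty string are assumptions.

```python
import re

# q0 is both the initial and the only final state; q1 means "a seen, b expected".
TRANSITIONS = {("q0", "c"): "q0", ("q0", "a"): "q1", ("q1", "b"): "q0"}
START, FINALS = "q0", {"q0"}


def dfa_accepts(s):
    state = START
    for ch in s:
        # A missing entry stands for a dead state: reject immediately.
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in FINALS


REGEX = re.compile(r"(ab|c)*")
tests = ["", "abc", "a", "ab", "cccabc", "cbacccabababccc"]
results = {s: dfa_accepts(s) for s in tests}
agree = all(dfa_accepts(s) == (REGEX.fullmatch(s) is not None) for s in tests)
print(results, agree)
```

A lone a is rejected because the run stops in q1, and any b that does not follow an a has no transition to take.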
40. Your solutions (1): when we interpret ”+” as
disjunction, these solutions are wrong because
”c” happens only after ”a” and ”b”…
Test these automata with the string on slide 35
41. Your solutions (2): same as
previous slide. In addition, here no
final states are shown…
Test these automata with the string on slide 35
42. Practical Activity 3
Construct a grep regular expression that
matches patterns containing at least one
“ab” followed by any number of bs.
Construct a grep regular expression that
matches any number between 1000 and
9999.
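A hedged sketch of possible answers, using Python's re module as a stand-in for grep. The patterns reflect one reading of the task (one or more repetitions of "ab", then zero or more b; a four-digit number with a non-zero first digit). How closely the syntax carries over to a given grep version is an assumption to check.

```python
import re

# One or more "ab", followed by any number of b.
AB_PATTERN = re.compile(r"(ab)+b*")
# Exactly four digits, the first one between 1 and 9.
NUMBER_PATTERN = re.compile(r"[1-9][0-9]{3}")

found = AB_PATTERN.search("xxabbbyy")
missing = AB_PATTERN.search("ba")

in_range = [s for s in ["999", "1000", "4711", "9999", "10000", "0999"]
            if NUMBER_PATTERN.fullmatch(s)]
print(found.group(0), missing, in_range)
```

fullmatch is used for the numbers so that a longer digit string such as 10000 is not accepted because of a four-digit part inside it.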
44. Exercises: E&G (2013)
Exercise (Övning) 9.40
Optional: as many as you can
After having completed the exercises, check out the solutions
at the end of the book.