SlideShare a Scribd company logo
Requirements Engineering:
Focus on Natural Language
Requirements Processing
Alessio Ferrari1
1ISTI-CNR, Pisa, Italy
January 15, 2016
A. Ferrari (ISTI) Requirements Engineering 1 / 88
cf. https://docplayer.net/16051752-Introduction-to-programming-languages-and-
techniques-xkcd-com-full-python-tutorial.html
Objectives of the Course
Objectives: lesson 1
Understand what are requirements, specifications
and domain assertions
Understand the problems behind requirements elicitation
and SRS definition
Learn how to write a decent SRS
Objectives: lesson 2
Learn the basics of Natual Language Processing (NLP)
Learn the relation between NLP and Requirements
Learn the basics of GATE to detect NL ambiguities in
requirements
Learn the basics of Python NLTK (for further manipulation of NL
requirements)
A. Ferrari (ISTI) Requirements Engineering 2 / 88
NLP and Requirements Engineering
A. Ferrari (ISTI) Requirements Engineering 3 / 88
Natural Language Processing
Technologies enabling extraction and manipulation of information from
Natural Language (NL)
Dan$Jurafsky$
Language(Technology(
Coreference$resoluIon$
QuesIon$answering$(QA)$
PartOofOspeech$(POS)$tagging$
Word$sense$disambiguaIon$(WSD)$
Paraphrase$
Named$enIty$recogniIon$(NER)$
Parsing$
SummarizaIon$
InformaIon$extracIon$(IE)$
Machine$translaIon$(MT)$
Dialog$
SenIment$analysis$
$$$
mostly$solved$
making$good$progress$
sIll$really$hard$
Spam$detecIon$
Let’s$go$to$Agra!$
Buy$V1AGRA$…$
✓
✗
Colorless$$$green$$$ideas$$$sleep$$$furiously.$
$$$$$ADJ$$$$$$$$$ADJ$$$$NOUN$$VERB$$$$$$ADV$
Einstein$met$with$UN$officials$in$Princeton$
PERSON$$$$$$$$$$$$$$ORG$$$$$$$$$$$$$$$$$$$$$$LOC$
You’re$invited$to$our$dinner$
party,$Friday$May$27$at$8:30$
Party$
May$27$
add$
Best$roast$chicken$in$San$Francisco!$
The$waiter$ignored$us$for$20$minutes.$
Carter$told$Mubarak$he$shouldn’t$run$again.$
I$need$new$baWeries$for$my$mouse.$
The$13th$Shanghai$InternaIonal$Film$FesIval…$
13 …
The$Dow$Jones$is$up$
Housing$prices$rose$
Economy$is$
good$
Q.$How$effecIve$is$ibuprofen$in$reducing$
fever$in$paIents$with$acute$febrile$illness?$
I$can$see$Alcatraz$from$the$window!$
XYZ$acquired$ABC$yesterday$
ABC$has$been$taken$over$by$XYZ$
Where$is$CiIzen$Kane$playing$in$SF?$$
Castro$Theatre$at$7:30.$Do$
you$want$a$Icket?$
The$S&P500$jumped$
Dan$Jurafsky$
Language(Technology(
Coreference$resoluIon$
QuesIon$answering$(QA)$
PartOofOspeech$(POS)$tagging$
Word$sense$disambiguaIon$(WSD)$
Paraphrase$
Named$enIty$recogniIon$(NER)$
Parsing$
SummarizaIon$
InformaIon$extracIon$(IE)$
Machine$translaIon$(MT)$
Dialog$
SenIment$analysis$
$$$
mostly$solved$
making$good$progress$
sIll$really$hard$
Spam$detecIon$
Let’s$go$to$Agra!$
Buy$V1AGRA$…$
✓
✗
Colorless$$$green$$$ideas$$$sleep$$$furiously.$
$$$$$ADJ$$$$$$$$$ADJ$$$$NOUN$$VERB$$$$$$ADV$
Einstein$met$with$UN$officials$in$Princeton$
PERSON$$$$$$$$$$$$$$ORG$$$$$$$$$$$$$$$$$$$$$$LOC$
You’re$invited$to$our$dinner$
party,$Friday$May$27$at$8:30$
Party$
May$27$
add$
Best$roast$chicken$in$San$Francisco!$
The$waiter$ignored$us$for$20$minutes.$
Carter$told$Mubarak$he$shouldn’t$run$again.$
I$need$new$baWeries$for$my$mouse.$
The$13th$Shanghai$InternaIonal$Film$FesIval…$
13 …
The$Dow$Jones$is$up$
Housing$prices$rose$
Economy$is$
good$
Q.$How$effecIve$is$ibuprofen$in$reducing$
fever$in$paIents$with$acute$febrile$illness?$
I$can$see$Alcatraz$from$the$window!$
XYZ$acquired$ABC$yesterday$
ABC$has$been$taken$over$by$XYZ$
Where$is$CiIzen$Kane$playing$in$SF?$$
Castro$Theatre$at$7:30.$Do$
you$want$a$Icket?$
The$S&P500$jumped$
Dan$Jurafsky$
Language(Technology(
Coreference$resoluIon$
QuesIon$answering$(QA)$
PartOofOspeech$(POS)$tagging$
Word$sense$disambiguaIon$(WSD)$
Paraphrase$
Named$enIty$recogniIon$(NER)$
Parsing$
SummarizaIon$
InformaIon$extracIon$(IE)$
Machine$translaIon$(MT)$
Dialog$
SenIment$analysis$
$$$
mostly$solved$
making$good$progress$
sIll$really$hard$
Spam$detecIon$
Let’s$go$to$Agra!$
Buy$V1AGRA$…$
✓
✗
Colorless$$$green$$$ideas$$$sleep$$$furiously.$
$$$$$ADJ$$$$$$$$$ADJ$$$$NOUN$$VERB$$$$$$ADV$
Einstein$met$with$UN$officials$in$Princeton$
PERSON$$$$$$$$$$$$$$ORG$$$$$$$$$$$$$$$$$$$$$$LOC$
You’re$invited$to$our$dinner$
party,$Friday$May$27$at$8:30$
Party$
May$27$
add$
Best$roast$chicken$in$San$Francisco!$
The$waiter$ignored$us$for$20$minutes.$
Carter$told$Mubarak$he$shouldn’t$run$again.$
I$need$new$baWeries$for$my$mouse.$
The$13th$Shanghai$InternaIonal$Film$FesIval…$
13 …
The$Dow$Jones$is$up$
Housing$prices$rose$
Economy$is$
good$
Q.$How$effecIve$is$ibuprofen$in$reducing$
fever$in$paIents$with$acute$febrile$illness?$
I$can$see$Alcatraz$from$the$window!$
XYZ$acquired$ABC$yesterday$
ABC$has$been$taken$over$by$XYZ$
Where$is$CiIzen$Kane$playing$in$SF?$$
Castro$Theatre$at$7:30.$Do$
you$want$a$Icket?$
The$S&P500$jumped$
A. Ferrari (ISTI) Requirements Engineering 4 / 88
Natural Language Processing
Language is inherently ambiguous and hard to be handled
I Imagine a well-defined syntax and semantics for an input message:
you can write software to process it
I With NL you do not have a well defined syntax and semantics:
there is always some exception
Rule-based approaches are hard to build
Luckily, there’s machine-learning! . . . but
I :( Large amounts of data are needed for training/test/validation
I :( Tagging data requires domain expertise
I :( Requirements use domain-specific jargon (features based on
terms might work only for a specific company)
I :( Requirements structure vary from company to company (features
based on syntax might need adjustments)
I :( Some tasks (e.g., ambiguity) have too many negative cases (i.e.,
well-written), and too few positive (i.e., ambiguous)
A. Ferrari (ISTI) Requirements Engineering 5 / 88
4 Dimensions of Natural Language Requirements Processing
Discipline: requirements have to be non-ambiguous but
readable, no equivalent requirement
Dynamism: requirements evolve, and have to be traced and
categorised
Domain Knowledge: domain knowledge has to be gathered to
understand requirements, proper stakeholders have to be
identified
Data-sets: tagged data-sets are required, requirements are
private
A. Ferrari (ISTI) Requirements Engineering 6 / 88
4-D in the Requirements Process
Elicitation Analysis
ValidationManagement
Traceability
Domain Term
Identification
Stakeholder
Identification
Domain Scoping
Ambiguity
Readability
ComplianceCategorisation
Equivalence
D
I
S
C
I
P
L
I
N
E
DOMAIN KNOWLEDGE
DYNAMISM
DATA-SETS
Retrieval of Internal
Requirements
Market Analysis
A. Ferrari (ISTI) Requirements Engineering 7 / 88
Focus on Discipline
An ambiguity in a requirement might cause severe consequences
along the development
False negatives are NOT tolerable (i.e., I have to discover ALL
potential ambiguities)
It is preferable to have a certain amount of false positive cases,
which the engineer can easily discard
Rec = | true ambiguities  items retrieved | / | true ambiguities |
Prec = | true ambiguities  items retrieved | / | items retrieved |
Hence, a system for ambiguity detection has to provide
100% recall!
Rule-based approaches are preferred
However, for some tasks, rule-based approaches give really too many
false positives.
A. Ferrari (ISTI) Requirements Engineering 8 / 88
What you’ll learn
Rule-based approaches
Basic tools that you can use to implement rule-based approaches
These tools can be used as a basis if you want to move to
machine learning
A. Ferrari (ISTI) Requirements Engineering 9 / 88
NLP Minimal Concepts and Terminology
A. Ferrari (ISTI) Requirements Engineering 10 / 88
NLP Minimal Concepts
Every task in NLP needs to perform Text Normalization:
Segmenting/Tokenizing words in the text
Normalizing word formats (Stemming and Lemmatization)
Segmenting/Tokenizing sentences
Besides that, many tasks need Part-Of-Speech (POS) Tagging
A. Ferrari (ISTI) Requirements Engineering 11 / 88
NLP Minimal Concepts - Tokenization (or Word Segmentation)
A token is an occurrence of a WORD in a stream of text
Tokenization split a stream of text into a list of WORDS
Dan(Jurafsky(
How(many(words?(
they(lay(back(on(the(San(Francisco(grass(and(looked(at(the(stars(and(their(
•  Type:(an(element(of(the(vocabulary.(
•  Token:(an(instance(of(that(type(in(running(text.(
•  How(many?(
•  15(tokens((or(14)(
•  13(types((or(12)((or(11?)(
A. Ferrari (ISTI) Requirements Engineering 12 / 88
NLP Minimal Concepts - Multi-word Terms
Multi-word Terms
Example: “The wayside equipment shall respond within 5
seconds”
wayside equipment is a multi-word term like San Francisco
We do not discuss about multi-word terms, but keep them in mind
because they are frequent in requirements
A. Ferrari (ISTI) Requirements Engineering 13 / 88
NLP Minimal Concepts - Lemmatization
Dan(Jurafsky(
Lemma4za4on(
•  Reduce(inflecGons(or(variant(forms(to(base(form(
•  am,"are,(is"→(be(
•  car,"cars,"car's,(cars'(→(car"
•  the"boy's"cars"are"different"colors(→(the"boy"car"be"different"color"
•  LemmaGzaGon:(have(to(find(correct(dicGonary(headword(form(
•  Machine(translaGon(
•  Spanish(quiero((‘I(want’),(quieres((‘you(want’)(same(lemma(as(querer(‘want’(
A. Ferrari (ISTI) Requirements Engineering 14 / 88
NLP Minimal Concepts - Stemming
Dan(Jurafsky(
Stemming(
•  Reduce(terms(to(their(stems(in(informaGon(retrieval(
•  Stemming(is(crude(chopping(of(affixes(
•  language(dependent(
•  e.g.,(automate(s),+automaGc,+automaGon(all(reduced(to(automat.(
for"example"compressed""
and"compression"are"both""
accepted"as"equivalent"to""
compress.(
for(exampl(compress(and(
compress(ar(both(accept(
as(equival(to(compress(
A. Ferrari (ISTI) Requirements Engineering 15 / 88
NLP Minimal Concepts - Sentence SegmentationDan(Jurafsky(
Sentence(Segmenta4on(
•  !,(?(are(relaGvely(unambiguous(
•  Period(“.”(is(quite(ambiguous(
•  Sentence(boundary(
•  AbbreviaGons(like(Inc.(or(Dr.(
•  Numbers(like(.02%(or(4.3(
•  Build(a(binary(classifier(
•  Looks(at(a(“.”(
•  Decides(EndOfSentence/NotEndOfSentence(
•  Classifiers:(handkwri@en(rules,(regular(expressions,(or(machineklearning(
A. Ferrari (ISTI) Requirements Engineering 16 / 88
NLP Minimal Concepts - POS Tagging
Open class (lexical) words
Closed class (functional)
Nouns Verbs
Proper Common
Modals
Main
Adjectives
Adverbs
Prepositions
Particles
Determiners
Conjunctions
Pronouns
… more
… more
IBM
Italy
cat / cats
snow
see
registered
can
had
old older oldest
slowly
to with
off up
the some
and or
he its
Numbers
122,312
one
Interjections Ow Eh
A. Ferrari (ISTI) Requirements Engineering 17 / 88
NLP Minimal Concepts - POS TaggingChristopher"Manning"
POS-Tagging-
•  Words"oYen"have"more"than"one"POS:"back%
•  The"back"door"="JJ"
•  On"my"back"="NN"
•  Win"the"voters"back"="RB"
•  Promised"to"back"the"bill"="VB"
•  The"POS"tagging"problem"is"to"determine"the"POS"tag"for"a"
par1cular"instance"of"a"word."
https://www.ling.upenn.edu/courses/Fall_2003/
ling001/penn_treebank_pos.html
A. Ferrari (ISTI) Requirements Engineering 18 / 88
GATE Basic Concepts
A. Ferrari (ISTI) Requirements Engineering 19 / 88
IR vs IE
Information Retrieval (IR): pulls documents from large corpora
Information Extraction (IE): retrieves facts and structured
information from large corpora
IR returns documents containing the relevant information
somewhere (normally fast)
IE returns precise and structured information (can be slow)
GATE (General Architecture for Text Engineering) supports IE
Potential Usage
Entity, Events, Relation extraction
Enrich features for retrieval
Annotate documents for ML
Ambiguity identification with rule-based approaches
A. Ferrari (ISTI) Requirements Engineering 20 / 88
Document, Corpus, Annotation, Feature
Document = text + annotations + features
Corpus = collection of documents
Linguistic information in documents is encoded in the form of
annotations (like colored mark-ups)
Annotations have features with relative types and values
Example
Annotation: Sentence Length
Feature 1: Length in Characters, Value 1 = 100
Feature 2: Length in Token, Value 2 = 15
A. Ferrari (ISTI) Requirements Engineering 21 / 88
Document, Corpus, Annotation, Feature
A. Ferrari (ISTI) Requirements Engineering 22 / 88
Processing Resources (PR)
PR are algorithms that make NLP easy
Document Reset PR - delete Annotations
ANNIE English Tokeniser - identify tokens (words, numbers, etc.)
ANNIE Sentence Splitter - identify sentence boundaries
ANNIE Gazetteer - identify specific tokens in a list
ANNIE POS Tagger - identify part-of-speech (POS), like name,
adjective, etc.
JAPE Transducers - user-defined annotations based on regular
expressions over annotations
ANNIE collects all the algorithms and run them in PIPELINE
A. Ferrari (ISTI) Requirements Engineering 23 / 88
Using Processing Resources in GATE
Open GATE
Language Resources > New > GATE Document
Load Source Document: exercises/requirements.txt
Right-click> New Corpus from this document
Right-click>Processing Resources (then select the resources
reported in the next slides)
A. Ferrari (ISTI) Requirements Engineering 24 / 88
ANNIE English Tokenizer
Produces Token annotations
A. Ferrari (ISTI) Requirements Engineering 25 / 88
ANNIE Sentence Splitter
Produces Sentence annotations
A. Ferrari (ISTI) Requirements Engineering 26 / 88
Gazetteer
Produces Lookup annotations
A. Ferrari (ISTI) Requirements Engineering 27 / 88
ANNIE POS Tagger
Modifies the Token annotations with a new feature category = NN, JJ,
VB, etc.
A. Ferrari (ISTI) Requirements Engineering 28 / 88
JAPE Transducers
Gazetteer lists are designed for annotating simple, regular
features
Even identifying simple patterns like e-mails is impossible with a
Gazetteer
What is JAPE?
JAPE provides pattern matching in GATE
Each JAPE rule consists of:
I LHS which contains patterns to match
I RHS which details the annotations (and optionally features) to be
created
A. Ferrari (ISTI) Requirements Engineering 29 / 88
Example
I want to find all the occurrences of the term “level” followed by a
number (level 1, level 2, etc.)
A. Ferrari (ISTI) Requirements Engineering 30 / 88
Example (continue)
Making it case-insensitive
A. Ferrari (ISTI) Requirements Engineering 31 / 88
Example (continue)
Adding features and values
A. Ferrari (ISTI) Requirements Engineering 32 / 88
GATE for Ambiguity Detection
A. Ferrari (ISTI) Requirements Engineering 33 / 88
Steps - Detecting Vagueness
Open GATE
Language Resources > New > GATE Document
Load Source Document: exercises/requirements.txt
Right-click> New Corpus from this document
Copy file exercises/vagueness.lst to
/GATE_Developer_8.0/plugins/ANNIE/resources/gazetteer
Open file lists.def (here you have all the file pointers to the lists
checked by the Look-up)
Add the line: vagueness.lst:vague_term
A. Ferrari (ISTI) Requirements Engineering 34 / 88
Steps - Detecting Vagueness (continue)
Go back to GATE
Processing Resources>New>ANNIE Gazetteer, and give it a
name
Check: case sensitive = false
Click on the Gazetteer, and check the presence of vagueness.lst
Applications>ANNIE
Move the Gazetteer to the right panel
Run!
A. Ferrari (ISTI) Requirements Engineering 35 / 88
Steps - Annotating Vagueness
Processing Resources>New>Jape Transducer
Load annotate_vagueness.jape
Applications>ANNIE
Move the JAPE rule to the right panel
Run!
A. Ferrari (ISTI) Requirements Engineering 36 / 88
Coordination Ambiguities
Detect all sentences that include a potential coordination
ambiguity
Coordination ambiguities are of two types:
I Example: The system shall produce the plot or the data-log and the
legend.
I Example: New employees and directors shall be provided with a
user-name and password.
A. Ferrari (ISTI) Requirements Engineering 37 / 88
Coordination Ambiguities - Jape Rule 0
Retrieve all occurrences of And / Or
Phase: MatchAndOr
Input: Token
Options: control = appelt
//Operator ==~ returns only whole string matches
//Operator (?i) tells that the matching is case insensitive
//Operator | is the logical "or" operator
Rule: checkAndOr
Priority: 1
(
{Token.string ==~ "(?i)and"} |
{Token.string ==~ "(?i)or"}
):coord
-->
:coord.AndOr = {}
A. Ferrari (ISTI) Requirements Engineering 38 / 88
Coordination Ambiguities - Jape Rule 1
Annotate all the sequences of And/Or in the same sentence
Phase: MatchCoordinationSequences
Input: Split AndOr
Options: control = appelt
//Note that, having Split among the input Annotations allows
//us to identify sequences of And/Or in the same sentence
//The ’+’ operator means "one or more occurrences"
//The ’*’ operator means "zero or more occurrences"
//The ’?’ operator means "zero or one occurrences"
Rule: checkCoordinationSequences
Priority: 1
(
{AndOr}
({AndOr})+
):coordSequence
-->
:coordSequence.AndOrSequence = {}
A. Ferrari (ISTI) Requirements Engineering 39 / 88
Coordination Ambiguities - Jape Rule 2
Annotate all the sentences with And/Or sequences
Phase: MatchCoordinationSentences
Input: Sentence AndOrSequence
Options: control = appelt
//A "contains" B: searches for
//annotations A that contain annotations B
Rule: checkCoordinationSentences
Priority: 1
(
{Sentence contains AndOrSequence}
):coordSentence
-->
:coordSentence.AndOrSentence = {}
A. Ferrari (ISTI) Requirements Engineering 40 / 88
Your Turn! - Detect Coordination Ambiguities of the Second Type
Add the Example: new employees and directors shall be
provided with a user-name and password
Remember that ANNIE POS Tagger gives you the category of the
Token
JJ = Adjective, NN = Noun, NNS = Plural Noun
More information on the tag-set, and many other things, can be
found in: https://gate.ac.uk/sale/tao/tao.pdf
A. Ferrari (ISTI) Requirements Engineering 41 / 88
Python and NLTK
A. Ferrari (ISTI) Requirements Engineering 42 / 88
Python Basic Concepts
Python Online! http://www.learnpython.org/
Tutorial: https://docs.python.org/2/tutorial/
A. Ferrari (ISTI) Requirements Engineering 43 / 88
Python - Main Features
Interpreted language
Dynamically typed (variables do not have a predefined type!)
Rich, built-in collection types
I Lists
I Tuples
I Dictionaries
I Sets
Concise
My Humble Opinion
It is perfect if you want to develop something quickly
It is a mess if you want to develop a structured software
A. Ferrari (ISTI) Requirements Engineering 44 / 88
Which Python?
Python 2.7: that’s the one we’ll use here
I Last stable release before version 3
I Implements parts of the version 3 features
I Fully backwards compatible
Python 3.0: that’s the one you should use in the future
I Released on Dec. 3, 2008
I Many changes (also incompatible ones)
I . . . But: few important third-party libraries are not yet compatible
with Python 3.0
I Version compatible with NLTK 3.0: either 2.7, or 3.2+
Hint: if you want to use NLTK, go to:
http://www.nltk.org/install.html, and check the links
to the additional libraries and python recommended releases,
before downloading Python
A. Ferrari (ISTI) Requirements Engineering 45 / 88
Python - Example
A Code Sample (in IDLE)
x = 34 - 23 # A comment.
y = “Hello” # Another one.
z = 3.45
if z == 3.45 or y == “Hello”:
x = x + 1
y = y + “ World” # String concat.
print x
print y
A. Ferrari (ISTI) Requirements Engineering 46 / 88
Python - Enough to Understand the Code
Indentation matters to the meaning of the code
The first assignment to a variable creates it (x = 3)
I Variable types do not need to be declared
I Python figures out the variable types on its own
Logical operators are words (and, or, not), NOT symbols
Simple printing can be done with print
Use a newline to end a line of code
Use  when must go to next line prematurely
A. Ferrari (ISTI) Requirements Engineering 47 / 88
Python - Assignment
Binding a variable in Python means setting a name to hold a
reference to some object
Hence, assignment creates references, not copies (like in Java)
An object is deleted (by the garbage collector) once it becomes
unreachable
A. Ferrari (ISTI) Requirements Engineering 48 / 88
Python - Language Features
Indentation instead of braces
Several sequence types
I Strings (immutable):
’This is a string’, and “Also this is a string”
I Tuples (immutable):
(’the’, ’system’, ’shall’, ’measure’, ’the’, ’heartbeat’)
I Lists (mutable):
[’the’, ’system’, ’shall’, ’measure’, ’the’, ’heartbeat’]
Lists and Tuples can contain anything:
I ((’REQ’, 1), ’the system shall measure the heartbeat’)
I [((’REQ’, 1), ’the system shall measure the heartbeat’), ((’REQ’, 2),
’the system shall alert the nurse’))]
A. Ferrari (ISTI) Requirements Engineering 49 / 88
Python - Sequence Types
Tuples are defined using parentheses (and commas).
tu = (23, ‘abc’, 4.56, (2,3), ‘def’)
Lists are defined using square brackets (and commas)
li = [“abc”, 34, 4.34, 23]
Strings are defined using quotes (“, ‘, or “““)
st = “Hello World”
st = ‘Hello World’
st = “““Hello World ”””
A. Ferrari (ISTI) Requirements Engineering 50 / 88
Python - Sequence Types
We can access individual members of a tuple, list, or string using
square bracket “array” notation.
We can access individual members of a tuple, lis
using  square  bracket  “array”  notation.  
Note  that  all  are  0  based…  
>>> tu = (23, ‘abc’, 4.56, (2,3), ‘def’)
>>> tu[1] # Second item in the tuple.
‘abc’
>>> li = [“abc”, 34, 4.34, 23]
>>> li[1] # Second item in the list.
34
>>> st = “Hello  World”
>>> st[1] # Second character in string.
‘e’
A. Ferrari (ISTI) Requirements Engineering 51 / 88
Python - Negative IndicesNegative indices
>>> t = (23, ‘abc’, 4.56, (2,3), ‘def’)
Positive index: count from the left, starting with 0.
>>> t[1]
‘abc’
Negative lookup: count from right, starting with –1.
>>> t[-3]
4.56
A. Ferrari (ISTI) Requirements Engineering 52 / 88
Python - Slicing
30
Slicing: Return Copy of a Subset 1
>>> t = (23, ‘abc’, 4.56, (2,3), ‘def’)
Return a copy of the container with a subset of the original
members. Start copying at the first index, and stop copying
before the second index.
>>> t[1:4]
(‘abc’,  4.56,  (2,3))
You can also use negative indices when slicing.
>>> t[1:-1]
(‘abc’,  4.56,  (2,3))
Optional argument allows selection of every nth item.
>>> t[1:-1:2]
(‘abc’,  (2,3))
A. Ferrari (ISTI) Requirements Engineering 53 / 88
Python - ‘in’ OperatorThe  ‘in’  Operator
Boolean test whether a value is inside a collection (often
called a container in Python:
>>> t = [1, 2, 4, 5]
>>> 3 in t
False
>>> 4 in t
True
>>> 4 not in t
False
For strings, tests for substrings
>>> a = 'abcde'
>>> 'c' in a
True
>>> 'cd' in a
True
>>> 'ac' in a
False
Be careful: the in keyword is also used in the syntax of
for loops and list comprehensions.
A. Ferrari (ISTI) Requirements Engineering 54 / 88
Python - ‘+’ Operator
34
The + Operator
The + operator produces a new tuple, list, or string whose
value is the concatenation of its arguments.
Extends concatenation from strings to other types
>>> (1, 2, 3) + (4, 5, 6)
(1, 2, 3, 4, 5, 6)
>>> [1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
>>> “Hello”  + “  ”  + “World”
‘Hello  World’
A. Ferrari (ISTI) Requirements Engineering 55 / 88
Lists - Mutable
36
Lists: Mutable
>>> li = [‘abc’, 23, 4.34, 23]
>>> li[1] = 45
>>> li
[‘abc’,  45,  4.34,  23]
We can change lists in place.
Name li still points to the same memory reference when
we’re  done.  
A. Ferrari (ISTI) Requirements Engineering 56 / 88
Tuples - Immutable
37
Tuples: Immutable
>>> t = (23, ‘abc’, 4.56, (2,3), ‘def’)
>>> t[2] = 3.14
Traceback (most recent call last):
File "<pyshell#75>", line 1, in -toplevel-
tu[2] = 3.14
TypeError: object doesn't support item assignment
You  can’t  change  a  tuple.  
You can make a fresh tuple and assign its reference to a previously
used name.
>>> t = (23, ‘abc’, 3.14, (2,3), ‘def’)
The  immutability  of  tuples  means  they’re  faster  than  lists.  
A. Ferrari (ISTI) Requirements Engineering 57 / 88
Operations on Lists
38
Operations on Lists Only 1
>>> li = [1, 11, 3, 4, 5]
>>> li.append(‘a’) # Note the method syntax
>>> li
[1,  11,  3,  4,  5,  ‘a’]
>>> li.insert(2,  ‘i’)
>>>li
[1,  11,  ‘i’,  3,  4,  5,  ‘a’]
A. Ferrari (ISTI) Requirements Engineering 58 / 88
Dictionaries
Dictionaries store a mapping between a set of keys and a set of
values
Keys can be any immutable type
Values can be any type
A. Ferrari (ISTI) Requirements Engineering 59 / 88
Dictionaries - Creating and AccessingCreating and accessing
dictionaries
>>> d = {‘user’:‘bozo’, ‘pswd’:1234}
>>> d[‘user’]
‘bozo’
>>> d[‘pswd’]
1234
>>> d[‘bozo’]
Traceback (innermost last):
File  ‘<interactive  input>’  line  1,  in  ?
KeyError: bozo
A. Ferrari (ISTI) Requirements Engineering 60 / 88
Dictionaries - Updating
Updating Dictionaries
>>> d = {‘user’:‘bozo’, ‘pswd’:1234}
>>> d[‘user’] = ‘clown’
>>> d
{‘user’:‘clown’,  ‘pswd’:1234}
Keys must be unique.
Assigning to an existing key replaces its value.
>>> d[‘id’] = 45
>>> d
{‘user’:‘clown’,  ‘id’:45,  ‘pswd’:1234}
Dictionaries are unordered
New entry might appear anywhere in the output.
(Dictionaries work by hashing)
A. Ferrari (ISTI) Requirements Engineering 61 / 88
Dictionaries - Removing
Removing dictionary entries
>>> d = {‘user’:‘bozo’, ‘p’:1234, ‘i’:34}
>>> del d[‘user’] # Remove one. Note that del is
# a function.
>>> d
{‘p’:1234,  ‘i’:34}
>>> d.clear() # Remove all.
>>> d
{}
>>> a=[1,2]
>>> del a[1] # (del also works on lists)
>>> a
[1]
47
A. Ferrari (ISTI) Requirements Engineering 62 / 88
Dictionaries - Useful Methods
Useful Accessor Methods
>>> d = {‘user’:‘bozo’, ‘p’:1234, ‘i’:34}
>>> d.keys() # List of current keys
[‘user’,  ‘p’,  ‘i’]
>>> d.values() # List of current values.
[‘bozo’,  1234,  34]
>>> d.items() # List of item tuples.
[(‘user’,‘bozo’),  (‘p’,1234),  (‘i’,34)]
A. Ferrari (ISTI) Requirements Engineering 63 / 88
If Statements (as expected)
if Statements (as expected)
if x == 3:
print "X equals 3."
elif x == 2:
print "X equals 2."
else:
print "X equals something else."
print "This  is  outside  the  ‘if’."
Note:
Use of indentation for blocks
Colon (:) after boolean expression
54
A. Ferrari (ISTI) Requirements Engineering 64 / 88
While Loops (as expected)
while Loops (as expected)
>>> x = 3
>>> while x < 5:
print x, "still in the loop"
x = x + 1
3 still in the loop
4 still in the loop
>>> x = 6
>>> while x < 5:
print x, "still in the loop"
>>>
55
A. Ferrari (ISTI) Requirements Engineering 65 / 88
For Loops (very cool)For Loops 1
For-each  is  Python’s  only form of for loop
A for loop steps through each of the items in a collection
type,  or  any  other  type  of  object  which  is  “iterable”
for <item> in <collection>:
<statements>
If <collection> is a list or a tuple, then the loop steps
through each element of the sequence.
If <collection> is a string, then the loop steps through each
character of the string.
for someChar in “Hello  World”:
print someChar
59
A. Ferrari (ISTI) Requirements Engineering 66 / 88
For Loops
Items can be more complex than a simple variable name
<statements>
<item> can be more complex than a single
variable name.
If the elements of <collection> are themselves collections,
then <item> can match the structure of the elements. (We
saw something similar with list comprehensions and with
ordinary assignments.)
for (x, y) in [(a,1), (b,2), (c,3), (d,4)]:
print x
60
A. Ferrari (ISTI) Requirements Engineering 67 / 88
For Loops and range() Function
For loops and the range() function
We often want to write a loop where the variables ranges
over some sequence of numbers. The range() function
returns a list of numbers from 0 up to but not including the
number we pass to it.
range(5) returns [0,1,2,3,4]
So we can say:
for x in range(5):
print x
(There are several other forms of range() that provide
variants  of  this  functionality…)
xrange() returns an iterator that provides the same
functionality more efficiently
61
A. Ferrari (ISTI) Requirements Engineering 68 / 88
For Loops and range() Function
Abuse of the range() function
Don't use range() to iterate over a sequence solely to have
the index and elements available at the same time
Avoid:
for i in range(len(mylist)):
print i, mylist[i]
Instead:
for (i, item) in enumerate(mylist):
print i, item
This is an example of an anti-pattern in Python
For more, see:
http://www.seas.upenn.edu/~lignos/py_antipatterns.html
http://stackoverflow.com/questions/576988/python-specific-antipatterns-and-bad-
practices
A. Ferrari (ISTI) Requirements Engineering 69 / 88
List Comprehensions (remarkably cool)
List Comprehensions 1
A powerful feature of the Python language.
Generate a new list by applying a function to every member
of an original list.
Python programmers use list comprehensions extensively.
You’ll  see  many  of  them  in  real  code.
[ expression for name in list ]
64
A. Ferrari (ISTI) Requirements Engineering 70 / 88
List Comprehensions (remarkably cool)
List Comprehensions 2
>>> li = [3, 6, 2, 7]
>>> [elem*2 for elem in li]
[6, 12, 4, 14]
[ expression for name in list ]
Where expression is some calculation or operation
acting upon the variable name.
For each member of the list, the list comprehension
1. sets name equal to that member, and
2. calculates a new value using expression,
It then collects these new values into a list which is the
return value of the list comprehension.
[ expression for name in list ]
65
A. Ferrari (ISTI) Requirements Engineering 71 / 88
List Comprehensions (remarkably cool)
List Comprehensions 3
If the elements of list are other collections, then
name can be replaced by a collection of names
that  match  the  “shape”  of  the  list members.
>>> li =  [(‘a’,  1),  (‘b’,  2),  (‘c’,  7)]
>>> [ n * 3 for (x, n) in li]
[3, 6, 21]
[ expression for name in list ]
66
A. Ferrari (ISTI) Requirements Engineering 72 / 88
Filtered List Comprehensions (remarkably cool)
Filtered List Comprehension 1
Filter determines whether expression is performed
on each member of the list.
When processing each element of list, first check if
it satisfies the filter condition.
If the filter condition returns False, that element is
omitted from the list before the list comprehension
is evaluated.
[ expression for name in list if filter]
67
A. Ferrari (ISTI) Requirements Engineering 73 / 88
Filtered List Comprehensions (remarkably cool)
>>> li = [3, 6, 2, 7, 1, 9]
>>> [elem * 2 for elem in li if elem > 4]
[12, 14, 18]
Only 6, 7, and 9 satisfy the filter condition.
So, only 12, 14, and 18 are produced.
Filtered List Comprehension 2
[ expression for name in list if filter]
A. Ferrari (ISTI) Requirements Engineering 74 / 88
Functions
First line with less
indentation is considered to be
outside of the function definition.
Defining Functions
No declaration of types of arguments or result
def get_final_answer(filename):
"""Documentation String"""
line1
line2
return total_counter
...
Function definition begins with def Function name and its arguments.
‘return’  indicates  the  
value to be sent back to the caller.
Colon.
72
A. Ferrari (ISTI) Requirements Engineering 75 / 88
Function Call
Calling a Function
>>> def myfun(x, y):
return x * y
>>> myfun(3, 4)
12
73
A. Ferrari (ISTI) Requirements Engineering 76 / 88
Import
87
Import I
import somefile
Everything in somefile.py can be referred to by:
somefile.className.method(“abc”)
somefile.myFunction(34)
from somefile import *
Everything in somefile.py can be referred to by:
className.method(“abc”)
myFunction(34)
Careful! This can overwrite the definition of an existing
function or variable!
A. Ferrari (ISTI) Requirements Engineering 77 / 88
Python NLTK Basic Concepts
http://textminingonline.com/
dive-into-nltk-part-i-getting-started-with-nltk
A. Ferrari (ISTI) Requirements Engineering 78 / 88
Python NLTK Main Features
Sentence and word tokenization
POS tagging
Chunking and Named Entity Recognition (NER)
Text Classification
Many included corpora
A. Ferrari (ISTI) Requirements Engineering 79 / 88
Python NLTK - sentence tokenization (splitting)
>>> from nltk.tokenize import sent_tokenize
>>> sent_tokenize("hello this is nltk")
[’hello this is nltk’]
>>> sent_tokenize("hello. this is nltk")
[’hello.’, ’this is nltk’]
A. Ferrari (ISTI) Requirements Engineering 80 / 88
Python NLTK - word tokenization
>>> from nltk.tokenize import word_tokenize
>>> word_tokenize("this is nltk")
[’this’, ’is’, ’nltk’]
A. Ferrari (ISTI) Requirements Engineering 81 / 88
Python NLTK - word tokenization with punctuation
>>> word_tokenize("The woman’s car")
[’The’, ’woman’, "’s", ’car’]
>>> from nltk.tokenize import wordpunct_tokenize
>>> wordpunct_tokenize("The woman’s car")
[’The’, ’woman’, "’", ’s’, ’car’]
A. Ferrari (ISTI) Requirements Engineering 82 / 88
Python NLTK - stemming
>>> from nltk.stem.porter import PorterStemmer
>>> porter_stemmer = PorterStemmer()
>>> porter_stemmer.stem(’requirements’)
’requir’
A. Ferrari (ISTI) Requirements Engineering 83 / 88
Python NLTK - lemmatizer
>>> from nltk.stem import WordNetLemmatizer
>>> wordnet_lemmatizer = WordNetLemmatizer()
>>> wordnet_lemmatizer.lemmatize(’requirements’)
’requirement’
>>> wordnet_lemmatizer.lemmatize(’is’)
’is’
>>> wordnet_lemmatizer.lemmatize(’is’, pos=’v’)
’be’
A. Ferrari (ISTI) Requirements Engineering 84 / 88
Python NLTK - POS tagging
>>> t = word_tokenize("The system shall be nice")
>>> from nltk import pos_tag
>>> pos_tag(t)
[(’The’, ’DT’), (’system’, ’NN’),
(’shall’, ’MD’), (’be’, ’VB’), (’nice’, ’JJ’)]
A. Ferrari (ISTI) Requirements Engineering 85 / 88
Python NLTK - POS tagging
How can I know the meaning of the tags?
>>> import nltk
>>> nltk.help.upenn_tagset(’DT’)
DT: determiner
all an another any both del each either every half la many much nary
neither no some such that the them these this those
>>> nltk.help.upenn_tagset(’NN’)
NN: noun, common, singular or mass
common-carrier cabbage knuckle-duster Casino afghan shed thermostat
investment slide humour falloff slick wind hyena override subhumanity
machinist ...
>>> nltk.help.upenn_tagset(’NNP’)
NNP: noun, proper, singular
Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
Shannon A.K.C. Meltex Liverpool ...
A. Ferrari (ISTI) Requirements Engineering 86 / 88
Python NLTK - stopword removal
>>> from nltk.corpus import stopwords
>>> stoplist = stopwords.words(’english’)
>>> sentence = ’The system shall be usable’
>>> print [i for i in word_tokenize(sentence) if i not in stoplist]
[’The’, ’system’, ’shall’, ’usable’]
>>> print [i for i in word_tokenize(sentence.lower())
if i not in stoplist]
[’system’, ’shall’, ’usable’]
A. Ferrari (ISTI) Requirements Engineering 87 / 88
When to use What
GATE is good to annotate documents, and find patterns
Python NLTK is good for fast prototyping of analysis tools
If you have to extract information from text: use GATE
If you have to do more fine-grained analyses, and you do not need
to build a well-structured program: use Python NLTK
If you have to build a structured program: study the OpenNLP
library (Java)
A. Ferrari (ISTI) Requirements Engineering 88 / 88

More Related Content

What's hot

Research data as an aid in teaching technical competence in subtitling
Research data as an aid in teaching technical competence in subtitlingResearch data as an aid in teaching technical competence in subtitling
Research data as an aid in teaching technical competence in subtitling
University of Warsaw
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 
Seq2seq Model to Tokenize the Chinese Language
Seq2seq Model to Tokenize the Chinese LanguageSeq2seq Model to Tokenize the Chinese Language
Seq2seq Model to Tokenize the Chinese Language
Jinho Choi
 
Domain Specific Language Design
Domain Specific Language DesignDomain Specific Language Design
Domain Specific Language DesignMarkus Voelter
 
Word Segmentation and Lexical Normalization for Unsegmented Languages
Word Segmentation and Lexical Normalization for Unsegmented LanguagesWord Segmentation and Lexical Normalization for Unsegmented Languages
Word Segmentation and Lexical Normalization for Unsegmented Languages
hs0041
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
Universitat Politècnica de Catalunya
 
Frequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answersFrequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answers
nishajj
 
Introduction to Prolog (PROramming in LOGic)
Introduction to Prolog (PROramming in LOGic)Introduction to Prolog (PROramming in LOGic)
Introduction to Prolog (PROramming in LOGic)
Ahmed Gad
 
Multi-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoMulti-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer Calixto
Sebastian Ruder
 
Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
Department of Communication Science, University of Amsterdam
 
Realization of natural language interfaces using
Realization of natural language interfaces usingRealization of natural language interfaces using
Realization of natural language interfaces usingunyil96
 
What's Spain's Paris? Mining Analogical Libraries from Q&A Discussions
What's Spain's Paris? Mining Analogical Libraries from Q&A DiscussionsWhat's Spain's Paris? Mining Analogical Libraries from Q&A Discussions
What's Spain's Paris? Mining Analogical Libraries from Q&A Discussions
Chunyang Chen
 
Generic Tools, Specific Laguages
Generic Tools, Specific LaguagesGeneric Tools, Specific Laguages
Generic Tools, Specific Laguages
Markus Voelter
 
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
LDBC council
 
NooJ-2018-Palermo
NooJ-2018-PalermoNooJ-2018-Palermo
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila CastilhoHuman Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
Sebastian Ruder
 
My presentation for assignment3
My presentation for assignment3My presentation for assignment3
My presentation for assignment3
m5221101
 
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
fgodin
 
Studi Penerapan Ontologi dalam Bahasa Inggris sebagai Kerangka
Studi Penerapan Ontologi dalam Bahasa Inggris sebagai KerangkaStudi Penerapan Ontologi dalam Bahasa Inggris sebagai Kerangka
Studi Penerapan Ontologi dalam Bahasa Inggris sebagai KerangkaMetilova Sitorus
 

What's hot (20)

Research data as an aid in teaching technical competence in subtitling
Research data as an aid in teaching technical competence in subtitlingResearch data as an aid in teaching technical competence in subtitling
Research data as an aid in teaching technical competence in subtitling
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Seq2seq Model to Tokenize the Chinese Language
Seq2seq Model to Tokenize the Chinese LanguageSeq2seq Model to Tokenize the Chinese Language
Seq2seq Model to Tokenize the Chinese Language
 
Domain Specific Language Design
Domain Specific Language DesignDomain Specific Language Design
Domain Specific Language Design
 
Word Segmentation and Lexical Normalization for Unsegmented Languages
Word Segmentation and Lexical Normalization for Unsegmented LanguagesWord Segmentation and Lexical Normalization for Unsegmented Languages
Word Segmentation and Lexical Normalization for Unsegmented Languages
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Frequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answersFrequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answers
 
Introduction to Prolog (PROramming in LOGic)
Introduction to Prolog (PROramming in LOGic)Introduction to Prolog (PROramming in LOGic)
Introduction to Prolog (PROramming in LOGic)
 
Multi-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoMulti-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer Calixto
 
Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
 
Realization of natural language interfaces using
Realization of natural language interfaces usingRealization of natural language interfaces using
Realization of natural language interfaces using
 
What's Spain's Paris? Mining Analogical Libraries from Q&A Discussions
What's Spain's Paris? Mining Analogical Libraries from Q&A DiscussionsWhat's Spain's Paris? Mining Analogical Libraries from Q&A Discussions
What's Spain's Paris? Mining Analogical Libraries from Q&A Discussions
 
Arabic question answering ‫‬
Arabic question answering ‫‬Arabic question answering ‫‬
Arabic question answering ‫‬
 
Generic Tools, Specific Laguages
Generic Tools, Specific LaguagesGeneric Tools, Specific Laguages
Generic Tools, Specific Laguages
 
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
 
NooJ-2018-Palermo
NooJ-2018-PalermoNooJ-2018-Palermo
NooJ-2018-Palermo
 
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila CastilhoHuman Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
 
My presentation for assignment3
My presentation for assignment3My presentation for assignment3
My presentation for assignment3
 
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
 
Studi Penerapan Ontologi dalam Bahasa Inggris sebagai Kerangka
Studi Penerapan Ontologi dalam Bahasa Inggris sebagai KerangkaStudi Penerapan Ontologi dalam Bahasa Inggris sebagai Kerangka
Studi Penerapan Ontologi dalam Bahasa Inggris sebagai Kerangka
 

Similar to Requirements Engineering: focus on Natural Language Processing, Lecture 2

NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
seungwoo kim
 
Introduction to R for Quantitative Research
Introduction to R for Quantitative ResearchIntroduction to R for Quantitative Research
Introduction to R for Quantitative Research
Parthasarathi E
 
Technical Documentation By Techies
Technical Documentation By TechiesTechnical Documentation By Techies
Technical Documentation By Techies
ppd1961
 
Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010AAT Taiwan
 
Public Administration, Laws Requirements, Natural Language
Public Administration, Laws Requirements, Natural LanguagePublic Administration, Laws Requirements, Natural Language
Public Administration, Laws Requirements, Natural Language
ProjectLearnPAd
 
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an OverviewNatural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
alessio_ferrari
 
PECB Webinar: Challenges in string search in Digital Computer Forensics
PECB Webinar: Challenges in string search in Digital Computer ForensicsPECB Webinar: Challenges in string search in Digital Computer Forensics
PECB Webinar: Challenges in string search in Digital Computer Forensics
PECB
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
alessio_ferrari
 
Comparative Study of programming Languages
Comparative Study of programming LanguagesComparative Study of programming Languages
Comparative Study of programming Languages
Ishan Monga
 
ATR-TREK - TAUS Tokyo Forum 2015
ATR-TREK - TAUS Tokyo Forum 2015ATR-TREK - TAUS Tokyo Forum 2015
ATR-TREK - TAUS Tokyo Forum 2015
TAUS - The Language Data Network
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
AlyaaMachi
 
Cobbbbbbbnnnnnnnnnnnnnnnnncepts of PL.pptx
Cobbbbbbbnnnnnnnnnnnnnnnnncepts of PL.pptxCobbbbbbbnnnnnnnnnnnnnnnnncepts of PL.pptx
Cobbbbbbbnnnnnnnnnnnnnnnnncepts of PL.pptx
mehrankhan7842312
 
An improved fuzzy system for representing web pages in Clustering Tasks
An improved fuzzy system for representing web pages in Clustering TasksAn improved fuzzy system for representing web pages in Clustering Tasks
An improved fuzzy system for representing web pages in Clustering Tasks
Alberto Pérez
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
Gabriel Hamilton
 
Nltk
NltkNltk
Nltk
Anirudh
 
PROGRAMMING AND LANGUAGES
PROGRAMMING AND LANGUAGES  PROGRAMMING AND LANGUAGES
TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Set...
TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Set...TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Set...
TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Set...
TAUS - The Language Data Network
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needs
Ivan Berlocher
 
Lessons from Indic OCR Development
Lessons from Indic OCR DevelopmentLessons from Indic OCR Development
Lessons from Indic OCR Development
Nishad Thalhath
 
WISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked DataWISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked Data
Andre Freitas
 

Similar to Requirements Engineering: focus on Natural Language Processing, Lecture 2 (20)

NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
 
Introduction to R for Quantitative Research
Introduction to R for Quantitative ResearchIntroduction to R for Quantitative Research
Introduction to R for Quantitative Research
 
Technical Documentation By Techies
Technical Documentation By TechiesTechnical Documentation By Techies
Technical Documentation By Techies
 
Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010
 
Public Administration, Laws Requirements, Natural Language
Public Administration, Laws Requirements, Natural LanguagePublic Administration, Laws Requirements, Natural Language
Public Administration, Laws Requirements, Natural Language
 
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an OverviewNatural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
 
PECB Webinar: Challenges in string search in Digital Computer Forensics
PECB Webinar: Challenges in string search in Digital Computer ForensicsPECB Webinar: Challenges in string search in Digital Computer Forensics
PECB Webinar: Challenges in string search in Digital Computer Forensics
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Comparative Study of programming Languages
Comparative Study of programming LanguagesComparative Study of programming Languages
Comparative Study of programming Languages
 
ATR-TREK - TAUS Tokyo Forum 2015
ATR-TREK - TAUS Tokyo Forum 2015ATR-TREK - TAUS Tokyo Forum 2015
ATR-TREK - TAUS Tokyo Forum 2015
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
 
Cobbbbbbbnnnnnnnnnnnnnnnnncepts of PL.pptx
Cobbbbbbbnnnnnnnnnnnnnnnnncepts of PL.pptxCobbbbbbbnnnnnnnnnnnnnnnnncepts of PL.pptx
Cobbbbbbbnnnnnnnnnnnnnnnnncepts of PL.pptx
 
An improved fuzzy system for representing web pages in Clustering Tasks
An improved fuzzy system for representing web pages in Clustering TasksAn improved fuzzy system for representing web pages in Clustering Tasks
An improved fuzzy system for representing web pages in Clustering Tasks
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
Nltk
NltkNltk
Nltk
 
PROGRAMMING AND LANGUAGES
PROGRAMMING AND LANGUAGES  PROGRAMMING AND LANGUAGES
PROGRAMMING AND LANGUAGES
 
TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Set...
TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Set...TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Set...
TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Set...
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needs
 
Lessons from Indic OCR Development
Lessons from Indic OCR DevelopmentLessons from Indic OCR Development
Lessons from Indic OCR Development
 
WISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked DataWISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked Data
 

More from alessio_ferrari

Systematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping StudiesSystematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping Studies
alessio_ferrari
 
Case Study Research in Software Engineering
Case Study Research in Software EngineeringCase Study Research in Software Engineering
Case Study Research in Software Engineering
alessio_ferrari
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineering
alessio_ferrari
 
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
alessio_ferrari
 
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to ValidityControlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
alessio_ferrari
 
Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1
alessio_ferrari
 
Ambiguity in Software Engineering
Ambiguity in Software EngineeringAmbiguity in Software Engineering
Ambiguity in Software Engineering
alessio_ferrari
 
Empirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an OverviewEmpirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an Overview
alessio_ferrari
 

More from alessio_ferrari (8)

Systematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping StudiesSystematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping Studies
 
Case Study Research in Software Engineering
Case Study Research in Software EngineeringCase Study Research in Software Engineering
Case Study Research in Software Engineering
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineering
 
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
 
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to ValidityControlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
 
Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1
 
Ambiguity in Software Engineering
Ambiguity in Software EngineeringAmbiguity in Software Engineering
Ambiguity in Software Engineering
 
Empirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an OverviewEmpirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an Overview
 

Recently uploaded

Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 

Recently uploaded (20)

Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 

Requirements Engineering: focus on Natural Language Processing, Lecture 2

  • 1. Requirements Engineering: Focus on Natural Language Requirements Processing Alessio Ferrari1 1ISTI-CNR, Pisa, Italy January 15, 2016 A. Ferrari (ISTI) Requirements Engineering 1 / 88 cf. https://docplayer.net/16051752-Introduction-to-programming-languages-and- techniques-xkcd-com-full-python-tutorial.html
  • 2. Objectives of the Course Objectives: lesson 1 Understand what are requirements, specifications and domain assertions Understand the problems behind requirements elicitation and SRS definition Learn how to write a decent SRS Objectives: lesson 2 Learn the basics of Natual Language Processing (NLP) Learn the relation between NLP and Requirements Learn the basics of GATE to detect NL ambiguities in requirements Learn the basics of Python NLTK (for further manipulation of NL requirements) A. Ferrari (ISTI) Requirements Engineering 2 / 88
  • 3. NLP and Requirements Engineering A. Ferrari (ISTI) Requirements Engineering 3 / 88
  • 4. Natural Language Processing Technologies enabling extraction and manipulation of information from Natural Language (NL) Dan$Jurafsky$ Language(Technology( Coreference$resoluIon$ QuesIon$answering$(QA)$ PartOofOspeech$(POS)$tagging$ Word$sense$disambiguaIon$(WSD)$ Paraphrase$ Named$enIty$recogniIon$(NER)$ Parsing$ SummarizaIon$ InformaIon$extracIon$(IE)$ Machine$translaIon$(MT)$ Dialog$ SenIment$analysis$ $$$ mostly$solved$ making$good$progress$ sIll$really$hard$ Spam$detecIon$ Let’s$go$to$Agra!$ Buy$V1AGRA$…$ ✓ ✗ Colorless$$$green$$$ideas$$$sleep$$$furiously.$ $$$$$ADJ$$$$$$$$$ADJ$$$$NOUN$$VERB$$$$$$ADV$ Einstein$met$with$UN$officials$in$Princeton$ PERSON$$$$$$$$$$$$$$ORG$$$$$$$$$$$$$$$$$$$$$$LOC$ You’re$invited$to$our$dinner$ party,$Friday$May$27$at$8:30$ Party$ May$27$ add$ Best$roast$chicken$in$San$Francisco!$ The$waiter$ignored$us$for$20$minutes.$ Carter$told$Mubarak$he$shouldn’t$run$again.$ I$need$new$baWeries$for$my$mouse.$ The$13th$Shanghai$InternaIonal$Film$FesIval…$ 13 … The$Dow$Jones$is$up$ Housing$prices$rose$ Economy$is$ good$ Q.$How$effecIve$is$ibuprofen$in$reducing$ fever$in$paIents$with$acute$febrile$illness?$ I$can$see$Alcatraz$from$the$window!$ XYZ$acquired$ABC$yesterday$ ABC$has$been$taken$over$by$XYZ$ Where$is$CiIzen$Kane$playing$in$SF?$$ Castro$Theatre$at$7:30.$Do$ you$want$a$Icket?$ The$S&P500$jumped$ Dan$Jurafsky$ Language(Technology( Coreference$resoluIon$ QuesIon$answering$(QA)$ PartOofOspeech$(POS)$tagging$ Word$sense$disambiguaIon$(WSD)$ Paraphrase$ Named$enIty$recogniIon$(NER)$ Parsing$ SummarizaIon$ InformaIon$extracIon$(IE)$ Machine$translaIon$(MT)$ Dialog$ SenIment$analysis$ $$$ mostly$solved$ making$good$progress$ sIll$really$hard$ Spam$detecIon$ Let’s$go$to$Agra!$ Buy$V1AGRA$…$ ✓ ✗ Colorless$$$green$$$ideas$$$sleep$$$furiously.$ $$$$$ADJ$$$$$$$$$ADJ$$$$NOUN$$VERB$$$$$$ADV$ Einstein$met$with$UN$officials$in$Princeton$ PERSON$$$$$$$$$$$$$$ORG$$$$$$$$$$$$$$$$$$$$$$LOC$ You’re$invited$to$our$dinner$ party,$Friday$May$27$at$8:30$ Party$ May$27$ add$ Best$roast$chicken$in$San$Francisco!$ The$waiter$ignored$us$for$20$minutes.$ Carter$told$Mubarak$he$shouldn’t$run$again.$ I$need$new$baWeries$for$my$mouse.$ The$13th$Shanghai$InternaIonal$Film$FesIval…$ 13 … The$Dow$Jones$is$up$ Housing$prices$rose$ Economy$is$ good$ Q.$How$effecIve$is$ibuprofen$in$reducing$ fever$in$paIents$with$acute$febrile$illness?$ I$can$see$Alcatraz$from$the$window!$ XYZ$acquired$ABC$yesterday$ ABC$has$been$taken$over$by$XYZ$ Where$is$CiIzen$Kane$playing$in$SF?$$ Castro$Theatre$at$7:30.$Do$ you$want$a$Icket?$ The$S&P500$jumped$ Dan$Jurafsky$ Language(Technology( Coreference$resoluIon$ QuesIon$answering$(QA)$ PartOofOspeech$(POS)$tagging$ Word$sense$disambiguaIon$(WSD)$ Paraphrase$ Named$enIty$recogniIon$(NER)$ Parsing$ SummarizaIon$ InformaIon$extracIon$(IE)$ Machine$translaIon$(MT)$ Dialog$ SenIment$analysis$ $$$ mostly$solved$ making$good$progress$ sIll$really$hard$ Spam$detecIon$ Let’s$go$to$Agra!$ Buy$V1AGRA$…$ ✓ ✗ Colorless$$$green$$$ideas$$$sleep$$$furiously.$ $$$$$ADJ$$$$$$$$$ADJ$$$$NOUN$$VERB$$$$$$ADV$ Einstein$met$with$UN$officials$in$Princeton$ PERSON$$$$$$$$$$$$$$ORG$$$$$$$$$$$$$$$$$$$$$$LOC$ You’re$invited$to$our$dinner$ party,$Friday$May$27$at$8:30$ Party$ May$27$ add$ Best$roast$chicken$in$San$Francisco!$ The$waiter$ignored$us$for$20$minutes.$ Carter$told$Mubarak$he$shouldn’t$run$again.$ I$need$new$baWeries$for$my$mouse.$ The$13th$Shanghai$InternaIonal$Film$FesIval…$ 13 … The$Dow$Jones$is$up$ Housing$prices$rose$ Economy$is$ good$ Q.$How$effecIve$is$ibuprofen$in$reducing$ fever$in$paIents$with$acute$febrile$illness?$ I$can$see$Alcatraz$from$the$window!$ XYZ$acquired$ABC$yesterday$ ABC$has$been$taken$over$by$XYZ$ Where$is$CiIzen$Kane$playing$in$SF?$$ Castro$Theatre$at$7:30.$Do$ you$want$a$Icket?$ The$S&P500$jumped$ A. Ferrari (ISTI) Requirements Engineering 4 / 88
  • 5. Natural Language Processing Language is inherently ambiguous and hard to be handled I Imagine a well-defined syntax and semantics for an input message: you can write software to process it I With NL you do not have a well defined syntax and semantics: there is always some exception Rule-based approaches are hard to build Luckily, there’s machine-learning! . . . but I :( Large amounts of data are needed for training/test/validation I :( Tagging data requires domain expertise I :( Requirements use domain-specific jargon (features based on terms might work only for a specific company) I :( Requirements structure vary from company to company (features based on syntax might need adjustments) I :( Some tasks (e.g., ambiguity) have too many negative cases (i.e., well-written), and too few positive (i.e., ambiguous) A. Ferrari (ISTI) Requirements Engineering 5 / 88
  • 6. 4 Dimensions of Natural Language Requirements Processing Discipline: requirements have to be non-ambiguous but readable, no equivalent requirement Dynamism: requirements evolve, and have to be traced and categorised Domain Knowledge: domain knowledge has to be gathered to understand requirements, proper stakeholders have to be identified Data-sets: tagged data-sets are required, requirements are private A. Ferrari (ISTI) Requirements Engineering 6 / 88
  • 7. 4-D in the Requirements Process Elicitation Analysis ValidationManagement Traceability Domain Term Identification Stakeholder Identification Domain Scoping Ambiguity Readability ComplianceCategorisation Equivalence D I S C I P L I N E DOMAIN KNOWLEDGE DYNAMISM DATA-SETS Retrieval of Internal Requirements Market Analysis A. Ferrari (ISTI) Requirements Engineering 7 / 88
  • 8. Focus on Discipline An ambiguity in a requirement might cause severe consequences along the development False negatives are NOT tolerable (i.e., I have to discover ALL potential ambiguities) It is preferable to have a certain amount of false positive cases, which the engineer can easily discard Rec = | true ambiguities items retrieved | / | true ambiguities | Prec = | true ambiguities items retrieved | / | items retrieved | Hence, a system for ambiguity detection has to provide 100% recall! Rule-based approaches are preferred However, for some tasks, rule-based approaches give really too many false positives. A. Ferrari (ISTI) Requirements Engineering 8 / 88
  • 9. What you’ll learn Rule-based approaches Basic tools that you can use to implement rule-based approaches These tools can be used as a basis if you want to move to machine learning A. Ferrari (ISTI) Requirements Engineering 9 / 88
  • 10. NLP Minimal Concepts and Terminology A. Ferrari (ISTI) Requirements Engineering 10 / 88
  • 11. NLP Minimal Concepts Every task in NLP needs to perform Text Normalization: Segmenting/Tokenizing words in the text Normalizing word formats (Stemming and Lemmatization) Segmenting/Tokenizing sentences Besides that, many tasks need Part-Of-Speech (POS) Tagging A. Ferrari (ISTI) Requirements Engineering 11 / 88
  • 12. NLP Minimal Concepts - Tokenization (or Word Segmentation) A token is an occurrence of a WORD in a stream of text Tokenization split a stream of text into a list of WORDS Dan(Jurafsky( How(many(words?( they(lay(back(on(the(San(Francisco(grass(and(looked(at(the(stars(and(their( •  Type:(an(element(of(the(vocabulary.( •  Token:(an(instance(of(that(type(in(running(text.( •  How(many?( •  15(tokens((or(14)( •  13(types((or(12)((or(11?)( A. Ferrari (ISTI) Requirements Engineering 12 / 88
  • 13. NLP Minimal Concepts - Multi-word Terms Multi-word Terms Example: “The wayside equipment shall respond within 5 seconds” wayside equipment is a multi-word term like San Francisco We do not discuss about multi-word terms, but keep them in mind because they are frequent in requirements A. Ferrari (ISTI) Requirements Engineering 13 / 88
  • 14. NLP Minimal Concepts - Lemmatization Dan(Jurafsky( Lemma4za4on( •  Reduce(inflecGons(or(variant(forms(to(base(form( •  am,"are,(is"→(be( •  car,"cars,"car's,(cars'(→(car" •  the"boy's"cars"are"different"colors(→(the"boy"car"be"different"color" •  LemmaGzaGon:(have(to(find(correct(dicGonary(headword(form( •  Machine(translaGon( •  Spanish(quiero((‘I(want’),(quieres((‘you(want’)(same(lemma(as(querer(‘want’( A. Ferrari (ISTI) Requirements Engineering 14 / 88
  • 15. NLP Minimal Concepts - Stemming Dan(Jurafsky( Stemming( •  Reduce(terms(to(their(stems(in(informaGon(retrieval( •  Stemming(is(crude(chopping(of(affixes( •  language(dependent( •  e.g.,(automate(s),+automaGc,+automaGon(all(reduced(to(automat.( for"example"compressed"" and"compression"are"both"" accepted"as"equivalent"to"" compress.( for(exampl(compress(and( compress(ar(both(accept( as(equival(to(compress( A. Ferrari (ISTI) Requirements Engineering 15 / 88
  • 16. NLP Minimal Concepts - Sentence SegmentationDan(Jurafsky( Sentence(Segmenta4on( •  !,(?(are(relaGvely(unambiguous( •  Period(“.”(is(quite(ambiguous( •  Sentence(boundary( •  AbbreviaGons(like(Inc.(or(Dr.( •  Numbers(like(.02%(or(4.3( •  Build(a(binary(classifier( •  Looks(at(a(“.”( •  Decides(EndOfSentence/NotEndOfSentence( •  Classifiers:(handkwri@en(rules,(regular(expressions,(or(machineklearning( A. Ferrari (ISTI) Requirements Engineering 16 / 88
  • 17. NLP Minimal Concepts - POS Tagging Open class (lexical) words Closed class (functional) Nouns Verbs Proper Common Modals Main Adjectives Adverbs Prepositions Particles Determiners Conjunctions Pronouns … more … more IBM Italy cat / cats snow see registered can had old older oldest slowly to with off up the some and or he its Numbers 122,312 one Interjections Ow Eh A. Ferrari (ISTI) Requirements Engineering 17 / 88
  • 18. NLP Minimal Concepts - POS TaggingChristopher"Manning" POS-Tagging- •  Words"oYen"have"more"than"one"POS:"back% •  The"back"door"="JJ" •  On"my"back"="NN" •  Win"the"voters"back"="RB" •  Promised"to"back"the"bill"="VB" •  The"POS"tagging"problem"is"to"determine"the"POS"tag"for"a" par1cular"instance"of"a"word." https://www.ling.upenn.edu/courses/Fall_2003/ ling001/penn_treebank_pos.html A. Ferrari (ISTI) Requirements Engineering 18 / 88
  • 19. GATE Basic Concepts A. Ferrari (ISTI) Requirements Engineering 19 / 88
  • 20. IR vs IE Information Retrieval (IR): pulls documents from large corpora Information Extraction (IE): retrieves facts and structured information from large corpora IR returns documents containing the relevant information somewhere (normally fast) IE returns precise and structured information (can be slow) GATE (General Architecture for Text Engineering) supports IE Potential Usage Entity, Events, Relation extraction Enrich features for retrieval Annotate documents for ML Ambiguity identification with rule-based approaches A. Ferrari (ISTI) Requirements Engineering 20 / 88
  • 21. Document, Corpus, Annotation, Feature Document = text + annotations + features Corpus = collection of documents Linguistic information in documents is encoded in the form of annotations (like colored mark-ups) Annotations have features with relative types and values Example Annotation: Sentence Length Feature 1: Length in Characters, Value 1 = 100 Feature 2: Length in Token, Value 2 = 15 A. Ferrari (ISTI) Requirements Engineering 21 / 88
  • 22. Document, Corpus, Annotation, Feature A. Ferrari (ISTI) Requirements Engineering 22 / 88
  • 23. Processing Resources (PR) PR are algorithms that make NLP easy Document Reset PR - delete Annotations ANNIE English Tokeniser - identify tokens (words, numbers, etc.) ANNIE Sentence Splitter - identify sentence boundaries ANNIE Gazetteer - identify specific tokens in a list ANNIE POS Tagger - identify part-of-speech (POS), like name, adjective, etc. JAPE Transducers - user-defined annotations based on regular expressions over annotations ANNIE collects all the algorithms and run them in PIPELINE A. Ferrari (ISTI) Requirements Engineering 23 / 88
  • 24. Using Processing Resources in GATE Open GATE Language Resources > New > GATE Document Load Source Document: exercises/requirements.txt Right-click> New Corpus from this document Right-click>Processing Resources (then select the resources reported in the next slides) A. Ferrari (ISTI) Requirements Engineering 24 / 88
  • 25. ANNIE English Tokenizer Produces Token annotations A. Ferrari (ISTI) Requirements Engineering 25 / 88
  • 26. ANNIE Sentence Splitter Produces Sentence annotations A. Ferrari (ISTI) Requirements Engineering 26 / 88
  • 27. Gazetteer Produces Lookup annotations A. Ferrari (ISTI) Requirements Engineering 27 / 88
  • 28. ANNIE POS Tagger Modifies the Token annotations with a new feature category = NN, JJ, VB, etc. A. Ferrari (ISTI) Requirements Engineering 28 / 88
  • 29. JAPE Transducers Gazetteer lists are designed for annotating simple, regular features Even identifying simple patterns like e-mails is impossible with a Gazetteer What is JAPE? JAPE provides pattern matching in GATE Each JAPE rule consists of: I LHS which contains patterns to match I RHS which details the annotations (and optionally features) to be created A. Ferrari (ISTI) Requirements Engineering 29 / 88
  • 30. Example I want to find all the occurrences of the term “level” followed by a number (level 1, level 2, etc.) A. Ferrari (ISTI) Requirements Engineering 30 / 88
  • 31. Example (continue) Making it case-insensitive A. Ferrari (ISTI) Requirements Engineering 31 / 88
  • 32. Example (continue) Adding features and values A. Ferrari (ISTI) Requirements Engineering 32 / 88
  • 33. GATE for Ambiguity Detection A. Ferrari (ISTI) Requirements Engineering 33 / 88
  • 34. Steps - Detecting Vagueness Open GATE Language Resources > New > GATE Document Load Source Document: exercises/requirements.txt Right-click> New Corpus from this document Copy file exercises/vagueness.lst to /GATE_Developer_8.0/plugins/ANNIE/resources/gazetteer Open file lists.def (here you have all the file pointers to the lists checked by the Look-up) Add the line: vagueness.lst:vague_term A. Ferrari (ISTI) Requirements Engineering 34 / 88
  • 35. Steps - Detecting Vagueness (continue) Go back to GATE Processing Resources>New>ANNIE Gazetteer, and give it a name Check: case sensitive = false Click on the Gazetteer, and check the presence of vagueness.lst Applications>ANNIE Move the Gazetteer to the right panel Run! A. Ferrari (ISTI) Requirements Engineering 35 / 88
  • 36. Steps - Annotating Vagueness Processing Resources>New>Jape Transducer Load annotate_vagueness.jape Applications>ANNIE Move the JAPE rule to the right panel Run! A. Ferrari (ISTI) Requirements Engineering 36 / 88
  • 37. Coordination Ambiguities Detect all sentences that include a potential coordination ambiguity Coordination ambiguities are of two types: I Example: The system shall produce the plot or the data-log and the legend. I Example: New employees and directors shall be provided with a user-name and password. A. Ferrari (ISTI) Requirements Engineering 37 / 88
  • 38. Coordination Ambiguities - Jape Rule 0 Retrieve all occurrences of And / Or Phase: MatchAndOr Input: Token Options: control = appelt //Operator ==~ returns only whole string matches //Operator (?i) tells that the matching is case insensitive //Operator | is the logical "or" operator Rule: checkAndOr Priority: 1 ( {Token.string ==~ "(?i)and"} | {Token.string ==~ "(?i)or"} ):coord --> :coord.AndOr = {} A. Ferrari (ISTI) Requirements Engineering 38 / 88
  • 39. Coordination Ambiguities - Jape Rule 1 Annotate all the sequences of And/Or in the same sentence Phase: MatchCoordinationSequences Input: Split AndOr Options: control = appelt //Note that, having Split among the input Annotations allows //us to identify sequences of And/Or in the same sentence //The ’+’ operator means "one or more occurrences" //The ’*’ operator means "zero or more occurrences" //The ’?’ operator means "zero or one occurrences" Rule: checkCoordinationSequences Priority: 1 ( {AndOr} ({AndOr})+ ):coordSequence --> :coordSequence.AndOrSequence = {} A. Ferrari (ISTI) Requirements Engineering 39 / 88
  • 40. Coordination Ambiguities - Jape Rule 2 Annotate all the sentences with And/Or sequences Phase: MatchCoordinationSentences Input: Sentence AndOrSequence Options: control = appelt //A "contains" B: searches for //annotations A that contain annotations B Rule: checkCoordinationSentences Priority: 1 ( {Sentence contains AndOrSequence} ):coordSentence --> :coordSentence.AndOrSentence = {} A. Ferrari (ISTI) Requirements Engineering 40 / 88
  • 41. Your Turn! - Detect Coordination Ambiguities of the Second Type Add the Example: new employees and directors shall be provided with a user-name and password Remember that ANNIE POS Tagger gives you the category of the Token JJ = Adjective, NN = Noun, NNS = Plural Noun More information on the tag-set, and many other things, can be found in: https://gate.ac.uk/sale/tao/tao.pdf A. Ferrari (ISTI) Requirements Engineering 41 / 88
  • 42. Python and NLTK A. Ferrari (ISTI) Requirements Engineering 42 / 88
  • 43. Python Basic Concepts Python Online! http://www.learnpython.org/ Tutorial: https://docs.python.org/2/tutorial/ A. Ferrari (ISTI) Requirements Engineering 43 / 88
  • 44. Python - Main Features Interpreted language Dynamically typed (variables do not have a predefined type!) Rich, built-in collection types I Lists I Tuples I Dictionaries I Sets Concise My Humble Opinion It is perfect if you want to develop something quickly It is a mess if you want to develop a structured software A. Ferrari (ISTI) Requirements Engineering 44 / 88
  • 45. Which Python? Python 2.7: that’s the one we’ll use here I Last stable release before version 3 I Implements parts of the version 3 features I Fully backwards compatible Python 3.0: that’s the one you should use in the future I Released on Dec. 3, 2008 I Many changes (also incompatible ones) I . . . But: few important third-party libraries are not yet compatible with Python 3.0 I Version compatible with NLTK 3.0: either 2.7, or 3.2+ Hint: if you want to use NLTK, go to: http://www.nltk.org/install.html, and check the links to the additional libraries and python recommended releases, before downloading Python A. Ferrari (ISTI) Requirements Engineering 45 / 88
  • 46. Python - Example A Code Sample (in IDLE) x = 34 - 23 # A comment. y = “Hello” # Another one. z = 3.45 if z == 3.45 or y == “Hello”: x = x + 1 y = y + “ World” # String concat. print x print y A. Ferrari (ISTI) Requirements Engineering 46 / 88
  • 47. Python - Enough to Understand the Code Indentation matters to the meaning of the code The first assignment to a variable creates it (x = 3) I Variable types do not need to be declared I Python figures out the variable types on its own Logical operators are words (and, or, not), NOT symbols Simple printing can be done with print Use a newline to end a line of code Use when must go to next line prematurely A. Ferrari (ISTI) Requirements Engineering 47 / 88
  • 48. Python - Assignment Binding a variable in Python means setting a name to hold a reference to some object Hence, assignment creates references, not copies (like in Java) An object is deleted (by the garbage collector) once it becomes unreachable A. Ferrari (ISTI) Requirements Engineering 48 / 88
  • 49. Python - Language Features Indentation instead of braces Several sequence types I Strings (immutable): ’This is a string’, and “Also this is a string” I Tuples (immutable): (’the’, ’system’, ’shall’, ’measure’, ’the’, ’heartbeat’) I Lists (mutable): [’the’, ’system’, ’shall’, ’measure’, ’the’, ’heartbeat’] Lists and Tuples can contain anything: I ((’REQ’, 1), ’the system shall measure the heartbeat’) I [((’REQ’, 1), ’the system shall measure the heartbeat’), ((’REQ’, 2), ’the system shall alert the nurse’))] A. Ferrari (ISTI) Requirements Engineering 49 / 88
  • 50. Python - Sequence Types Tuples are defined using parentheses (and commas). tu = (23, ‘abc’, 4.56, (2,3), ‘def’) Lists are defined using square brackets (and commas) li = [“abc”, 34, 4.34, 23] Strings are defined using quotes (“, ‘, or “““) st = “Hello World” st = ‘Hello World’ st = “““Hello World ””” A. Ferrari (ISTI) Requirements Engineering 50 / 88
  • 51. Python - Sequence Types We can access individual members of a tuple, list, or string using square bracket “array” notation. We can access individual members of a tuple, lis using  square  bracket  “array”  notation.   Note  that  all  are  0  based…   >>> tu = (23, ‘abc’, 4.56, (2,3), ‘def’) >>> tu[1] # Second item in the tuple. ‘abc’ >>> li = [“abc”, 34, 4.34, 23] >>> li[1] # Second item in the list. 34 >>> st = “Hello  World” >>> st[1] # Second character in string. ‘e’ A. Ferrari (ISTI) Requirements Engineering 51 / 88
  • 52. Python - Negative IndicesNegative indices >>> t = (23, ‘abc’, 4.56, (2,3), ‘def’) Positive index: count from the left, starting with 0. >>> t[1] ‘abc’ Negative lookup: count from right, starting with –1. >>> t[-3] 4.56 A. Ferrari (ISTI) Requirements Engineering 52 / 88
  • 53. Python - Slicing 30 Slicing: Return Copy of a Subset 1 >>> t = (23, ‘abc’, 4.56, (2,3), ‘def’) Return a copy of the container with a subset of the original members. Start copying at the first index, and stop copying before the second index. >>> t[1:4] (‘abc’,  4.56,  (2,3)) You can also use negative indices when slicing. >>> t[1:-1] (‘abc’,  4.56,  (2,3)) Optional argument allows selection of every nth item. >>> t[1:-1:2] (‘abc’,  (2,3)) A. Ferrari (ISTI) Requirements Engineering 53 / 88
  • 54. Python - ‘in’ OperatorThe  ‘in’  Operator Boolean test whether a value is inside a collection (often called a container in Python: >>> t = [1, 2, 4, 5] >>> 3 in t False >>> 4 in t True >>> 4 not in t False For strings, tests for substrings >>> a = 'abcde' >>> 'c' in a True >>> 'cd' in a True >>> 'ac' in a False Be careful: the in keyword is also used in the syntax of for loops and list comprehensions. A. Ferrari (ISTI) Requirements Engineering 54 / 88
  • 55. Python - ‘+’ Operator 34 The + Operator The + operator produces a new tuple, list, or string whose value is the concatenation of its arguments. Extends concatenation from strings to other types >>> (1, 2, 3) + (4, 5, 6) (1, 2, 3, 4, 5, 6) >>> [1, 2, 3] + [4, 5, 6] [1, 2, 3, 4, 5, 6] >>> “Hello”  + “  ”  + “World” ‘Hello  World’ A. Ferrari (ISTI) Requirements Engineering 55 / 88
  • 56. Lists - Mutable 36 Lists: Mutable >>> li = [‘abc’, 23, 4.34, 23] >>> li[1] = 45 >>> li [‘abc’,  45,  4.34,  23] We can change lists in place. Name li still points to the same memory reference when we’re  done.   A. Ferrari (ISTI) Requirements Engineering 56 / 88
  • 57. Tuples - Immutable 37 Tuples: Immutable >>> t = (23, ‘abc’, 4.56, (2,3), ‘def’) >>> t[2] = 3.14 Traceback (most recent call last): File "<pyshell#75>", line 1, in -toplevel- tu[2] = 3.14 TypeError: object doesn't support item assignment You  can’t  change  a  tuple.   You can make a fresh tuple and assign its reference to a previously used name. >>> t = (23, ‘abc’, 3.14, (2,3), ‘def’) The  immutability  of  tuples  means  they’re  faster  than  lists.   A. Ferrari (ISTI) Requirements Engineering 57 / 88
  • 58. Operations on Lists 38 Operations on Lists Only 1 >>> li = [1, 11, 3, 4, 5] >>> li.append(‘a’) # Note the method syntax >>> li [1,  11,  3,  4,  5,  ‘a’] >>> li.insert(2,  ‘i’) >>>li [1,  11,  ‘i’,  3,  4,  5,  ‘a’] A. Ferrari (ISTI) Requirements Engineering 58 / 88
  • 59. Dictionaries Dictionaries store a mapping between a set of keys and a set of values Keys can be any immutable type Values can be any type A. Ferrari (ISTI) Requirements Engineering 59 / 88
  • 60. Dictionaries - Creating and AccessingCreating and accessing dictionaries >>> d = {‘user’:‘bozo’, ‘pswd’:1234} >>> d[‘user’] ‘bozo’ >>> d[‘pswd’] 1234 >>> d[‘bozo’] Traceback (innermost last): File  ‘<interactive  input>’  line  1,  in  ? KeyError: bozo A. Ferrari (ISTI) Requirements Engineering 60 / 88
  • 61. Dictionaries - Updating Updating Dictionaries >>> d = {‘user’:‘bozo’, ‘pswd’:1234} >>> d[‘user’] = ‘clown’ >>> d {‘user’:‘clown’,  ‘pswd’:1234} Keys must be unique. Assigning to an existing key replaces its value. >>> d[‘id’] = 45 >>> d {‘user’:‘clown’,  ‘id’:45,  ‘pswd’:1234} Dictionaries are unordered New entry might appear anywhere in the output. (Dictionaries work by hashing) A. Ferrari (ISTI) Requirements Engineering 61 / 88
  • 62. Dictionaries - Removing Removing dictionary entries >>> d = {‘user’:‘bozo’, ‘p’:1234, ‘i’:34} >>> del d[‘user’] # Remove one. Note that del is # a function. >>> d {‘p’:1234,  ‘i’:34} >>> d.clear() # Remove all. >>> d {} >>> a=[1,2] >>> del a[1] # (del also works on lists) >>> a [1] 47 A. Ferrari (ISTI) Requirements Engineering 62 / 88
  • 63. Dictionaries - Useful Methods Useful Accessor Methods >>> d = {‘user’:‘bozo’, ‘p’:1234, ‘i’:34} >>> d.keys() # List of current keys [‘user’,  ‘p’,  ‘i’] >>> d.values() # List of current values. [‘bozo’,  1234,  34] >>> d.items() # List of item tuples. [(‘user’,‘bozo’),  (‘p’,1234),  (‘i’,34)] A. Ferrari (ISTI) Requirements Engineering 63 / 88
  • 64. If Statements (as expected) if Statements (as expected) if x == 3: print "X equals 3." elif x == 2: print "X equals 2." else: print "X equals something else." print "This  is  outside  the  ‘if’." Note: Use of indentation for blocks Colon (:) after boolean expression 54 A. Ferrari (ISTI) Requirements Engineering 64 / 88
  • 65. While Loops (as expected) while Loops (as expected) >>> x = 3 >>> while x < 5: print x, "still in the loop" x = x + 1 3 still in the loop 4 still in the loop >>> x = 6 >>> while x < 5: print x, "still in the loop" >>> 55 A. Ferrari (ISTI) Requirements Engineering 65 / 88
  • 66. For Loops (very cool)For Loops 1 For-each  is  Python’s  only form of for loop A for loop steps through each of the items in a collection type,  or  any  other  type  of  object  which  is  “iterable” for <item> in <collection>: <statements> If <collection> is a list or a tuple, then the loop steps through each element of the sequence. If <collection> is a string, then the loop steps through each character of the string. for someChar in “Hello  World”: print someChar 59 A. Ferrari (ISTI) Requirements Engineering 66 / 88
  • 67. For Loops Items can be more complex than a simple variable name <statements> <item> can be more complex than a single variable name. If the elements of <collection> are themselves collections, then <item> can match the structure of the elements. (We saw something similar with list comprehensions and with ordinary assignments.) for (x, y) in [(a,1), (b,2), (c,3), (d,4)]: print x 60 A. Ferrari (ISTI) Requirements Engineering 67 / 88
  • 68. For Loops and range() Function For loops and the range() function We often want to write a loop where the variables ranges over some sequence of numbers. The range() function returns a list of numbers from 0 up to but not including the number we pass to it. range(5) returns [0,1,2,3,4] So we can say: for x in range(5): print x (There are several other forms of range() that provide variants  of  this  functionality…) xrange() returns an iterator that provides the same functionality more efficiently 61 A. Ferrari (ISTI) Requirements Engineering 68 / 88
  • 69. For Loops and range() Function Abuse of the range() function Don't use range() to iterate over a sequence solely to have the index and elements available at the same time Avoid: for i in range(len(mylist)): print i, mylist[i] Instead: for (i, item) in enumerate(mylist): print i, item This is an example of an anti-pattern in Python For more, see: http://www.seas.upenn.edu/~lignos/py_antipatterns.html http://stackoverflow.com/questions/576988/python-specific-antipatterns-and-bad- practices A. Ferrari (ISTI) Requirements Engineering 69 / 88
  • 70. List Comprehensions (remarkably cool) List Comprehensions 1 A powerful feature of the Python language. Generate a new list by applying a function to every member of an original list. Python programmers use list comprehensions extensively. You’ll  see  many  of  them  in  real  code. [ expression for name in list ] 64 A. Ferrari (ISTI) Requirements Engineering 70 / 88
  • 71. List Comprehensions (remarkably cool) List Comprehensions 2 >>> li = [3, 6, 2, 7] >>> [elem*2 for elem in li] [6, 12, 4, 14] [ expression for name in list ] Where expression is some calculation or operation acting upon the variable name. For each member of the list, the list comprehension 1. sets name equal to that member, and 2. calculates a new value using expression, It then collects these new values into a list which is the return value of the list comprehension. [ expression for name in list ] 65 A. Ferrari (ISTI) Requirements Engineering 71 / 88
  • 72. List Comprehensions (remarkably cool) List Comprehensions 3 If the elements of list are other collections, then name can be replaced by a collection of names that  match  the  “shape”  of  the  list members. >>> li =  [(‘a’,  1),  (‘b’,  2),  (‘c’,  7)] >>> [ n * 3 for (x, n) in li] [3, 6, 21] [ expression for name in list ] 66 A. Ferrari (ISTI) Requirements Engineering 72 / 88
  • 73. Filtered List Comprehensions (remarkably cool) Filtered List Comprehension 1 Filter determines whether expression is performed on each member of the list. When processing each element of list, first check if it satisfies the filter condition. If the filter condition returns False, that element is omitted from the list before the list comprehension is evaluated. [ expression for name in list if filter] 67 A. Ferrari (ISTI) Requirements Engineering 73 / 88
  • 74. Filtered List Comprehensions (remarkably cool) >>> li = [3, 6, 2, 7, 1, 9] >>> [elem * 2 for elem in li if elem > 4] [12, 14, 18] Only 6, 7, and 9 satisfy the filter condition. So, only 12, 14, and 18 are produced. Filtered List Comprehension 2 [ expression for name in list if filter] A. Ferrari (ISTI) Requirements Engineering 74 / 88
  • 75. Functions First line with less indentation is considered to be outside of the function definition. Defining Functions No declaration of types of arguments or result def get_final_answer(filename): """Documentation String""" line1 line2 return total_counter ... Function definition begins with def Function name and its arguments. ‘return’  indicates  the   value to be sent back to the caller. Colon. 72 A. Ferrari (ISTI) Requirements Engineering 75 / 88
  • 76. Function Call Calling a Function >>> def myfun(x, y): return x * y >>> myfun(3, 4) 12 73 A. Ferrari (ISTI) Requirements Engineering 76 / 88
  • 77. Import 87 Import I import somefile Everything in somefile.py can be referred to by: somefile.className.method(“abc”) somefile.myFunction(34) from somefile import * Everything in somefile.py can be referred to by: className.method(“abc”) myFunction(34) Careful! This can overwrite the definition of an existing function or variable! A. Ferrari (ISTI) Requirements Engineering 77 / 88
  • 78. Python NLTK Basic Concepts http://textminingonline.com/ dive-into-nltk-part-i-getting-started-with-nltk A. Ferrari (ISTI) Requirements Engineering 78 / 88
  • 79. Python NLTK Main Features Sentence and word tokenization POS tagging Chunking and Named Entity Recognition (NER) Text Classification Many included corpora A. Ferrari (ISTI) Requirements Engineering 79 / 88
  • 80. Python NLTK - sentence tokenization (splitting) >>> from nltk.tokenize import sent_tokenize >>> sent_tokenize("hello this is nltk") [’hello this is nltk’] >>> sent_tokenize("hello. this is nltk") [’hello.’, ’this is nltk’] A. Ferrari (ISTI) Requirements Engineering 80 / 88
  • 81. Python NLTK - word tokenization >>> from nltk.tokenize import word_tokenize >>> word_tokenize("this is nltk") [’this’, ’is’, ’nltk’] A. Ferrari (ISTI) Requirements Engineering 81 / 88
  • 82. Python NLTK - word tokenization with punctuation >>> word_tokenize("The woman’s car") [’The’, ’woman’, "’s", ’car’] >>> from nltk.tokenize import wordpunct_tokenize >>> wordpunct_tokenize("The woman’s car") [’The’, ’woman’, "’", ’s’, ’car’] A. Ferrari (ISTI) Requirements Engineering 82 / 88
  • 83. Python NLTK - stemming >>> from nltk.stem.porter import PorterStemmer >>> porter_stemmer = PorterStemmer() >>> porter_stemmer.stem(’requirements’) ’requir’ A. Ferrari (ISTI) Requirements Engineering 83 / 88
  • 84. Python NLTK - lemmatizer >>> from nltk.stem import WordNetLemmatizer >>> wordnet_lemmatizer = WordNetLemmatizer() >>> wordnet_lemmatizer.lemmatize(’requirements’) ’requirement’ >>> wordnet_lemmatizer.lemmatize(’is’) ’is’ >>> wordnet_lemmatizer.lemmatize(’is’, pos=’v’) ’be’ A. Ferrari (ISTI) Requirements Engineering 84 / 88
  • 85. Python NLTK - POS tagging >>> t = word_tokenize("The system shall be nice") >>> from nltk import pos_tag >>> pos_tag(t) [(’The’, ’DT’), (’system’, ’NN’), (’shall’, ’MD’), (’be’, ’VB’), (’nice’, ’JJ’)] A. Ferrari (ISTI) Requirements Engineering 85 / 88
  • 86. Python NLTK - POS tagging How can I know the meaning of the tags? >>> import nltk >>> nltk.help.upenn_tagset(’DT’) DT: determiner all an another any both del each either every half la many much nary neither no some such that the them these this those >>> nltk.help.upenn_tagset(’NN’) NN: noun, common, singular or mass common-carrier cabbage knuckle-duster Casino afghan shed thermostat investment slide humour falloff slick wind hyena override subhumanity machinist ... >>> nltk.help.upenn_tagset(’NNP’) NNP: noun, proper, singular Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA Shannon A.K.C. Meltex Liverpool ... A. Ferrari (ISTI) Requirements Engineering 86 / 88
  • 87. Python NLTK - stopword removal >>> from nltk.corpus import stopwords >>> stoplist = stopwords.words(’english’) >>> sentence = ’The system shall be usable’ >>> print [i for i in word_tokenize(sentence) if i not in stoplist] [’The’, ’system’, ’shall’, ’usable’] >>> print [i for i in word_tokenize(sentence.lower()) if i not in stoplist] [’system’, ’shall’, ’usable’] A. Ferrari (ISTI) Requirements Engineering 87 / 88
  • 88. When to use What GATE is good to annotate documents, and find patterns Python NLTK is good for fast prototyping of analysis tools If you have to extract information from text: use GATE If you have to do more fine-grained analyses, and you do not need to build a well-structured program: use Python NLTK If you have to build a structured program: study the OpenNLP library (Java) A. Ferrari (ISTI) Requirements Engineering 88 / 88