Natural Language Processing

NLP: NLP Areas, Chomsky Hierarchy, Finite State Automata, Regular Languages, Regular Expressions, Pattern Matching, ELIZA - NL Dialogue with Computer

Published in: Science, Technology

  1. Natural Language Processing: NLP Areas, Chomsky Hierarchy, Finite State Machines & Regular Expressions, ELIZA: NL Dialogue through Pattern Matching. Vladimir Kulyukin, www.vkedco.blogspot.com
  2. Outline
     ● NLP Areas
     ● Chomsky Hierarchy
     ● Finite State Automata, Regular Expressions, & Regular Languages
     ● ELIZA: Natural Language Dialogue through Pattern Matching
  3. NLP Areas
     ● Morphology
     ● Phonology and Text-To-Speech (TTS)
     ● Syntactic Analysis
     ● Semantics
     ● Optical Character Recognition (OCR): this is somewhat questionable, because to many it is an area of computer vision
     ● Speech Recognition
     ● Natural Language Generation (NLG)
  4. Chomsky Language Hierarchy: Where Should Natural Languages Be Placed?
  5. Finite State Automata, Regular Expressions, & Regular Languages
  6. DFA: Deterministic Finite Automata
     ● A DFA can be informally defined as a directed graph whose nodes are states and whose edges are transitions on specific symbols
     ● A DFA has a unique start state and a set (possibly empty) of final or accepting states
     ● A DFA processes the input string one symbol at a time; when the last symbol is read, the DFA is in a state that is either final or not; if the state is final, the DFA accepts (recognizes) the string; otherwise, the DFA rejects the string
  7. DFA: Formal Definition
     A DFA $M$ is a 5-tuple, i.e., $M = (Q, \Sigma, \delta, q_0, F)$, where:
     ● $Q$ is a finite set of states;
     ● $\Sigma$ is an alphabet;
     ● $\delta : Q \times \Sigma \to Q$ is a transition function;
     ● $q_0 \in Q$ is the start state;
     ● $F \subseteq Q$ is the set of accepting (final) states.
  8. Example DFA
     [Figure: DFA with two states, q0 and q1, and transitions labeled a and b. It accepts all strings that end in a.]
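The following is a minimal Java sketch of how a DFA like the one on this slide could be simulated. The transition table is an assumption reconstructed from the caption and the edge labels (q0 is taken as the start state, q1 as the only accepting state, every a leads to q1 and every b leads to q0); the class name `EndsInADfa` is hypothetical.

```java
import java.util.Map;

final class EndsInADfa {
    // Transition function delta: state -> (symbol -> next state).
    // Assumed from the slide: 'a' always leads to q1, 'b' always leads to q0.
    private static final Map<String, Map<Character, String>> DELTA = Map.of(
        "q0", Map.of('a', "q1", 'b', "q0"),
        "q1", Map.of('a', "q1", 'b', "q0"));

    // Reads the input one symbol at a time and reports whether the
    // state reached after the last symbol is the accepting state q1.
    static boolean accepts(String input) {
        String state = "q0"; // assumed start state
        for (char symbol : input.toCharArray()) {
            Map<Character, String> row = DELTA.get(state);
            if (!row.containsKey(symbol)) {
                return false; // symbol is outside the alphabet {a, b}
            }
            state = row.get(symbol);
        }
        return state.equals("q1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", "ab", "bba"}) {
            System.out.println("'" + s + "' accepted: " + accepts(s));
        }
    }
}
```

Because the machine is deterministic, the loop tracks exactly one current state and never needs to backtrack.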
  9. Nondeterminism
     ● Given an input, there can be more than one legal sequence of steps to process the input
     ● There can be various criteria to evaluate why one legal sequence is better than another
     ● The input is accepted if at least one legal sequence of moves ends up in an accepting state
  10. Practical Implication of Nondeterminism
     ● The key computational implication of nondeterminism is the necessity of search
     ● In a typical scenario, a legal sequence of steps is a subset of some finite set
     ● Finding subsets brings us to the concept of power set
  11. NFA: Definition
     An NFA $M$ is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$, where:
     ● $Q$ is a finite set of states;
     ● $\Sigma$ is an alphabet, i.e., a finite set of symbols;
     ● $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to P(Q)$ is the transition function;
     ● $q_0 \in Q$ is the start state;
     ● $F \subseteq Q$ is the set of accepting states.
  12. Example NFA
     [Figure: NFA with states q0, q1, q2 and edges labeled a,b; a; a; a,b]
     This is the transition table of the above NFA:
              a           b
       q0   {q0, q1}    {q0}
       q1   {q2}        { }
       q2   {q2}        {q2}
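One way to run an NFA like this without explicit search is to track the set of all states it could be in after each symbol. The sketch below copies the transition table from the slide; the start state (q0) and the accepting state (q2) are assumptions, since the transcript does not state them, and the class name `TableNfa` is hypothetical. Under those assumptions the machine appears to accept exactly the strings that contain two consecutive a's.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TableNfa {
    // delta: state -> (symbol -> set of possible next states), as in the slide's table.
    private static final Map<String, Map<Character, Set<String>>> DELTA = Map.of(
        "q0", Map.of('a', Set.of("q0", "q1"), 'b', Set.of("q0")),
        "q1", Map.of('a', Set.of("q2"), 'b', Set.<String>of()),
        "q2", Map.of('a', Set.of("q2"), 'b', Set.of("q2")));

    // Accepts if at least one legal sequence of moves ends in the accepting state.
    static boolean accepts(String input) {
        Set<String> current = Set.of("q0"); // assumed start state
        for (char symbol : input.toCharArray()) {
            Set<String> next = new HashSet<>();
            for (String state : current) {
                next.addAll(DELTA.get(state).getOrDefault(symbol, Set.of()));
            }
            current = next;
        }
        return current.contains("q2"); // assumed accepting state
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aa", "baab", "abab"}) {
            System.out.println("'" + s + "' accepted: " + accepts(s));
        }
    }
}
```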
  13. NFA vs. DFA
     ● NFAs are simpler to write because, in general, they have fewer states and allow for spontaneous transitions
     ● However, they are not more powerful than DFAs, i.e., they accept the same regular languages as DFAs
     ● For every NFA, one can construct a DFA that accepts the same language
  14. Equivalence of NFAs and DFAs
     ● Basic insight: A DFA can keep track of the states that the equivalent NFA may be in after reading each symbol of the input
     ● Since the NFA may be in more than one state after reading a symbol, each state of the DFA must correspond to a subset of the NFA's states
     ● The construction of an equivalent DFA from an NFA is called subset construction
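The subset construction can be sketched in Java as a worklist algorithm over sets of NFA states. The NFA below is the transition table from the Example NFA slide with q0 assumed to be the start state; it has no ε-transitions, so this sketch omits the ε-closure step that a general construction would need. The class name `SubsetConstruction` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SubsetConstruction {
    // NFA transition table from the Example NFA slide.
    private static final Map<String, Map<Character, Set<String>>> NFA = Map.of(
        "q0", Map.of('a', Set.of("q0", "q1"), 'b', Set.of("q0")),
        "q1", Map.of('a', Set.of("q2"), 'b', Set.<String>of()),
        "q2", Map.of('a', Set.of("q2"), 'b', Set.of("q2")));

    // Builds the DFA transition table; every DFA state is a set of NFA states.
    static Map<Set<String>, Map<Character, Set<String>>> build() {
        Map<Set<String>, Map<Character, Set<String>>> dfa = new LinkedHashMap<>();
        Deque<Set<String>> worklist = new ArrayDeque<>();
        worklist.add(new TreeSet<>(Set.of("q0"))); // DFA start state = {q0}
        while (!worklist.isEmpty()) {
            Set<String> subset = worklist.remove();
            if (dfa.containsKey(subset)) {
                continue; // this subset has already been expanded
            }
            Map<Character, Set<String>> row = new LinkedHashMap<>();
            for (char symbol : new char[] {'a', 'b'}) {
                // The target is the union of the NFA moves of every member state.
                Set<String> target = new TreeSet<>();
                for (String state : subset) {
                    target.addAll(NFA.get(state).get(symbol));
                }
                row.put(symbol, target);
                worklist.add(target);
            }
            dfa.put(subset, row);
        }
        return dfa;
    }

    public static void main(String[] args) {
        build().forEach((state, row) -> System.out.println(state + " -> " + row));
    }
}
```

Only subsets reachable from the start state are generated, so the resulting DFA is usually much smaller than the full power set of the NFA's states.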
  15. Languages Accepted By DFAs & NFAs
     ● An immediate consequence of the subset construction algorithm is that non-determinism does not increase conceptual power (i.e., the same class of languages is recognized by DFAs and NFAs)
     ● Languages that are recognized by FSAs (DFAs and NFAs) are called regular
  16. Regular Expressions
  17. Regular Expressions
     ● Regular expressions (regexps) are one of the most useful tools in computer science
     ● NLP, as an area of computer science, has greatly benefitted from regexps: they are used in phonology, morphology, text analysis, information extraction, & speech recognition
     ● As a student of NLP, you should learn to recognize if a problem at hand can be solved via regexps
  18. Three Language Operations
     ● If $L_1$ and $L_2$ are languages, then $L_1 L_2 = \{x_1 x_2 \mid x_1 \in L_1 \text{ and } x_2 \in L_2\}$.
     ● If $L$ is a language, the Kleene closure of $L$ is $L^* = \{x_1 x_2 \dots x_n \mid n \ge 0,\ x_i \in L,\ 1 \le i \le n\}$.
     ● If $L$ is a language, then $L^+ = \{x_1 x_2 \dots x_n \mid n \ge 1,\ x_i \in L,\ 1 \le i \le n\}$.
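A small worked example may make the three operations concrete. The languages $L_1$ and $L_2$ below are chosen purely for illustration and do not come from the slides.

```latex
\begin{align*}
L_1 &= \{a,\ ab\}, \qquad L_2 = \{b\} \\
L_1 L_2 &= \{ab,\ abb\} \\
L_2^{*} &= \{\varepsilon,\ b,\ bb,\ bbb,\ \dots\} \\
L_2^{+} &= \{b,\ bb,\ bbb,\ \dots\}
\end{align*}
```

The only difference between $L_2^{*}$ and $L_2^{+}$ here is the empty string $\varepsilon$, which corresponds to the case $n = 0$.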
  19. Atomic & Compound Regular Expressions
     ● Regular expressions can be divided into atomic and compound
     ● Atomic regular expressions are basic building blocks out of which compound regular expressions are built
     ● There are typically three atomic regular expressions: unit strings (strings of one symbol), the empty string (the string of no symbols), and the empty set of strings
  20. Atomic Regular Expressions
     A regular expression $r$ is a string that denotes the language $L(r)$ over some alphabet $\Sigma$.
     Three types of atomic regular expressions:
     1. If $a \in \Sigma$, then $L(a) = \{a\}$.
     2. $L(\varepsilon) = \{\varepsilon\}$.
     3. $L(\emptyset) = \{\,\}$.
  21. Compound Regular Expressions
     Let $r_1$ and $r_2$ be regular expressions.
     1. $r_1 + r_2$ is a regular expression. Then $L(r_1 + r_2) = L(r_1) \cup L(r_2)$.
     2. $r_1 r_2$ is a regular expression. Then $L(r_1 r_2) = L(r_1) L(r_2)$.
     3. $r^*$ is a regular expression. Then $L(r^*) = (L(r))^*$.
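As an illustration of how these rules compose, the derivation below evaluates the language of the expression $(a + b)^* a$ over $\Sigma = \{a, b\}$. The expression itself is an assumption chosen for this example; it denotes the same language as the caption of the Example DFA slide.

```latex
\begin{align*}
L(a + b) &= L(a) \cup L(b) = \{a\} \cup \{b\} = \{a, b\} \\
L((a + b)^*) &= (L(a + b))^* = \{a, b\}^* \\
L((a + b)^* a) &= L((a + b)^*)\, L(a) = \{a, b\}^* \{a\}
\end{align*}
```

The last line is the set of all strings over $\{a, b\}$ that end in $a$.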
  22. Compiling FSAs from RegExps
  23. Atomic Reg Exps → NFAs: $a \in \Sigma$
     [Figure: NFA with a single transition labeled a]
     This NFA accepts only the string 'a' and nothing else
  24. Atomic Reg Exps → NFAs: $\varepsilon$
     [Figure: NFA with a single transition labeled ε]
     This NFA accepts only the empty string
  25. Atomic Reg Exps → NFAs: $\emptyset$
     [Figure: NFA with no transitions]
     This NFA accepts the empty language, i.e., no strings
  26. Compound Reg Exps → NFAs: $(r_1 + r_2)$, where $r_1$, $r_2$ are regular expressions
     [Figure: the NFAs for r1 and r2 placed in parallel and connected by ε-transitions]
     Another notation, commonly used in regexp engines, is (r1 | r2), in other words, either r1 or r2
  27. Compound Reg Exps → NFAs
     [Figure: the NFAs for r1 and r2 placed in parallel and connected by ε-transitions]
     This compound NFA accepts if and only if either the NFA for r1 (upper one) accepts or the NFA for r2 (lower one) accepts
  28. Compound Reg Exps → NFAs: $(r_1)^+$, where $r_1$ is a regular expression
     [Figure: the NFA for r1 with ε-transitions, including one that loops back to repeat r1]
     This regular expression matches strings that match r1 at least once
  29. Compound Reg Exps → NFAs: $(r_1)^*$
     [Figure: the NFA for r1 with ε-transitions, including one that loops back to repeat r1 and one that bypasses it]
     This regular expression matches strings that match r1 zero or more times
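The compilation steps on these slides can be sketched in Java as a set of small NFA-fragment builders joined by ε-transitions, an approach commonly known as Thompson's construction. This is an illustrative sketch rather than the construction drawn on the slides: the class and method names are hypothetical, the exact placement of ε-edges is an assumption, and each fragment may be used only once because the builders modify their arguments.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Thompson {
    // One NFA state: parallel lists of labelled edges, plus epsilon edges.
    static final class State {
        final List<Character> labels = new ArrayList<>();
        final List<State> targets = new ArrayList<>();
        final List<State> epsilon = new ArrayList<>();
    }

    // An NFA fragment with one start state and one accepting state.
    static final class Fragment {
        final State start, accept;
        Fragment(State start, State accept) { this.start = start; this.accept = accept; }
    }

    // Atomic case: accepts only the one-symbol string a.
    static Fragment symbol(char a) {
        State s = new State(), f = new State();
        s.labels.add(a);
        s.targets.add(f);
        return new Fragment(s, f);
    }

    // r1 + r2: a new start branches to both fragments via epsilon edges.
    static Fragment union(Fragment r1, Fragment r2) {
        State s = new State(), f = new State();
        s.epsilon.add(r1.start);
        s.epsilon.add(r2.start);
        r1.accept.epsilon.add(f);
        r2.accept.epsilon.add(f);
        return new Fragment(s, f);
    }

    // r1 r2: the accepting state of r1 continues into the start of r2.
    static Fragment concat(Fragment r1, Fragment r2) {
        r1.accept.epsilon.add(r2.start);
        return new Fragment(r1.start, r2.accept);
    }

    // r1*: either skip r1 entirely or run it and loop back.
    static Fragment star(Fragment r1) {
        State s = new State(), f = new State();
        s.epsilon.add(r1.start);
        s.epsilon.add(f);
        r1.accept.epsilon.add(r1.start);
        r1.accept.epsilon.add(f);
        return new Fragment(s, f);
    }

    // All states reachable from the given states through epsilon edges only.
    static Set<State> closure(Set<State> states) {
        Set<State> seen = new HashSet<>(states);
        List<State> stack = new ArrayList<>(states);
        while (!stack.isEmpty()) {
            State s = stack.remove(stack.size() - 1);
            for (State t : s.epsilon) {
                if (seen.add(t)) stack.add(t);
            }
        }
        return seen;
    }

    // Simulates the NFA by tracking the set of currently possible states.
    static boolean matches(Fragment nfa, String input) {
        Set<State> current = closure(Set.of(nfa.start));
        for (char c : input.toCharArray()) {
            Set<State> next = new HashSet<>();
            for (State s : current) {
                for (int i = 0; i < s.labels.size(); i++) {
                    if (s.labels.get(i) == c) next.add(s.targets.get(i));
                }
            }
            current = closure(next);
        }
        return current.contains(nfa.accept);
    }

    // Builds a fresh NFA for (a + b)* a, i.e., all strings that end in a.
    static Fragment endsInA() {
        return concat(star(union(symbol('a'), symbol('b'))), symbol('a'));
    }

    public static void main(String[] args) {
        System.out.println(matches(endsInA(), "bba"));
        System.out.println(matches(endsInA(), "ab"));
    }
}
```

The one-or-more case $(r_1)^+$ is omitted for brevity; it would follow the same shape as `star` without the ε-edge that skips r1.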
  30. Defining Regular Expressions
     Just in case you are interested/curious, below are a few links to my presentations on regular expressions in Python and Perl:
     ● http://vkedco.blogspot.com/2013/02/python-perl-regular-expression-match.html
     ● http://vkedco.blogspot.com/2013/02/python-amp-perl-py-pattern-compilation.html
     ● http://vkedco.blogspot.com/2013/02/python-perl-loose-tight-regular.html
  31. Defining Regexps
     ● Regular expressions are placed inside a pair of matching forward slashes: / /
     ● Regular expressions are case-sensitive
     ● Examples:
       /a/ - match all occurrences of character 'a'
       /ab/ - match all occurrences of sequence 'ab'
       /Ab cD/ - match all occurrences of sequence 'Ab cD'
  32. Disjunction
     ● Disjunction (or'ing) of characters inside a regexp is done with the matching square brackets [ ]
     ● All characters inside [ ] are part of the disjunction
     ● Examples:
       /[Aa]/ - match either 'A' or 'a'
       /[abc]/ - match all occurrences of 'a' or 'b' or 'c'
       /[cC]onference of [bB]irds/ - match strings like 'conference of birds' or 'Conference of birds' or 'conference of Birds' or 'Conference of Birds'
  33. Positive & Negative Ranges
     ● [ ] can be used in conjunction with - to specify character ranges
     ● Ranges can be negated with the special character ^ if it is the first character inside [ ]
     ● Examples:
       /[A-Z]/ - match an uppercase letter
       /[a-z]/ - match a lowercase letter
       /[0-9]/ - match a digit
       /[^A-Z]/ - match any character that is not an uppercase letter
       /[^Ab]/ - match any character that is neither 'A' nor 'b'
  34. Zero or One Occurrences
     ● Special character ? is used to specify zero or one occurrences of the preceding character
     ● Examples:
       /Examples?/ - match 'Example' or 'Examples'
       /colou?r/ - match 'colour' or 'color'
  35. Kleene*, Kleene+, Wildcard .
     ● Special character + (aka Kleene +) specifies one or more occurrences of the regular expression that comes right before it
     ● Special character * (aka Kleene *) specifies zero or more occurrences of the regular expression that comes right before it
     ● Special character . (wildcard) specifies any single character
     ● Examples:
       /a*/ - match '', 'a', 'aa', 'aaa', 'aaaa', etc
       /(ab)*/ - match '', 'ab', 'abab', 'ababab', 'abababab', etc
       /a+/ - match 'a', 'aa', 'aaa', 'aaaa', etc
       /(ab)+/ - match 'ab', 'abab', 'ababab', 'abababab', etc
       /beg.n/ - match 'begin', 'began', 'begun'
  36. Anchors
     ● Anchors are special characters that anchor a regexp to a specific position in the text it is matched against
     ● The anchors ^ and $ anchor regexps at the beginning and end of the text, respectively
     ● Examples:
       /^Omar Khayyam/ matches 'Omar Khayyam' only at the beginning of the text
       /Omar Khayyam$/ matches 'Omar Khayyam' only at the end of the text
       /^Omar Khayyam$/ matches only 'Omar Khayyam'
  37. Anchors
     ● Anchors \b and \B match at word boundaries and non-word boundaries, respectively
     ● Examples:
       /\bthe\b/ matches the 'the' in ' the ' but not in 'weather'
       /\Bthe\B/ does not match the word 'the' but does match the 'the' inside 'weather'
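The notation on the preceding slides can be tried out with java.util.regex, the package this deck uses later. The sketch below is only an illustration: the helper `findAll` and the class name are hypothetical, and in Java source the patterns are written without the surrounding slashes and with each backslash doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexSyntaxDemo {
    // Returns every non-overlapping match of regex in text, from left to right.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Disjunction with [ ]
        System.out.println(findAll("[cC]onference of [bB]irds", "Conference of birds"));
        // Negated range
        System.out.println(findAll("[^A-Z]", "aB"));
        // Zero or one occurrence
        System.out.println(findAll("colou?r", "color colour"));
        // Kleene + applied to a group
        System.out.println(findAll("(ab)+", "ab abab"));
        // Wildcard
        System.out.println(findAll("beg.n", "begin began begun"));
        // Word-boundary and non-word-boundary anchors
        System.out.println(findAll("\\bthe\\b", "the weather"));
        System.out.println(findAll("\\Bthe\\B", "the weather"));
    }
}
```

Both of the last two calls find exactly one match in 'the weather': the first finds the standalone word, the second finds the letters inside 'weather'.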
  38. Regexp Groups
  39. Grouping Regular Expressions
     ● Regular expressions can be broken into components
     ● Such components are called groups
     ● A group match is a specific part of text that matches a specific regular subexpression in a larger expression
     ● Groups are specified with a pair of ( )
  40. Referencing Group Matches
     ● Group matches are numbered $1, $2, $3, etc (some regexp notations use just numbers: 1, 2, 3, etc)
     ● These numbers are called backreferences because they refer specific text segments back to specific regular subexpressions
     ● Backreferences are used in substitutions
  41. Sample Problem
     Design a regular expression that parses email addresses into user name, host name, and host extension
  42. Possible Solution
     '([\w.-]+)@([\w.-]+)\.(com|net|org)'
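A minimal Java sketch of how this solution could be applied is shown below. It assumes that the backslashes (\w and \.) were lost from the slide when it was transcribed; the class name `EmailParser`, the method `parse`, and the sample address are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailParser {
    // Group 1: user name, group 2: host name, group 3: host extension.
    private static final Pattern EMAIL =
        Pattern.compile("([\\w.-]+)@([\\w.-]+)\\.(com|net|org)");

    // Returns {user, host, extension}, or null if the whole address does not match.
    static String[] parse(String address) {
        Matcher m = EMAIL.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("jane.doe@mail.example.org")));
        System.out.println(Arrays.toString(parse("jane@example.edu")));
    }
}
```

The pattern only recognizes the three extensions listed on the slide, so an address ending in any other extension is rejected.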
  43. Example 01
     $txt_01 = '1+1=2';
     ## Suppose we match /(\d)\+(\d)=(\d)/ against $txt_01
     ## Then the variable alignment is as follows:
     ## /(\d)\+(\d)=(\d)/
     ##  $1    $2   $3
     ## In other words, $1 is bound to '1',
     ## $2 is bound to '1' and $3 is bound to '2'
  44. Example 02
     $txt_01 = '1+1=2';
     ## Suppose we match /((\d)\+(\d)=(\d))/ against $txt_01.
     ## Then the special variable alignment is
     ## /((\d)\+(\d)=(\d))/;
     ##  $1$2   $3   $4
     ## In other words, $1 is bound to '1+1=2',
     ## $2 is bound to '1' and $3 is bound to '1',
     ## $4 is bound to '2'.
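The same group numbering can be checked in Java, where `Matcher.group(n)` plays the role of the Perl variable $n. This is a sketch with hypothetical class and method names; the patterns assume that the digit class \d and the escaped \+ lost their backslashes in the transcript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNumbering {
    // Returns the text bound to groups 1..n for the first match, or an empty list.
    static List<String> groups(String regex, String text) {
        List<String> bound = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                bound.add(m.group(i));
            }
        }
        return bound;
    }

    public static void main(String[] args) {
        // Example 01: three groups
        System.out.println(groups("(\\d)\\+(\\d)=(\\d)", "1+1=2"));
        // Example 02: an enclosing group becomes group 1 and shifts the others
        System.out.println(groups("((\\d)\\+(\\d)=(\\d))", "1+1=2"));
    }
}
```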
  45. Substitutions
     ● Groups (subexpressions of larger regular expressions) are specified with special characters ( )
     ● Corresponding text segments that match subexpressions are retrieved with special variables $1, $2, $3, etc (some formalisms use just integers 1, 2, 3, etc)
     ● These variables are aligned with left parentheses: $1 is aligned with the 1st left parenthesis, $2 is aligned with the 2nd left parenthesis, etc
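As a small illustration of backreferences in a substitution, the Java sketch below uses `String.replaceAll`, whose replacement string accepts $1, $2, etc. The class name, method name, and the name-swapping task are hypothetical and chosen only for this example.

```java
final class BackrefSubstitution {
    // Rewrites "First Last" as "Last, First" by reordering the two groups.
    static String swapNames(String text) {
        return text.replaceAll("(\\w+) (\\w+)", "$2, $1");
    }

    public static void main(String[] args) {
        System.out.println(swapNames("Omar Khayyam"));
    }
}
```

Here $1 is aligned with the first left parenthesis (the first name) and $2 with the second (the last name), so the replacement emits them in reverse order.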
  46. Patterns in Java
  47. Patterns in Java
     ● Java has the package java.util.regex that allows the programmer to work with patterns, including regular expressions
     ● The java.util.regex package has two major classes: Pattern and Matcher
     ● Pattern compiles a pattern into an internal representation (conceptually, a finite state automaton)
     ● Matcher uses the compiled pattern to find substrings that match the pattern
  48. Java Pattern Example
     ● This is how you compile a regular expression:
       – Pattern pat = Pattern.compile("abc");
     ● This is how you create a matcher object (conceptually, an NFA) that can be used to find matches:
       – Matcher match = pat.matcher(str);
     ● This is how you can test for a match:
       – match.matches() is a boolean predicate
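One detail worth seeing in code is that `matches()` succeeds only when the entire input matches the pattern, whereas `find()` looks for a matching substring. The sketch below illustrates the difference; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class MatchVsFind {
    // True only if the whole string matches the pattern.
    static boolean wholeMatch(String regex, String s) {
        return Pattern.compile(regex).matcher(s).matches();
    }

    // True if some substring of s matches the pattern.
    static boolean anywhere(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("abc", "abc"));
        System.out.println(wholeMatch("abc", "xxabcxx"));
        System.out.println(anywhere("abc", "xxabcxx"));
    }
}
```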
  49. Java Pattern Example
     import java.io.Console;
     import java.util.regex.Matcher;
     import java.util.regex.Pattern;

     public class RegexTestHarness {
         public static void main(String[] args) {
             Console console = System.console();
             if (console == null) {
                 System.err.println("No console.");
                 System.exit(1);
             }
             while (true) {
                 Pattern pattern =
                     Pattern.compile(console.readLine("%nEnter your regex: "));
                 Matcher matcher =
                     pattern.matcher(console.readLine("Enter input string to search: "));
                 boolean found = false;
                 // iterate through the matches and print each one with its position
                 while (matcher.find()) {
                     console.format("I found the text \"%s\" starting at " +
                         "index %d and ending at index %d.%n",
                         matcher.group(), matcher.start(), matcher.end());
                     found = true;
                 }
                 if (!found) {
                     console.format("No match found.%n");
                 }
             }
         }
     }
  50. Java Pattern Example
     ● This is how you can call RegexDemo from the command line:
       – java RegexTestHarness '^(0|1(01*0)*1)*$'
     ● Then you can give it strings to match
     ● You can also redirect a file to RegexDemo:
       – java RegexTestHarness reg-exp < numbers.txt
  51. Sample Outputs
     Enter your regex: a
     Enter input string to search: bac
     I found the text "a" starting at index 1 and ending at index 2.

     Enter your regex: /\d/
     Enter input string to search: bc12ab
     No match found.

     Enter your regex: \d
     Enter input string to search: bc12ab
     I found the text "1" starting at index 2 and ending at index 3.
     I found the text "2" starting at index 3 and ending at index 4.
  52. Pattern Matching: Real Intelligence or Illusion of Intelligence?
  53. Illusion of Intelligence?
     ● In the 1960s and 70s, AI researchers developed a number of programs that impressed many people with their 'intelligence'
     ● ELIZA could have a dialog with a person
     ● STUDENT solved high school algebra word problems
     ● MACSYMA solved problems in integral and differential calculus
     ● SAM understood simple stories
  54. Pattern Matching
     ● What all these systems had in common was a technique that in symbolic AI came to be known as pattern matching
     ● The basic idea was to define a set of rules that look for patterns in inputs and produce outputs by substituting what they matched for something else
     ● Results were impressive to the uninitiated until the researchers themselves started explaining how they did what they did
  55. ELIZA
     ● ELIZA was written by Joseph Weizenbaum in 1966
     ● The system was named after the heroine of the play Pygmalion whom a good teacher taught how to speak English
     ● ELIZA was written not just to speak but to emulate a Rogerian psychologist
     ● The basic principle of Rogerian psychology is to never ask direct questions but let the patients tell their own story
  56. Sample Dialog with ELIZA
     USER> Men are all alike
     ELIZA> IN WHAT WAY
     USER> They are always bugging us about something or another.
     ELIZA> CAN YOU BE MORE SPECIFIC
     USER> Well, my boyfriend made me come here.
     ELIZA> YOUR BOYFRIEND MADE YOU COME HERE
     USER> He says I'm depressed much of the time.
     ELIZA> I AM SORRY TO HEAR YOU ARE DEPRESSED.
     USER> It's true. I am unhappy.
     ELIZA> DO YOU THINK COMING HERE WILL HELP YOU NOT BE UNHAPPY
     USER> I need some help, that much is certain.
  57. How ELIZA Works
     ● The program looks for specific patterns in the input and then prints the response on the basis of what it finds
     ● For example, when the program finds 'alike' or 'same', it may print 'IN WHAT WAY'
     ● When the program matches the pattern 'I need X', it may print 'WHAT WOULD IT MEAN IF YOU GOT X'
     ● For example, if the user types 'I need some help', ELIZA prints 'WHAT WOULD IT MEAN IF YOU GOT SOME HELP'
     ● The level of output sophistication depends on how elaborate the patterns are
     ● Online versions of ELIZA are available to try
  58. ELIZA RULES
     ● A rule is a data structure that consists of a pattern and a set of responses
     ● Example:
       RULE
         PATTERN: 'X I want Y'
         RESPONSES: { 'What would it mean if you got Y',
                      'Why do you want Y',
                      'Suppose you got Y soon' }
  59. Rule Application
     ● Suppose the user types: Some day I want to read Farid Uddin Attar's Conference of Birds in the original
     ● The rule's pattern matches X to 'Some day' and Y to 'to read Farid Uddin Attar's Conference of Birds in the original'
     ● In symbolic pattern matching, the result of the match is referred to as a list of bindings: X is bound to 'Some day' and Y is bound to 'to read Farid Uddin Attar's Conference of Birds in the original'
  60. Rule Application
     ● After the list of bindings is found, the program can then use it to produce the following responses:
       – What would it mean if you got to read Farid Uddin Attar's Conference of Birds in the original
       – Why do you want to read Farid Uddin Attar's Conference of Birds in the original
       – Suppose you got to read Farid Uddin Attar's Conference of Birds in the original soon
     ● There are still two important problems to think about: what to do if multiple rules' patterns match, and how a specific response is chosen within a matched rule
  61. ELIZA Algorithm
     while ( True ) {
         input = get_input_from_user();
         applicable_rules = find_applicable_rules(input, rule_database);
         chosen_rule = choose_applicable_rule(applicable_rules);
         chosen_response = choose_response(chosen_rule);
         chosen_response = substitute_matches(chosen_response);
         print_response(chosen_response);
     }
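The loop above can be sketched in Java with java.util.regex standing in for the symbolic pattern matcher. Everything below is an illustrative assumption rather than Weizenbaum's implementation: the three rules are adapted from the earlier slides, the first applicable rule is always chosen, responses within a rule are chosen round-robin, and the fallback reply 'Please go on' is hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MiniEliza {
    // A rule pairs a pattern with a set of response templates; $1 stands for Y (or X).
    static final class Rule {
        final Pattern pattern;
        final List<String> responses;
        int next = 0; // index of the response to use next (round-robin choice)

        Rule(String regex, String... responses) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.responses = List.of(responses);
        }
    }

    // The rule database; earlier rules take priority over later ones.
    private final List<Rule> rules = List.of(
        new Rule(".*\\bI want (.*)",
            "What would it mean if you got $1",
            "Why do you want $1",
            "Suppose you got $1 soon"),
        new Rule(".*\\bI need (.*)", "What would it mean if you got $1"),
        new Rule(".*\\b(alike|same)\\b.*", "In what way"));

    String respond(String input) {
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(input);
            if (m.matches()) { // the first applicable rule is the chosen rule
                String template = rule.responses.get(rule.next);
                rule.next = (rule.next + 1) % rule.responses.size();
                // Substitute the bound text into the template (literal replace, no regex).
                return template.replace("$1", m.group(1));
            }
        }
        return "Please go on"; // hypothetical fallback when no rule applies
    }

    public static void main(String[] args) {
        MiniEliza eliza = new MiniEliza();
        System.out.println(eliza.respond("Men are all alike"));
        System.out.println(eliza.respond("Some day I want to read Farid Uddin Attar's "
            + "Conference of Birds in the original"));
    }
}
```

Taking the first applicable rule and rotating through its responses is only one possible answer to the two open questions raised on the Rule Application slide; random choice or rule priorities would fit the same loop.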
  62. References
     ● D. Jurafsky & J. Martin. Speech & Language Processing, Ch. 02. Prentice Hall, ISBN 0-13-095069-6
     ● J. Weizenbaum. 1966. "ELIZA - A Computer Program for the Study of Natural Language Communication Between Man and Machine." Communications of the ACM, 9(1): 36-45.
