Media IT - Natural Language Processing
1. Media IT :: Dr Serge Linckels :: http://www.linckels.lu/ :: serge@linckels.lu ::
Faculty of Science, Technology and Communication (FSTC)
Bachelor en informatique (professionnel)
-- Media IT --
¯_(ツ)_/¯
Unit 3: Natural language processing
3. Natural language processing
You may be surprised to learn that it is possible to see the figure spinning
both clockwise and counterclockwise. This is related to bistable perception,
in which an ambiguous 2-dimensional figure can be seen from two different
perspectives. Because there is no third dimension, our brains try to
construct space around the figure.
Try looking at the figure and then blink; it may appear to change
direction immediately after you blink. Another strategy is to focus on a
specific part of the figure.
3.1 Semantics and Artificial Intelligence
3.2 Computational linguistics
3.3 Syntax, Semantics and Pragmatics
3.4 Exercise
3.1 Semantics and Artificial Intelligence
[Figure: example pictures. For a human: Art, Porn, Art. For a machine: ??!!]
Weak AI
• Logical reasoning
• Knowledge representation
• Natural Language Processing
• Machine Learning
• …
Strong AI
Voting, tagging (e.g., "photography", "best picture 2010")
Statistical solution: requires a critical mass of “good” users
Deterministic solution: requires machine-readable metadata
Description Logics: Photo ⊓ ∃hasColor.BW ⊓ ∃photoOf.(Woman ⊓ isNude) ⊓ ∃isOutdoors.Daylight
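The deterministic approach can be illustrated with a minimal sketch. It assumes a toy metadata format (a dict from role names to sets of fillers) and a hypothetical helper `satisfies`; neither comes from the slides, and nested concepts are left out for brevity.

```python
# Toy metadata for one photo: role name -> set of filler concepts.
# The dict layout and the "type" key are assumptions made for this sketch.
photo = {
    "type": {"Photo"},
    "hasColor": {"BW"},
    "isOutdoors": {"Daylight"},
}

def satisfies(item, concept, restrictions):
    """Return True if item belongs to concept and, for every (role, filler)
    pair, has at least one matching filler (an existential restriction)."""
    if concept not in item.get("type", set()):
        return False
    return all(filler in item.get(role, set()) for role, filler in restrictions)

# Query in the spirit of: Photo AND (some hasColor BW) AND (some isOutdoors Daylight)
query = [("hasColor", "BW"), ("isOutdoors", "Daylight")]
print(satisfies(photo, "Photo", query))  # True
```

Unlike voting or tagging, the answer depends only on the metadata, not on how many users have rated the picture.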
Natural Language Processing (NLP)
NLP is a subfield of linguistics and artificial intelligence. It studies the problems
inherent to the processing and manipulation of natural language. The ultimate
goal of NLP is to make computers “understand” statements written in human
languages. The definition of “understanding” is one of the major problems in
NLP.
Examples of applications:
• Optical character recognition (OCR)
• Machine translation
• Speech recognition
https://translate.google.com/
https://www.deepl.com/
• wrong interpretation
• false positives
• too much information
3.2 Computational linguistics
Tagging and Parsing natural language
Phrase types: noun phrase (NP), prepositional phrase (PP), adjective phrase (ADJP), verb phrase (VP)
Word (part-of-speech) tags: adjective (JJ), conjunction (CC), preposition (IN), determiner (DT), noun (NN), verb, 3rd person singular present (VBZ)
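To show how word tags lead to phrase types, here is a minimal sketch of a noun-phrase chunker. The example sentence, the function name `np_chunks` and the simplistic rule (a run of DT/JJ/NN words ending in a noun) are assumptions for illustration; real parsers use far richer grammars.

```python
def np_chunks(tagged):
    """Group runs of determiner/adjective/noun words that end in a noun
    into noun-phrase (NP) chunks. tagged is a list of (word, tag) pairs."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in ("DT", "JJ", "NN"):
            current.append((word, tag))
            continue
        # Any other tag closes the current run.
        if current and current[-1][1] == "NN":
            chunks.append(" ".join(w for w, _ in current))
        current = []
    if current and current[-1][1] == "NN":
        chunks.append(" ".join(w for w, _ in current))
    return chunks

# Hypothetical tagged sentence, using the tags listed above.
sentence = [("the", "DT"), ("cat", "NN"), ("sits", "VBZ"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(np_chunks(sentence))  # ['the cat', 'the mat']
```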
The TreeTagger is a tool for annotating text with part-of-speech and lemma
information. It was developed by Helmut Schmid in the TC project at the
Institute for Computational Linguistics of the University of Stuttgart.
The TreeTagger has been successfully used to tag German, English, French,
Italian, Danish, Dutch, Spanish, Bulgarian, Russian, Portuguese, Galician,
Greek, Chinese, Swahili, Slovak, Slovenian, Latin, Estonian, Polish,
Romanian, Czech, Coptic and Old French texts. It is adaptable to other
languages if a lexicon and a manually tagged training corpus are available.
http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
Ambiguities in parsing a natural language
Example: what does "a date" mean? The word "date" has several meanings.
WordNet – lexical database
http://wordnet.princeton.edu/
About 117,000 sets of semantically equivalent words (synsets) are interlinked by means of conceptual-semantic and lexical relations
hypernym: a word with a more general meaning
E.g., animal is a hypernym of cat
hyponym: a word with a more specific meaning
E.g., cat is a hyponym of animal
synonym: a word with the same (or nearly the same) meaning
E.g., car and automobile are synonyms
homonym: words with identical spelling but different meaning
E.g., Ada is a programming language but also a person
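The four relations can be sketched with plain dictionaries. This is a toy lexicon, not the WordNet API: the table names, the intermediate entry "feline" and the helper functions are assumptions; only the cat/animal, car/automobile and Ada examples come from the slide.

```python
# Toy lexicon (hypothetical): word -> its direct hypernym.
HYPERNYM = {"cat": "feline", "feline": "animal", "dog": "animal"}
# Sets of words with the same meaning (miniature "synsets").
SYNSETS = [{"car", "automobile"}]
# Homonyms: one spelling, several unrelated senses.
SENSES = {"Ada": ["programming language", "person"]}

def is_hyponym_of(word, ancestor):
    """Follow hypernym links upwards and report whether ancestor is reached."""
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word == ancestor:
            return True
    return False

def are_synonyms(a, b):
    """Two words are synonyms if some synset contains both."""
    return any(a in synset and b in synset for synset in SYNSETS)

print(is_hyponym_of("cat", "animal"))      # True
print(are_synonyms("car", "automobile"))   # True
print(len(SENSES["Ada"]))                  # 2
```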
Semantic annotation of lecture videos
Lecture "WWW basics" by Prof. Christoph Meinel at the Hasso-Plattner-Institute (HPI) in Potsdam
Video segments: #1 Intro, #2 Protocols in general, #3 Error-handling as task of a protocol, #4 Error-handling, ..., #n How TCP/IP works
Natural language explanation: Error-handling as task of a protocol
Description logics: Protocol ⊓ ∃hasTask.ErrorHandling
Machine readable XML:
<owl:Class rdf:about="#LO3">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#Protocol" />
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasTask" />
      <owl:someValuesFrom rdf:resource="#ErrorHandling" />
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>
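As a rough illustration of what "machine readable" means here, the following sketch reads such an annotation with Python's standard XML parser. It makes several assumptions: the fragment is wrapped in an `rdf:RDF` root with the usual OWL and RDF namespace declarations, the restriction element uses the standard capitalised name `owl:Restriction`, and the helper names `local` and `describe` are hypothetical.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The annotation, wrapped in a root element that declares the namespaces.
fragment = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#LO3">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Protocol" />
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasTask" />
        <owl:someValuesFrom rdf:resource="#ErrorHandling" />
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
</rdf:RDF>
"""

def local(ref):
    """Strip the leading '#' from a local reference such as '#Protocol'."""
    return ref.lstrip("#")

def describe(owl_class):
    """Collect the named classes and existential restrictions of an intersection."""
    result = {"classes": [], "some": []}
    for child in owl_class.find(f"{{{OWL}}}intersectionOf"):
        if child.tag == f"{{{OWL}}}Class":
            result["classes"].append(local(child.get(f"{{{RDF}}}about")))
        elif child.tag == f"{{{OWL}}}Restriction":
            prop = child.find(f"{{{OWL}}}onProperty").get(f"{{{RDF}}}resource")
            filler = child.find(f"{{{OWL}}}someValuesFrom").get(f"{{{RDF}}}resource")
            result["some"].append((local(prop), local(filler)))
    return result

root = ET.fromstring(fragment)
print(describe(root.find(f"{{{OWL}}}Class")))
```

The printed structure names the class `Protocol` and one existential restriction on `hasTask` with filler `ErrorHandling`, which is the same information as the description-logics expression.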
How old are my 3 children?
1. The product of their ages is 36
2. The sum of their ages is the number on the door
3. The oldest is blonde

Clue 1: possible ages
1 x 1 x 36 = 36
1 x 2 x 18 = 36
1 x 3 x 12 = 36
1 x 4 x 9 = 36
1 x 6 x 6 = 36
2 x 2 x 9 = 36
2 x 3 x 6 = 36
3 x 3 x 4 = 36

Clue 2: use the same possible ages
1 + 1 + 36 = 38
1 + 2 + 18 = 21
1 + 3 + 12 = 16
1 + 4 + 9 = 14
1 + 6 + 6 = 13
2 + 2 + 9 = 13
2 + 3 + 6 = 11
3 + 3 + 4 = 10
You will ask for a 3rd clue because the same sum (13) appears twice.

Clue 3: there is just one child who is the oldest, so the ages are 2, 2 and 9
(with 1, 6 and 6 there would be two oldest children).
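The same reasoning can be checked mechanically. The sketch below is one possible encoding of the three clues; the function name `solve` and the way clue 3 is modelled (the largest age must occur only once) are assumptions of this example.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

def solve(product=36):
    """Return the age triples that remain after applying all three clues."""
    # Clue 1: all sorted triples (a <= b <= c) whose product matches.
    by_sum = defaultdict(list)
    for a, b, c in combinations_with_replacement(range(1, product + 1), 3):
        if a * b * c == product:
            by_sum[a + b + c].append((a, b, c))
    # Clue 2: the door number alone was not enough, so the sum is ambiguous.
    ambiguous = [t for triples in by_sum.values() if len(triples) > 1
                 for t in triples]
    # Clue 3: there is exactly one oldest child.
    return [t for t in ambiguous if t[1] < t[2]]

print(solve())  # [(2, 2, 9)]
```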
3.3 Syntax, Semantics and Pragmatics
“...a goal of the Web was that, if the interaction
between person and hypertext could be so intuitive
that the machine-readable information space gave an
accurate representation of the state of people's
thoughts, interactions, and work patterns, then
machine analysis could become a very powerful
management tool, seeing patterns in our work and
facilitating our working together through the typical
problems which beset the management of large
organizations.”
A vision of a smarter web by Tim Berners-Lee
Scientific American, May, 2001
The classical Web is built upon HTML (HyperText Markup Language)
Can HTML be used to build the Semantic Web?
Semantics refers to aspects of meaning, as expressed in language or other
systems of signs
Wrong: The movies went to me
Correct: I went to the movies
Syntax is the study of the structure of sign systems, focusing on the form,
not the meaning
Wrong: I movies the go to with my wife
Correct: I go to the movies with my wife
Pragmatics is the study of the practical use of signs by agents or
communities of interpretation within particular circumstances and contexts
Example: Is the window open? [The asking person may feel cold]
"Classical Web"
Semantic Web applications
3.4 Exercise
Practical exercise
1. Install TreeTagger and test it in the terminal / command line
2. Develop a program, e.g., in Java, that asks the user for a sentence and returns an excited statement

How it works
• You work in teams of 2 students.
• Submit the running program and the source code by November 30, 2018 via Moodle!
• This work counts for 10% of your final grade.

Example
USER: I have a beer after the class
SYSTEM: Cool, you have a beer after the class

input.txt (one token per line):
I
have
a
beer
after
the
class

./bin/tree-tagger -token ./lib/english-utf8.par input.txt output.txt

output.txt (token and tag):
I PP
have VHP
a DT
beer NN
after IN
the DT
class NN

USER: My mom is great
SYSTEM: Cool, your mom is great
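One possible starting point for the second task is sketched below in Python for brevity (the exercise suggests Java as an example). It only post-processes tagger output and does not call TreeTagger itself. The function names, the word table `FIRST_TO_SECOND` and the rule "swap pronouns whose tag starts with PP" are assumptions of this sketch, as are the tags used for the second sentence.

```python
# First-person words and their second-person counterparts (hypothetical table).
FIRST_TO_SECOND = {"i": "you", "me": "you", "my": "your",
                   "mine": "yours", "am": "are"}

def parse_tagged(text):
    """Turn 'token tag' lines (as in output.txt) into (token, tag) pairs."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            pairs.append((parts[0], parts[1]))
    return pairs

def reply(pairs):
    """Build the system answer: swap first-person pronouns, keep the rest."""
    words = []
    for word, tag in pairs:
        # Only pronouns (tags starting with PP) and the verb "am" are swapped.
        if tag.startswith("PP") or word.lower() == "am":
            word = FIRST_TO_SECOND.get(word.lower(), word)
        words.append(word)
    return "Cool, " + " ".join(words)

tagged = "I PP\nhave VHP\na DT\nbeer NN\nafter IN\nthe DT\nclass NN"
print(reply(parse_tagged(tagged)))  # Cool, you have a beer after the class

# Tags below are assumed for illustration, not taken from a tagger run.
second = [("My", "PP$"), ("mom", "NN"), ("is", "VBZ"), ("great", "JJ")]
print(reply(second))  # Cool, your mom is great
```

Using the tag rather than the bare word avoids rewriting tokens that merely look like pronouns; a fuller solution would also adjust verb forms.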