Cognitive plausibility in learning algorithms
With application to natural language processing
Arvi Tavast, PhD
Qlaara Labs, UT, TLU
Tallinn, 10 May 2016
Motivation
Why cognitive plausibility?
Objective: best product vs best research
Model the brain
End-to-end learning from raw unlabelled data
Grounded cognition
Cognitive computing, neuromorphic computing
Feedback loop: using the model to better understand the
object to be modelled
Outline
A heretical view of language, an established learning model, and an application to NLP
1 Introduction
2 Understanding humans
Understanding human communication
Understanding human learning
Rescorla-Wagner learning model
3 Results
4 Application
Naive Discriminative Learning
My background
mainly in linguistics
1993 TUT computer systems
1989-2004 IT translation
2000-2006 Microsoft MILS
2002 UT MA linguistics
2008 UT PhD linguistics
2015 University of Tübingen postdoc, quantitative linguistics
Understanding human communication
How do we explain the observation that verbal communication sometimes works?
The channel metaphor
Speaking is like sending things by train, selecting suitable
wagons (words) for each thing (thought)
Hearing is like decoding the message
⇒ meanings are properties of words
Communication as uncertainty reduction
Speaking is like sending blueprints for building things, which
the receiver will have to follow (subject to their abilities,
available materials, etc.)
Hearing is like using hints to reduce our uncertainty about
the message
⇒ meanings are properties of people
Understanding human communication
When can the channel metaphor work?
The encoding of a message must have at least as many
discriminable states as the message to be encoded
or:
Encoding thoughts with words can only work if the number
of possible thoughts is smaller than or equal to the number
of possible words
This is the case only in restricted domains (e.g. weather forecasts)
Compare: reconstructing a document based on its hash sum
Understanding human learning
Compositional vs discriminative
Possible ways of conceptualising biological learning
Compositional model: we start as a blank page, adding
knowledge like articles in an encyclopedia
Discriminative model: we start by perceiving a single object
(the world) and gradually learn to discriminate between its
parts
If discriminative:
Human language models cannot be constant across time or
subjects
The Rescorla-Wagner learning model
Language acquisition can be described as building statistical relationships between cues and outcomes
The Rescorla-Wagner model: how do we learn that cue Cj predicts outcome O? (update rule sketched after the list below)
if we see that Cj ⇒ O, the relationship is strengthened
less, if there are other cues
if we see that Cj ⇒ ¬O, the relationship is weakened
more, if there are other cues
(if we see that ¬Cj ⇒ O, the relationship is weakened)
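A minimal sketch of one Rescorla-Wagner update step in R, assuming a single combined salience/learning-rate parameter (alpha_beta) and lambda = 1 when the outcome is present; rw_update and its arguments are hypothetical names, not part of the ndl API.

rw_update <- function(weights, present_cues, outcome_present,
                      alpha_beta = 0.1, lambda_max = 1) {
  lambda  <- if (outcome_present) lambda_max else 0   # target activation
  v_total <- sum(weights[present_cues])               # summed support of all present cues
  delta   <- alpha_beta * (lambda - v_total)          # shared prediction error
  weights[present_cues] <- weights[present_cues] + delta
  weights                                             # absent cues are left unchanged
}

With more cues present, v_total grows, so each cue is strengthened less when the outcome occurs and weakened more when it does not, which is the behaviour listed above.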
Feature-label-order effect
Creating the relationship between word and concept is only possible in one direction
Feature-label-order effect
If concept ⇒ word, the relationship is strengthened
If word ⇒ concept, the relationship is not strengthened
Number of objects in the world ≫ number of words in the
language
Abstraction inevitably and irreversibly discards information
Recovering a meaning from a word is necessarily
underspecified
Ramscar, M., Yarlett, D., Dye, M., Denny, K., and Thorpe, K. (2010). The effects of feature-label-order and their
implications for symbolic learning. Cognitive Science, 34(6), 909–957.
Aging and cognitive decline
Why do our verbal abilities seem to fail around the age of 65?
Ramscar, M., Hendrix, P., Shaoul, C., Milin, P., and Baayen, H. (2014). The myth of cognitive decline: Non-linear dynamics
of lifelong learning. Topics in Cognitive Science, 6(1), 5–42.
Morphology
Implicit morphology (without morphemes)
[Figure: boundary-marked letter trigram cues (e.g. #mA, #tA, mtA, tAk, ki#, ###) with learned association weights between roughly 0.1 and 0.6, illustrating morphology without explicit morphemes; trigram coding is sketched below]
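A minimal sketch (plain R, not the ndl API, which provides orthoCoding() for this purpose) of how a word form can be coded as the boundary-marked letter trigram cues shown in the figure; the example word is illustrative.

to_trigrams <- function(word, boundary = "#") {
  s <- paste0(boundary, word, boundary)               # add word boundary markers
  sapply(seq_len(nchar(s) - 2), function(i) substr(s, i, i + 2))
}
to_trigrams("mata")   # "#ma" "mat" "ata" "ta#"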
Naive Discriminative Learning
The R package: installation and basic usage
ndl: https://cran.r-project.org/web/packages/ndl/index.html
ndl2 (+ incremental learning): contact the authors
wm <- estimateWeights(events)  # Danks equilibria
wm <- learnWeights(events)     # incremental, ndl2 only
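A minimal sketch of the basic workflow, assuming a toy events data frame with the three columns shown on the next slide; within one event the individual cues are joined with underscores, and the rows here are illustrative.

library(ndl)
events <- data.frame(
  Cues      = c("aadress_S_SG_N", "aadress_S_PL_P", "aadress_S_SG_AD"),
  Outcomes  = c("aadress", "aadresse", "aadressil"),
  Frequency = c(1, 1, 4),
  stringsAsFactors = FALSE
)
wm <- estimateWeights(events)   # cue-by-outcome weight matrix (Danks equilibria)
round(wm, 3)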
Naive Discriminative Learning
Input data for Danks estimation: frequencies
Outcomes   | Cues             | Frequency
aadress    | aadress S SG N   | 1
aadresse   | aadress S PL P   | 1
aadressil  | aadress S SG AD  | 4
aadressile | aadress S SG ALL | 1
aasisid    | aasima V SID     | 1
aasta      | aasta S SG G     | 2
aasta      | aasta S SG N     | 1
aastane    | aastane A SG N   | 48
Naive Discriminative Learning
Input data for incremental learning: single events (one row per token; expansion sketched after the table)
Outcomes   | Cues             | Frequency
aadress    | aadress S SG N   | 1
aadresse   | aadress S PL P   | 1
aadressil  | aadress S SG AD  | 1
aadressil  | aadress S SG AD  | 1
aadressil  | aadress S SG AD  | 1
aadressil  | aadress S SG AD  | 1
aadressile | aadress S SG ALL | 1
aasisid    | aasima V SID     | 1
aasta      | aasta S SG G     | 1
aasta      | aasta S SG G     | 1
aasta      | aasta S SG N     | 1
aastane    | aastane A SG N   | 1
aastane    | aastane A SG N   | 1
aastane    | aastane A SG N   | 1
...
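A minimal sketch of how a frequency table like the one on the previous slide could be expanded into single events; freq_table is a hypothetical data frame with the Outcomes, Cues and Frequency columns shown above.

# Repeat each row as many times as its frequency, then set Frequency to 1.
events <- freq_table[rep(seq_len(nrow(freq_table)), freq_table$Frequency), ]
events$Frequency <- 1
# Incremental learning is order-sensitive, so shuffle (or keep corpus order).
events <- events[sample(nrow(events)), ]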
Naive Discriminative Learning
Output: weight matrix, cues x outcomes
Cues               | Outcomes      | Application
letter ngrams      | words         | reading
character features | words         | reading
words              | lexomes       | POS tagging
lexomes            | letter ngrams | morphological synthesis
contexts           | words         | distributional semantics
audio signal       | words         | speech recognition
words              | audio signal  | speech synthesis
Naive Discriminative Learning
About the weight matrix
What we can look at (sketched in code below):
Similarity of outcome vectors
Similarity of cue vectors
MAD (median absolute deviation) of outcome vector
Competing cues
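A minimal sketch of these inspections, assuming wm is the cue-by-outcome weight matrix from estimateWeights(); cosine() is a helper defined here (not an ndl function) and the row and column names are illustrative.

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cosine(wm[, "aasta"], wm[, "aastane"])            # similarity of two outcome vectors (columns)
cosine(wm["SG", ], wm["PL", ])                    # similarity of two cue vectors (rows)
mad(wm[, "aadressil"])                            # spread of one outcome's support across cues
sort(wm[, "aadressil"], decreasing = TRUE)[1:5]   # strongest competing cues for one outcome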
Naive Discriminative Learning
About the weight matrix
Other properties:
No dimensionality reduction (tested with matrices of ca 200k x 100k)
Danks equations are subject to R's 2^32 element limit (matrix
pseudoinverse)
Slow (weeks on ca 16 cores with 200 GB of RAM)
Performance below word2vec and similar models, but comparable
Some NLP tools
How to get started quickly with NLP
Python: NLTK, EstNLTK, Gensim (incl. word2vec), DISSECT
Java: GATE (also web), Stanford NLP, Deeplearning4j (incl. word2vec)
C: word2vec
R: ndl
Language understanding
What’s missing from full language understanding
Training material
Inter-annotator agreement is less than perfect
The corpus is heterogeneous
This is not a methodological flaw
Communicative intent and self-awareness
If cues are lexomes (=what the speaker wanted to say), the
system must want something.
Thanks for listening
Contacts and recommended reading
Contact
arvi@qlaara.com
Easy reading
blog.qlaara.com
Recommended reading
Harald Baayen
www.sfs.uni-tuebingen.de/hbaayen/
Michael Ramscar
https://michaelramscar.wordpress.com/
