This document discusses two challenges in artificial intelligence: errors made by AI systems and the concept of "grandmother cells" in neuroscience. Regarding AI errors, it proposes using stochastic separation theorems from high-dimensional geometry to build fast, one-shot correctors for AI systems. Regarding grandmother cells, it reviews experiments showing neurons selectively responding to concepts and discusses how ensembles of neurons could model concept cells and neural selectivity. The document outlines applications in computer vision, robotics, and multi-agent learning and concludes that geometric theorems allow creation of efficient correctors and understanding of neural encoding schemes.
Errors of Artificial Intelligence, their Correction and Simplicity Revolution in Neuroscience
1. Errors of Artificial Intelligence, their
Correction and Simplicity Revolution in
Neuroscience
A.N. Gorban
University of Leicester, UK &
Lobachevski University, Russia
Joint work
with I.Y. Tyukin and B. Grechuk (Leicester), and
V.A. Makarov (Madrid)
2. The heresy of unheard-of simplicity
Boris Pasternak, “The Waves”, 1931
Assured of kinship with Everything
And with the Future closely knit
We can’t but fall – a heresy! –
To unbelievable simplicity.
But to be spared we can’t expect
If we do not conceal it closely.
Men need it more than anything,
But complex things are easier for them.
В родстве со всем, что есть, уверясь,
И знаясь с будущим в быту,
Нельзя не впасть к концу, как в ересь,
В неслыханную простоту.
Но мы пощажены не будем,
Когда ее не утаим,
Она всего нужнее людям,
Но сложное понятней им.
3. Plan
• Two problems:
- Errors of AI systems and one-shot correctors;
- Grandmother cells, sparse coding and single-cell revolution
in neuroscience.
• Between curse of dimensionality and blessing of
dimensionality: Measure concentration and Stochastic
separation:
- Geometric preliminaries,
- Surprising power of Fisher’s discriminant in high
dimension and main stochastic separation theorems.
• Two examples of technical applications.
• Selective memory of neural ensembles.
• Conclusion and outlook.
4. Development of AI is a strategic direction of
technological evolution
The world of big legacy AI systems grows:
• Google (Google Cars, FaceNet, DeepMind)
• Facebook (DeepText, face recognition API)
• Amazon (Rekognition, Alexa)
• Apple (Siri, face recognition phone unlock)
• IBM (Watson, face recognition API)
• NEC (face recognition)
……………………………………………….
but AI SYSTEMS MAKE ERRORS!
10. AI Errors: and the list continues …
• InspiroBot: Poor Testing/Coding
• Cortana Not Working: Bias/Limited Training Set; Poor Testing/Coding
• Alexa Blasting Music: Unexpected Human Behaviour
• Passport Checker: Bias/Limited Training Set; Poor Testing/Coding
• Knightscope Robot Hits Toddler: Poor Testing/Coding; Unexpected Human Behaviour
• Facebook Translate Arrest: Bias/Limited Training Set; Unexpected Human Behaviour
• Google Tag Racist: Bias/Limited Training Set
• WeChat Racist: Bias/Limited Training Set; Unexpected Human Behaviour
• Microsoft Tay: Unexpected Human Behaviour; Bias/Limited Training Set
(Thanks to Rosie Fenwick and Eliyas Woldegeorgis for collecting the list)
11. Fundamental Challenge #1:
Correct Mistakes of AIs
• With guaranteed high probabilities;
• Without damaging correct skills;
• Without complete re-training;
• Fast (one-shot);
• Reversible.
12. Corrector has to separate mistakes from
correctly solved examples and correct them
A corrector is a binary classifier for error diagnosis, combined
with a modified decision rule for high-risk situations.
[Diagram: the corrector reads the AI system's inputs, internal
signals, and outputs, and issues a correction.]
14. Challenge #2:
Grandmother and concept cells
The term ‘grandmother cell’ was introduced by Lettvin around
1969 in a jocular story about ‘a great if unknown
neurosurgeon’, Akakhi Akakhievitch, who deleted concepts
from a patient’s memory by ablating the corresponding cells.
More seriously, this is the hypothesis (proposed by
Konorski in 1967) that there are neurons that react
selectively to specific concepts and images.
There are hypothetical grandmother cells, Jennifer Aniston
cells, Dodecahedron cells, or even ‘grandmother cell’ cells
(cells that react selectively to the pattern, or even the
idea, of a grandmother cell).
15. The idea of a grandmother cell, a neuron that reacts
selectively to a pattern: a Jennifer Aniston cell, a
Dodecahedron cell, and a ‘Grandmother cell’ cell.
16. Schematic representation of the idea behind idealised
‘concept cells’: a high-dimensional input signal (dimension N),
after preprocessing (dimension n), arrives at an ensemble of
non-interacting concept cells.
A source of sparsity could be the bounded number of links.
17. ‘Five dogmas’ of a single cell
revolution (Barlow, 1972):
1. (Focus on cellular level) To understand nervous
function one needs to look at interactions at a cellular
level, rather than either a more macroscopic or
microscopic level, because behaviour depends upon the
organized pattern of these intercellular interactions.
2. (Minimization of the number of active neurons) The
sensory system is organized to achieve as complete a
representation of the sensory stimulus as possible with
the minimum number of active neurons.
18. 3. (Synergy between experience and development)
Trigger features of sensory neurons are matched to
redundant patterns of stimulation by experience as
well as by developmental processes.
4. (Individual concepts correspond to small samples
from a huge number of concept cells) Perception
corresponds to the activity of a small selection from
the very numerous high-level neurons, each of which
corresponds to a pattern of external events of the
order of complexity of the events symbolized by a
word.
5. High impulse frequency in such neurons
corresponds to high certainty that the trigger feature
is present.
19. Concept cells in experiments
• A series of experiments demonstrated that neurons in the
human medial temporal lobe (MTL) fire selectively to images
of faces, animals, and other objects or scenes.
• It was demonstrated that the firing of MTL cells was sparse -
most of them did not respond to the great majority of images
used in the experiment.
• These cells have low baseline activity and their response is
highly selective.
• ‘Jennifer Aniston’ cells responded to pictures of Jennifer
Aniston but rarely or very weakly to pictures of other persons.
• These neurons also respond to the printed name of the
person.
• The voluntary control of these neurons is possible via
imagination: it is sufficient to imagine the concept or
‘continuously think of the concept’.
20. Interaction of concepts
• There are several concept cells for each concept.
• One cell can fire for different concepts – this is an
association between concepts.
• For example, the ‘Jennifer Aniston’ cells also fired to
Lisa Kudrow, a costar in the TV series Friends, and
‘Luke Skywalker’ cell also fired to Yoda.
• The presence of the individual can, in most cases,
be reliably decoded from a small number of
neurons.
• Redistribution of attention between different parts of
the image can have an effect opposite to association.
• For example, the ‘Jennifer Aniston’ cell did not react to
a picture where Jennifer Aniston appears together with
Brad Pitt; that picture is recognised rather as Brad Pitt.
23. Concentration of the volume and the Gibbs
theorem about equivalence of ensembles
The ratio of the volumes of n-dimensional balls of radius r and
radius 1 is V_n(r)/V_n(1) = rⁿ. Hence the volume of a
high-dimensional ball is concentrated near its border (the
sphere): almost all of the volume appears at r ≈ 1.
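To make the effect concrete, here is a minimal numerical sketch (plain Python, written for this text) of the ratio rⁿ: the fraction of the ball's volume lying within radius r collapses as the dimension n grows.

```python
# Fraction of the unit n-ball's volume lying within radius r:
# V_n(r) / V_n(1) = r**n.
def volume_fraction(r: float, n: int) -> float:
    return r ** n

for n in (3, 10, 100, 1000):
    print(f"n={n:4d}  r=0.90: {volume_fraction(0.90, n):.2e}  "
          f"r=0.99: {volume_fraction(0.99, n):.2e}")
# For n=1000, even the ball of radius 0.99 holds only ~4.3e-05
# of the volume: almost everything sits next to the sphere r=1.
```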
25. Stochastic separation theorems in
high dimensions
Extreme points: in high dimension, with high probability, all
points of even an exponentially large random set are extreme
points of its convex hull, and the expected separation margin ε
is NOT small.
Bárány & Füredi (1988): proof for the uniform distribution in
balls.
Donoho & Tanner (2009): proof for Gaussians.
- General linear discriminants are still too complex and
require iterative learning (like SVM or the perceptron);
- Fisher discriminants are explicit and non-iterative
(much better)!
After data whitening, Fisher’s discriminant can be
defined by a simple linear inequality:
Definition. A point x is Fisher separable from a finite
set Y with a threshold α (0 ≤ α < 1) if
(x, y)≤α(x, x) (1)
for all y from Y.
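As an illustration of Definition (1) (a sketch written for this text, not the authors' code), the test is a single pass of dot products with no iterative training; the threshold α = 0.8 and the sample sizes below are arbitrary choices.

```python
import numpy as np

def fisher_separable(x: np.ndarray, Y: np.ndarray, alpha: float = 0.8) -> bool:
    """Inequality (1): (x, y) <= alpha * (x, x) for every row y of Y."""
    return bool(np.all(Y @ x <= alpha * np.dot(x, x)))

# Empirical illustration: M points sampled uniformly from the unit n-ball.
rng = np.random.default_rng(0)
n, M = 100, 1000
X = rng.standard_normal((M, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # random directions
X *= rng.random((M, 1)) ** (1.0 / n)            # radii giving uniformity in the ball

count = sum(fisher_separable(X[i], np.delete(X, i, axis=0)) for i in range(M))
print(f"{count} of {M} points are Fisher-separable from all the others")
```

In dimension n = 100 one should typically see all 1000 points pass the test, in line with the theorems discussed below.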
28. Whitening is a change of coordinates that transforms the
empirical covariance matrix into the identity matrix.
Whitening can be represented in four steps:
1. Centralise the data cloud (subtract the mean from all
data vectors), normalise the coordinates to unit
variance, and calculate the empirical correlation matrix.
2. Apply principal component analysis (i.e. calculate the
eigenvalues and eigenvectors of the empirical correlation
matrix).
3. Delete the minor components, which correspond to the
small eigenvalues of the empirical correlation matrix.
4. In the remaining principal-component basis, normalise the
coordinates to unit variance.
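A compact NumPy sketch of these four steps (an illustration written for this text; the cut-off `tol` deciding which eigenvalues count as "small" in step 3 is an assumed parameter):

```python
import numpy as np

def whiten(data: np.ndarray, tol: float = 1e-6) -> np.ndarray:
    """PCA-based whitening: the empirical covariance of the output is the identity."""
    # Step 1: centre, normalise to unit variance, form the empirical correlation matrix.
    Z = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.cov(Z, rowvar=False)
    # Step 2: principal components = eigenvectors of the correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(corr)
    # Step 3: delete minor components (small eigenvalues).
    keep = eigvals > tol * eigvals.max()
    # Step 4: project onto the retained components and rescale to unit variance.
    return (Z @ eigvecs[:, keep]) / np.sqrt(eigvals[keep])
```

After this transformation the inner-product test of Definition (1) can be applied directly.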
29. The Fisher separability inequality (1) holds for vectors
x, y if and only if x does not belong to the excluded volume:
the ball with centre c = y/(2α) and radius ‖y‖/(2α), because
(x, y) > α(x, x) is equivalent to ‖x − y/(2α)‖ < ‖y‖/(2α).
[Diagram: a point x, points y of the set Y, and the excluded
balls passing through the origin and y/α.]
30. Assume the absence of large deviations and of sets with
small volume but high probability.
Note: r ≤ 1 and α < 1, but rα > 0.5.
31. For the proof, simply evaluate the excluded volumes and
the probability of falling into them.
[Diagram: the excluded balls inside the unit ball, as on
slide 29.]
33. Uniform distribution in a ball
Example: for n = 100 and M < 2,740,000, the set of M points is
Fisher-separable (each point from the rest) with probability
p > 99%.
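A small-scale Monte Carlo version of such a statement can be run in a few lines (the values n = 100, M = 2000, α = 0.8 below are illustrative choices, not the constants behind the quoted bound):

```python
import numpy as np

def all_fisher_separable(n: int, M: int, alpha: float, rng) -> bool:
    # M points sampled uniformly from the unit n-ball.
    X = rng.standard_normal((M, n))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    X *= rng.random((M, 1)) ** (1.0 / n)
    # Point i is separable from the rest iff max_{j!=i} (x_i, x_j) <= alpha*(x_i, x_i).
    G = X @ X.T                       # all pairwise dot products
    norms2 = np.diag(G).copy()
    np.fill_diagonal(G, -np.inf)      # exclude the j = i terms
    return bool(np.all(G.max(axis=1) <= alpha * norms2))

rng = np.random.default_rng(1)
trials = 20
hits = sum(all_fisher_separable(100, 2000, 0.8, rng) for _ in range(trials))
print(f"all M points were pairwise Fisher-separable in {hits}/{trials} trials")
```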
34. Product distribution in a cube 0 ≤ xᵢ ≤ 1: the coordinates
of the vectors x are independent random variables with
variances σᵢ.
37. Generalizations
• Sums of distributions (when the number of
summands grows slower than dimension);
• Products of distributions;
• Not i.i.d. samples;
• Combining with cluster analysis and
unsupervised learning (stochastic separation of
clusters);
• Kernel classifiers;
• More precise evaluation of constants;
………………………
40. Heuristic
The stochastic separation theorems hold if:
• There are no heavy tails of the probability
distribution;
• Sets of small volume do not carry large probability
(what “small” and “large” mean is strictly defined in
different contexts).
In these cases, the Fisher discriminant is an effective tool
for classification and for AI correctors in high dimension.
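Connecting this heuristic back to Challenge #1: a minimal sketch of a one-shot corrector built from a single error example (an illustration of the idea written for this text, not the authors' released code; it assumes the internal signals have already been whitened).

```python
import numpy as np

class OneShotCorrector:
    """Flag inputs that resemble one recorded error (whitened internal signals)."""

    def __init__(self, x_err: np.ndarray, alpha: float = 0.8):
        self.w = x_err                                # one-shot: just store the error
        self.threshold = alpha * np.dot(x_err, x_err)

    def is_high_risk(self, x: np.ndarray) -> bool:
        # Fisher-type test (1): fire when (x, x_err) > alpha * (x_err, x_err).
        # Deleting this functional restores the original system (reversibility).
        return float(np.dot(x, self.w)) > self.threshold

# Toy check in dimension n = 100.
rng = np.random.default_rng(2)
n = 100
correct_cases = rng.standard_normal((10_000, n))      # signals of correctly solved cases
x_err = rng.standard_normal(n)                        # the single recorded mistake

corrector = OneShotCorrector(x_err)
print("error flagged:", corrector.is_high_risk(x_err))                    # True, since alpha < 1
print("false alarms:", sum(map(corrector.is_high_risk, correct_cases)))   # typically 0
```

High-risk inputs are diverted to the modified decision rule; correct skills remain intact because, by stochastic separation, almost no correctly solved case falls on the wrong side of the functional.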
41. IT applications
• Creation of classifiers;
• Computer vision and robotics;
• Correctors of AI systems;
• New approach to empirical dimension of datasets;
• Knowledge transfer between artificial intelligence
systems;
• Multiagent learning and social networks of AI;
…………………………………………….
42. TITANAI in action
TITANAi™ Advanced Imaging Technology: linear functionals in
high dimensions for matching faces in live video streams.
46. Recent £1M AHRC grant “Automated recording and machine
learning for collating Roman ceramic tablewares…”,
PI Prof. P. Allison.
47. Schematic representation of three
memory encoding schemes:
Selectivity. A neuron (shown in green) receives inputs from multiple
presynaptic cells that code different information items. It detects
(responds to) only one stimulus (purple trace) while rejecting the
others;
Clustering. A neuron (shown in blue) detects a group of stimuli
(purple and blue traces) and ignores the others;
Acquiring memories. A neuron (shown in red) dynamically learns a
new memory item (blue trace) by associating it with a known one.
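A toy model of the ‘Selectivity’ scheme (a sketch under the simplest assumptions, written for this text, not the authors' simulation): a linear-threshold neuron whose synaptic weights copy one stimulus rejects all the others, because random high-dimensional stimuli are almost orthogonal.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_stimuli = 200, 500                      # signal dimension, number of items

# Random unit-norm stimuli; in high dimension they are nearly orthogonal.
stimuli = rng.standard_normal((n_stimuli, n))
stimuli /= np.linalg.norm(stimuli, axis=1, keepdims=True)

# "Selectivity": the weights copy the target item; the neuron fires when
# the overlap of the input with the weights exceeds a threshold.
weights, threshold = stimuli[0], 0.5         # the threshold 0.5 is illustrative
fires = stimuli @ weights > threshold
print("fires for its own item:", bool(fires[0]))                          # True (overlap 1)
print("fires for the others:", int(fires[1:].sum()), "/", n_stimuli - 1)  # typically 0
```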
51. Take home messages
• All AI systems make mistakes and need corrections.
• Ensembles of single neurons can correct mistakes of AI
and model neuronal selectivity.
• They are also used in revealing neuronal selectivity, fast
learning of ‘concept cells’ by association, and memory
mechanisms.
• The stochastic separation theorems give the theoretical
background of existence and efficiency of ‘concept cells’
and sparse coding.
• The stochastic separation theorems allow us to create
reversible, non-destructive, and non-iterative (one-shot)
correctors in the form of Fisher discriminants or small
neural networks.
52. References
• J.W. Gibbs, Elementary Principles in Statistical Mechanics,
Developed with Especial Reference to the Rational Foundation of
Thermodynamics. New York: Dover Publications, 1960 [1902].
• P. Lévy, Problèmes concrets d'analyse fonctionnelle, 2nd ed.
Paris: Gauthier-Villars, 1951.
• F. Rosenblatt, Principles of Neurodynamics: Perceptrons and
the Theory of Brain Mechanisms. Spartan Books, 1962.
• M. Talagrand, Concentration of measure and isoperimetric
inequalities in product spaces. Publications Mathématiques de
l'IHÉS, 81(1), 73-205, 1995.
• M. Gromov, Isoperimetry of waists and concentration of maps.
GAFA, Geom. Funct. Anal., 13, 178-215, 2003.
53. • P.C. Kainen, Utilizing geometric anomalies of high dimension:
When complexity makes computation easier. In: Computer Intensive
Methods in Control and Signal Processing, Birkhäuser, Boston, MA,
1997, 283-294.
• P.C. Kainen, V. Kůrková, Quasiorthogonal dimension of Euclidean
spaces. Appl. Math. Lett., 6(3), 7-10, 1993.
• H.B. Barlow, Single units and sensation: a neuron doctrine for
perceptual psychology? Perception, 1(4), 371-394, 1972.
• R. Quian Quiroga, L. Reddy, G. Kreiman, C. Koch, I. Fried,
Invariant visual representation by single neurons in the human
brain. Nature, 435(7045), 1102-1107, 2005.
• I. Bárány, Z. Füredi, On the shape of the convex hull of random
points. Probab. Th. Rel. Fields, 77, 231-240, 1988.
• D. Donoho, J. Tanner, Observed universality of phase transitions
in high-dimensional geometry, with implications for modern data
analysis and signal processing. Phil. Trans. R. Soc. A, 367,
4273-4293, 2009.
54. • A.N. Gorban, V.A. Makarov, I.Y. Tyukin. The unreasonable
effectiveness of small neural ensembles in high-dimensional
brain. Phys. Life Rev., 29, 2019, 55-88.
• A. Tozzi, J. F. Peters, The Borsuk-Ulam theorem solves the curse of dimensionality,
Phys. Life Rev., 29, 2019, 89-92.
• V. Kreinovich, The heresy of unheard-of simplicity, Phys. Life Rev., 29, 2019, 93-95.
• G. Kreiman, It's a small dimensional world after all, Phys. Life Rev., 29, 2019, 96-97.
• V. Kůrková, Some insights from high-dimensional spheres, Phys. Life Rev., 29, 2019,
98-100.
• L. Fortuna, Nonlinear effects for the reinforcement of small neural ensembles in high
dimensional brain, Phys. Life Rev., 29, 2019, 101-103.
• C. van Leeuwen, The reasonable ineffectiveness of biological brains in applying the
principles of high-dimensional cybernetics, Phys. Life Rev., 29, 2019, 104-105.
• P. Varona, High and low dimensionality in neuroscience and artificial intelligence,
Phys. Life Rev., 29, 2019, 106-107.
• R. Barrio, “Brainland” vs. “flatland”: How many dimensions do we need in brain
dynamics? Phys. Life Rev., 29, 2019, 108-110.
• R. Quian Quiroga, Akakhievitch revisited, Phys. Life Rev., 29, 2019, 111-114.
• A.N. Gorban, V.A. Makarov, I.Y. Tyukin, Symphony of high-
dimensional brain, Phys. Life Rev., 29, 2019, 115-119.
55. • A.N. Gorban, I. Tyukin, D. Prokhorov, K. Sofeikov. Approximation with random
bases: Pro et contra. Information Sciences, 364-365, 129-145, 2016.
• A.N. Gorban, I.Y. Tyukin. Stochastic Separation Theorems. Neural Networks, 94,
255-259, 2017.
• A.N. Gorban, R. Burton, I. Romanenko, I. Tyukin. One-Trial Correction of Legacy AI
Systems and Stochastic Separation Theorems. Information Sciences, 484, 237-254,
2019.
• A.N. Gorban, I. Tyukin. Blessing of dimensionality: mathematical foundations of the
statistical physics of data. Philosophical Transactions of the Royal Society A 376:
20170237, 2018.
• I. Tyukin, A.N. Gorban, C. Calvo, J. Makarova, V.A. Makarov. High-dimensional Brain.
A Tool for Encoding and Rapid Learning of Memories by Single Neurons. Bulletin of
Mathematical Biology, 2018, https://doi.org/10.1007/s11538-018-0415-5
• I. Tyukin, A.N. Gorban, K. Sofeikov, I. Romanenko. Knowledge Transfer Between
Artificial Intelligence Systems. Frontiers in Neurorobotics, 2018.
https://doi.org/10.3389/fnbot.2018.00049 .
• A.N. Gorban, A. Golubkov, B. Grechuk, E.M. Mirkes, I.Y. Tyukin. Correction of AI
systems by linear discriminants: Probabilistic foundations. Information Sciences,
466, 303-322, 2018.
• I. Tyukin, A.N. Gorban, S. Green, D. Prokhorov. Fast Construction of Correcting
Ensembles for Legacy Artificial Intelligence Systems: Algorithms and a Case Study,
Information Sciences, 485, 230-247, 2019.
56. Co-authors and collaborators
• Ivan Tyukin
• Jeremy Levesley
• Bogdan Grechuk
• Evgeny Mirkes
• Tatiana Tyukina
• Valeri Makarov
• Rosie Fenwick
• Eliyas Woldegeorgis
• Stephen Green
• Richard Burton
• Konstantin Sofeikov
• Sepehr Meshkinfamfard
• Ilya Romanenko
• Danil Prokhorov
• Jays Shields
• John Downie