The Philosophical Aspects of Data Modelling
Emir Muñoz
National University of Ireland Galway
Semantics of Object Representation in Machine Learning
Birkan Tunç, Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, USA
Machine Learning
"Field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959)
https://www.informatik.uni-hamburg.de/ML/
INTRODUCTION
(Diagram: the contribution of the philosopher.)
• Author's proposal:
– Machine learning needs to be cultivated with the vocabulary of philosophy, to extend the range of questions raised when evaluating various aspects of machine learning pertaining to data representation.
STATISTICAL LEARNING
A real entity (with its nature and structure) is mapped to a mathematical object (with its properties):
X → f(X)
WHO CARES?
• «The foundations of pattern recognition can be traced to Plato, later extended by Aristotle, who distinguished between an "essential property" [...] and an "accidental property" [...]»
Pattern recognition seeks to find such essential properties.
(Diagram: Training Data → Machine Learning Algorithm → Hypothesis; the hypothesis is evaluated on Test Data, and its Performance feeds back into the algorithm.)
What is the justification to use this model and object representation?
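The loop in the diagram can be sketched as a minimal train/evaluate cycle; the nearest-centroid hypothesis and the toy Gaussian data are illustrative assumptions, not from the slides:

```python
import numpy as np

def train(X, y):
    """Fit a nearest-centroid hypothesis: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    """Assign each point to the class of its nearest centroid."""
    classes = list(model)
    centroids = np.stack([model[c] for c in classes])
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.array([classes[i] for i in dists.argmin(axis=1)])

# Toy data: two well-separated Gaussian blobs, plus held-out test points.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))])
y_test = np.array([0] * 10 + [1] * 10)

hypothesis = train(X_train, y_train)                       # learn from training data
accuracy = (predict(hypothesis, X_test) == y_test).mean()  # performance on test data
print(f"test accuracy: {accuracy:.2f}")
```

The feedback arrow in the diagram corresponds to inspecting this test performance and revising the algorithm or the object representation.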
• "No free lunch" (The Supervised Learning No-Free-Lunch Theorems, Wolpert, 2002)
Our model is a simplification of reality.
Simplification is based on assumptions (model bias).
Assumptions fail in certain situations.
"No one model works best for all possible situations."
• What is the justification to use this model and object representation?
– Absolute performance: quantified by probabilistic bounds on the generalization error. Examples: confusion matrix, accuracy, misclassification rate.
– Relative performance: compared to alternative algorithms and other configurations. Examples: Mahalanobis distance, Kolmogorov-Smirnov distance, ROC curves and AUC, Gini.
Both notions of justification need philosophical attention.
(Varieties of Justification in Machine Learning, Corfield, 2010)
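A small sketch of the "absolute performance" examples (confusion matrix, accuracy, misclassification rate); the toy label vectors are assumptions for illustration:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()        # correct predictions on the diagonal
misclassification_rate = 1 - accuracy
print(cm)                                 # [[3 1]
                                          #  [1 3]]
print(accuracy)                           # 0.75
```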
"Which one is better now?"
"I told you, we need to look beyond accuracy, consistency, and relative performance…"
Kernel Trick
– Linear separation: with errors.
– Non-linear separation: no errors; a non-linear surface in the input space corresponds to a linear surface in the feature space.
We boost the performance of our model regardless of the non-linearity of the original features.
Output prediction, f(X), is not the main goal, but rather a more extensive comprehension of the interactions between the main players of the system.
INDUCTIVE INFERENCE
• Deductive reasoning (strong syllogism):
"if A is true then B is true; A is true; therefore B is true"
• Inductive inference (weak syllogism):
"if A is true then B is true; B is true; therefore A is plausible"
Deductive reasoning preserves truth; inductive inference does not, yielding only plausibility.
• Statistical learning (weaker than the weak syllogism):
"if A is true then B is plausible; B is true; therefore A is plausible"
We need tools to evaluate the degree of plausibility, which corresponds to our credence in the truth of the conclusions.
Aristotelian Epistemology (384-322 BC)
(Diagram: (1) observing facts → induction → (2) explanatory principles → deduction → (3) explanation of the observations.)
Simplification in object representation:
– Selecting primary/essential attributes
– Avoiding the use of accidental attributes
Aristotelian Epistemology (384-322 BC): example of a linear discriminant
g(x) = wᵀx, with x ∈ ℝⁿ (the observable) and w ∈ ℝⁿ (the hyperplane).
Most objects of class A reside on the side of the hyperplane where g(x) > 0.5.
The definition of the vector x requires feature extraction and selection.
"Most objects of class A reside on the side of the hyperplane where g(x) > 0.5; g(x') > 0.5 is true for an object x'; therefore x' is plausibly of class A"
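A minimal sketch of the linear discriminant g(x) = wᵀx; the weight vector and the test points are made-up values for illustration:

```python
import numpy as np

# Hyperplane parameters (illustrative assumption, not from the slides).
w = np.array([0.5, 0.5])

def g(x):
    """Linear discriminant g(x) = w^T x."""
    return w @ x

# Objects with g(x) > 0.5 are plausibly of class A.
x1 = np.array([1.0, 1.0])   # g(x1) = 1.0  -> plausibly class A
x2 = np.array([0.2, 0.1])   # g(x2) = 0.15 -> not class A
print(g(x1) > 0.5, g(x2) > 0.5)
```

Note the inductive step: a point on the class-A side of the hyperplane is only *plausibly* of class A, never deductively so.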
Galilean Epistemology (1564-1642)
Unlike heavenly bodies, the mundane objects of the earth were not suitable for mathematical models, as they did not manifest ideal behaviours.
– Abstraction: representing an object with another object that is easier to handle (e.g., 3D space to deal with the motion of particles).
– Idealization: simplifying the properties of an object (e.g., a frictionless surface for falling rocks).
Example of abstraction: Face Recognition → Vector Space Model → Linear Algebra.
Example of idealization: Galilean idealization is pragmatic and aims to reduce computational limitations, e.g., feature selection to make the otherwise infeasible training of a classifier possible.
– Abstraction (a.k.a. Aristotelian idealization): given a class of individuals, an abstraction is a concept under which all of the individuals fall.
– Idealization (a.k.a. Galilean idealization): given a class of individuals, an idealization is a concept under which all of the individuals almost fall (in some pragmatically relevant sense), while at least one individual is excluded by the idealization.
OBJECT REPRESENTATION IN MACHINE LEARNING
• Two main types of indeterminacy in learning problems:
– Unknown nature of the data
– Unknown functional form between inputs and corresponding outputs
• These not only complicate the selection of the hypothesis space, but also hinder the identification of essential attributes!
• More problems: a high degree of freedom in the configuration of learning algorithms.
Researchers play with the original feature space, for example using Principal Component Analysis (PCA). PCA is used both for dimensionality reduction and for space transformation, by identifying the directions of maximum variance.
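A minimal PCA sketch via the eigendecomposition of the covariance matrix, showing both uses at once: the leading direction of maximum variance and the projection that reduces dimensionality. The toy data are an assumption:

```python
import numpy as np

# Toy 2D data concentrated along the direction (2, 1), plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0]]) \
    + rng.normal(scale=0.1, size=(100, 2))

Xc = X - X.mean(axis=0)                  # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = eigvals.argsort()[::-1]          # sort by descending variance
components = eigvecs[:, order]           # principal directions

Z = Xc @ components[:, :1]               # project onto first direction (2D -> 1D)
explained = eigvals[order][0] / eigvals.sum()
print(f"variance explained by first component: {explained:.3f}")
```

Because the data lie almost on a line, the first component captures nearly all the variance, which is what licenses discarding the second dimension.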
• Abstraction: the Kernel Trick
Real objects: x₁ = (f₁, f₂, ..., f_n) and x₂ = (f′₁, f′₂, ..., f′_n).
Let x ∈ V, and a mapping φ: V → W. Then K(x₁, x₂) ≡ ⟨φ(x₁), φ(x₂)⟩.
The Kernel Trick (Rasmussen & Williams, 2005) enables us to work in very complex vector spaces without even knowing the mapping itself.
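The identity K(x₁, x₂) ≡ ⟨φ(x₁), φ(x₂)⟩ can be checked concretely for the homogeneous degree-2 polynomial kernel, whose explicit map φ(x) = (x₁², √2·x₁x₂, x₂²) is a standard textbook choice (not named in the slides):

```python
import numpy as np

def K(x, y):
    """Degree-2 polynomial kernel: computed without any feature map."""
    return (x @ y) ** 2

def phi(x):
    """The explicit feature map K implicitly uses: V = R^2 -> W = R^3."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 0.5])

print(K(x1, x2))           # kernel evaluated in the input space
print(phi(x1) @ phi(x2))   # inner product in the feature space: identical
```

In practice one only ever evaluates K; for kernels such as the Gaussian kernel of Rasmussen & Williams (2005), the feature space is infinite-dimensional and φ is never written down at all.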
• Abstraction
"Abstraction does not necessarily cause epistemic problems, since in most cases it is a necessary step to take."
"Without mathematical abstraction, it would not be possible to establish any foundation of statistical learning."
The trade-off: computational gains vs. representational issues.
• Idealization
Idealization does not act only over the features; it is also realized during model construction.
– Remove irrelevant features to sort out the accidental attributes.
– Remove irrelevant features to alleviate computational issues, e.g., to reduce dimensionality.
• Idealization
– Weisberg (2007) identifies three kinds of idealization used in scientific models:
Multiple-model idealization: boosting and voting (ensemble methods); small models with different sets of features; used when no single model can characterize the underlying causal structure.
Galilean idealization: performed against technical difficulties; deliberate distortions; e.g., Bayesian learning models struggle with computational complexities without idealization.
Minimalist (Aristotelian) idealization: "stripping away" all properties of a concrete object that we believe are not relevant to the problem at hand; focusing on a limited set of properties in isolation.
• Theoretical Variables
A theoretical term is the negation of observability, i.e., an entity that cannot be perceived directly without the aid of technical instruments or inferences. Example: "This object is in cluster C."
A theoretical/latent variable is any variable not included in the unprocessed feature set.
Such variables are problematic in their semantics: does the variable refer to any real object or property? What is its meaning?
Latent Variables: how old is this cat?
(http://www.wikihow.com/Know-Your-Cat%27s-Age)
– Based on teeth: count them; kittens have 26 deciduous teeth and adult cats have 30 teeth; cats younger than 8 weeks will still be developing their deciduous, or "baby", teeth.
– Based on fur: like humans, cats develop grey hairs with age.
– Based on paws, claws, and pads: as cats age, their nails harden and become brittle and overgrown.
– Based on eyes: older cats develop a cloudiness not present in kittens and younger cats, who have sharp, clear eyes.
– Based on behaviour: younger cats, like younger people, are generally more energetic and attracted to play.
Age is a hidden variable: not directly observed, but inferred.
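The cat example can be sketched as latent-variable inference: age is never observed directly, only noisy indicators of it. The linear indicator model and all its coefficients below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent variable: age in years. Never given to the learner directly.
age = rng.uniform(0, 15, size=200)

# Observable indicators generated from the latent age (assumed model):
# grey-fur fraction, eye cloudiness, nail brittleness, each with noise.
observables = np.column_stack([
    0.05 * age + rng.normal(scale=0.05, size=200),   # grey fur
    0.06 * age + rng.normal(scale=0.05, size=200),   # eye cloudiness
    0.04 * age + rng.normal(scale=0.05, size=200),   # nail brittleness
])

# Infer the latent variable by least squares on the indicators alone.
coef, *_ = np.linalg.lstsq(observables, age, rcond=None)
inferred_age = observables @ coef
correlation = np.corrcoef(inferred_age, age)[0, 1]
print(f"correlation between inferred and true age: {correlation:.3f}")
```

The semantic worry from the previous slide applies here: the inferred quantity tracks the data well, but whether it *refers* to the real property "age" is a separate, philosophical question.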
WHAT IS NEXT?
• The multiple successful applications of Machine Learning are not mainly rooted in our glorious technological advancements.
Timeline: theory of kernels (Aronszajn, 1950) → first version of the SVM (Vapnik & Lerner, 1963) → statistical learning theory (Vapnik & Chervonenkis, 1974) → final version of the SVM (Cortes & Vapnik, 1995). 30 years!
Success is associated with strong foundations, not with the increasing size of computer memory.
These are the first steps into the relationship between Philosophy and Machine Learning.
"Which one is better now?"
HOW THIS IS RELATED TO MY PHD
• RDF: a method for the conceptual description or modelling of information.
• Linked Data: a method of publishing structured data.
• I want to apply ML techniques over Linked Data.
• What is the nature or structure of a Linked Data dataset?
Thanks!