The Philosophical Aspects of Data Modelling
Emir Muñoz
National University of Ireland Galway


Reading Group 2015 at Insight Centre for Data Analytics Galway

Published in: Data & Analytics

The Philosophical Aspects of Data Modelling

  1. The Philosophical Aspects of Data Modelling. Emir Muñoz, National University of Ireland Galway. Semantics of Object Representation in Machine Learning. Birkan Tunç, Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, USA.
  2. (no text)
  3. INTRODUCTION. Machine Learning: “Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959). https://www.informatik.uni-hamburg.de/ML/ Diagram labels: Contribution; Philosopher.
  4. ML APPLICATIONS. Text recognition; recommender systems; face detection; self-driving cars. http://commons.wikimedia.org/
  5. INTRODUCTION. Philosopher; Researcher/Engineer.
  6. INTRODUCTION. Philosopher; Researcher/Engineer. Idealization; abstraction; latent variables.
  7. INTRODUCTION. Philosopher; Researcher/Engineer. New conceptual development; new insights into the source of knowledge; new aspects of the scientific methodology.
  8. STATISTICAL LEARNING. Regression (continuous labels); classification (discrete labels); clustering (densities).
  9. STATISTICAL LEARNING. Author’s proposal: machine learning needs to be cultivated with the vocabulary of philosophy to extend the range of questions that are raised when evaluating various aspects of machine learning pertaining to data representation. Real entity (nature, structure) → mathematical object (properties): X → f(X).
  10. WHO CARES? Duck? Beaver? Otter? A platypus.
  11. WHO CARES? «The foundations of pattern recognition can be traced to Plato, later extended by Aristotle, who distinguished between an “essential property” […] from an “accidental property” […]» Pattern recognition → find such essential properties.
  12. WHO CARES? Diagram: Training Data; Test Data; Machine Learning Algorithm; Hypothesis; Performance; Feedback. What is the justification to use this model and object representation?
  13. WHO CARES? “No free lunch” (The Supervised Learning No-Free-Lunch Theorems, Wolpert, 2002). Our model is a simplification of reality. Simplification is based on assumptions (model bias). Assumptions fail in certain situations. “No one model works best for all possible situations.”
  14. WHO CARES? What is the justification to use this model and object representation? (Varieties of Justification in Machine Learning, Corfield, 2010)
     • Absolute performance: quantified by probabilistic bounds of the generalization error. Examples: confusion matrix, accuracy, misclassification rate.
     • Relative performance: compared to related algorithms and other configurations. Examples: Mahalanobis distance, Kolmogorov-Smirnov distance, ROC curves and AUC, Gini.
     • Need for philosophical attention.
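
Three of the measures listed on this slide (confusion matrix, accuracy, misclassification rate) can be computed directly from true and predicted labels. The following is a minimal sketch in plain Python; the label lists and function names are invented for illustration and do not come from the slides.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical labels, chosen only to make the arithmetic easy to follow.
y_true = ["a", "a", "a", "b", "b", "b", "b", "b"]
y_pred = ["a", "a", "b", "b", "b", "b", "a", "b"]

matrix = confusion_matrix(y_true, y_pred, labels=["a", "b"])
acc = accuracy(matrix)
misclassification_rate = 1 - acc
```

These numbers describe one model on one sample; on their own they say nothing about why the chosen representation is appropriate, which is the question the slide raises.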
  15. WHO CARES? Mental disorders vs. normality: f(X).
  16. WHO CARES? Which one is better now? I told you, we need to look beyond the accuracy, consistency, and relative performance…
  17. WHO CARES? Kernel trick. Linear separation: with errors. Non-linear separation: no errors. The non-linear surface corresponds to a linear surface in the feature space. We boost the performance of our model, regardless of the nonlinearity of the original features.
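
The contrast between "linear separation with errors" and "non-linear separation with no errors" can be reproduced on a toy problem. This sketch uses an explicit feature map x → (x, x²) rather than a kernel, purely to make the feature space visible; the data, the map, and the classifier weights are assumptions chosen for illustration.

```python
def sign(value):
    return 1 if value > 0 else -1


def count_errors(features, labels, weights, bias):
    """Number of points misclassified by sign(w . f + b)."""
    wrong = 0
    for feature, label in zip(features, labels):
        score = sum(w * v for w, v in zip(weights, feature)) + bias
        if sign(score) != label:
            wrong += 1
    return wrong


# Hypothetical 1D data: outer points are positive, inner points negative.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [1, -1, -1, 1]

raw = [(x,) for x in points]            # original feature space
mapped = [(x, x * x) for x in points]   # feature space after the map

# Best linear classifier found on a small grid in the original space.
best_raw_errors = min(
    count_errors(raw, labels, (w,), k / 2)
    for w in (-1.0, 1.0)
    for k in range(-6, 7)
)

# A linear classifier in the mapped space: x^2 - 2.5 > 0.
mapped_errors = count_errors(mapped, labels, (0.0, 1.0), -2.5)
```

No single threshold on x separates the classes, while a threshold on x² does. The representation changed, not the objects, which is why the slide asks what the transformed features mean.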
  18. WHO CARES? f(X). Output prediction is not the main goal; the goal is a more extensive comprehension of the interactions between the main players of the system.
  19. INDUCTIVE INFERENCE
     • Deductive reasoning (strong syllogism): “if A is true then B is true; A is true; therefore B is true”
     • Inductive inference (weak syllogism): “if A is true then B is true; B is true; therefore A is plausible”
  20. INDUCTIVE INFERENCE
     • Deductive reasoning (strong syllogism): “if A is true then B is true; A is true; therefore B is true”
     • Inductive inference (weak syllogism): “if A is true then B is true; B is true; therefore A is plausible”
     • Label on each: Truth Preservation.
  21. INDUCTIVE INFERENCE
     • Statistical learning (weaker than weak syllogism): “if A is true then B is plausible; B is true; therefore A is plausible”
     • Tools to evaluate the degree of plausibility that corresponds to our credence in the truth of conclusions.
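
One common way to quantify "plausible" is to read it as a probability and apply Bayes' rule. That reading is an assumption added here, not something the slides state, and every number below is invented. The sketch compares the weak syllogism (A guarantees B) with the statistical-learning form (A only makes B likely).

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) by Bayes' rule for a binary hypothesis A and observed B."""
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / evidence


prior = 0.3  # hypothetical credence in A before observing B

# Weak syllogism: "if A is true then B is true", so P(B | A) = 1.
weak = posterior(prior, p_b_given_a=1.0, p_b_given_not_a=0.4)

# Statistical learning: "if A is true then B is plausible", so P(B | A) < 1.
statistical = posterior(prior, p_b_given_a=0.8, p_b_given_not_a=0.4)
```

With these inputs, observing B raises the credence in A in both cases, but by less when A only makes B likely, which matches the slide's ordering of "weak" and "weaker than weak".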
  22. INDUCTIVE INFERENCE. Aristotelian Epistemology (384-322 BC). Diagram: (1) observing facts (observations); (2) explanatory principles, reached by induction; (3) explanation of the observations, reached by deduction. Simplification in object representation: selecting primary/essential attributes; avoiding the use of accidental attributes.
  23. INDUCTIVE INFERENCE. Aristotelian Epistemology (384-322 BC). Example: linear discriminant $g(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$, with $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{w} \in \mathbb{R}^n$. Labels: Observable; Hyperplane. Most objects of class A reside on the side of the hyperplane where $g(\mathbf{x}) > 0.5$. Definition of vector $\mathbf{x}$, which needs feature extraction and selection. “Most objects of class A reside on the side of the hyperplane where $g(\mathbf{x}) > 0.5$; $g(\mathbf{x}') > 0.5$ is true for an object $\mathbf{x}'$; therefore $\mathbf{x}'$ is plausibly of class A.”
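
The decision rule on this slide is short enough to write out. The sketch below follows the slide's form g(x) = wᵀx with its 0.5 threshold; the weight vector and the two objects are hypothetical values, not taken from the talk.

```python
def g(w, x):
    """Linear discriminant g(x) = w^T x."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def classify(w, x, threshold=0.5):
    """Assign class A when g(x) exceeds the threshold used on the slide."""
    return "A" if g(w, x) > threshold else "not A"


w = [0.4, 0.2]          # hypothetical weights, assumed already learned
x_new = [1.0, 1.0]      # hypothetical object: g = 0.6
x_other = [0.5, 0.5]    # hypothetical object: g = 0.3

label_new = classify(w, x_new)
label_other = classify(w, x_other)
```

The rule itself is mechanical. The philosophical weight sits in the step before it: deciding which features make up x, since the conclusion is only "plausibly of class A" relative to that choice.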
  24. INDUCTIVE INFERENCE. Galilean Epistemology (1564-1642). Unlike heavenly bodies, the mundane objects of the earth were not suitable for mathematical models, as they did not manifest ideal behaviours.
     • Abstraction: representing an object with another object that is easier to handle. Example: 3D space to deal with the motion of particles.
     • Idealization: simplifying properties of an object. Example: frictionless surface of rocks falling.
  25. INDUCTIVE INFERENCE. Diagram: Linear Algebra; Vector Space Model; Face Recognition. Example of abstraction. Example of idealization. Galilean idealization is pragmatic and aims to reduce computational limitations, e.g., feature selection to facilitate the (otherwise infeasible) training of a classifier.
  26. INDUCTIVE INFERENCE
     • Abstraction (a.k.a. Aristotelian idealization): given a class of individuals, an abstraction is a concept under which all of the individuals fall.
     • Idealization (a.k.a. Galilean idealization): given a class of individuals, an idealization is a concept under which all of the individuals almost fall (in some pragmatically relevant sense), while at least one individual is excluded by the idealization.
  27. OBJECT REPRESENTATION IN MACHINE LEARNING. Two main types of indeterminacy in learning problems: unknown nature of data; unknown functional form between input and corresponding outputs. → These complicate the selection of the hypothesis space and also hinder the identification of essential attributes!
  28. OBJECT REPRESENTATION IN MACHINE LEARNING. More problems: high degree of freedom in the configuration of learning algorithms. Researchers play with the original feature space, for example using Principal Component Analysis (PCA). PCA is used for both dimensionality reduction and space transformation, by identifying directions of maximum variance.
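
To make "directions of maximum variance" concrete, here is a minimal PCA sketch in plain Python. It finds only the first principal direction, and it uses power iteration on the covariance matrix instead of a full eigendecomposition; that substitution, the data, and all function names are choices made for this illustration.

```python
import math


def mean_center(rows):
    """Subtract the per-feature mean from every row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_component(cov, iterations=200):
    """Dominant eigenvector of cov via power iteration (unit length)."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Hypothetical 2D data lying close to the line y = x.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]

centered = mean_center(data)
direction = first_component(covariance(centered))

# Dimensionality reduction: each 2D point becomes one coordinate.
projected = [sum(a * b for a, b in zip(row, direction)) for row in centered]
```

The projected coordinate is a mixture of the two original features, so it no longer names any single measured property. That is the kind of freedom in configuring the feature space that this slide flags.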
  29. OBJECT REPRESENTATION IN MACHINE LEARNING. Abstraction.
  30. OBJECT REPRESENTATION IN MACHINE LEARNING. Abstraction: Kernel Trick. Real objects: $x_1 = (f_1, f_2, \dots, f_n)$ and $x_2 = (f'_1, f'_2, \dots, f'_n)$. Let $x \in V$, and a mapping $\phi(x) : V \to W$. Then $K(x_1, x_2) \equiv \langle \phi(x_1), \phi(x_2) \rangle$. The Kernel Trick (Rasmussen & Williams, 2005) enables us to work in very complex vector spaces without even knowing the mapping itself.
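
The identity K(x₁, x₂) ≡ ⟨φ(x₁), φ(x₂)⟩ can be checked numerically for a kernel whose map is known. The sketch below uses the degree-2 homogeneous polynomial kernel on 2D inputs as an example; the choice of kernel and the input vectors are assumptions for illustration, since the slide does not name a specific kernel.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for K(x, y) = (x . y)^2 on 2D inputs."""
    a, b = x
    return (a * a, math.sqrt(2) * a * b, b * b)


def kernel(x, y):
    """Same inner product, computed without ever building phi."""
    return dot(x, y) ** 2


x1 = (1.0, 2.0)   # hypothetical feature vectors
x2 = (3.0, 0.5)

via_kernel = kernel(x1, x2)
via_mapping = dot(phi(x1), phi(x2))
# Both values agree (16 for these inputs, up to floating-point rounding).
```

Here the map is small enough to write down. For kernels whose feature space is very high-dimensional or infinite-dimensional, only the left-hand side is computed, which is what the slide means by working "without even knowing the mapping itself".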
  31. OBJECT REPRESENTATION IN MACHINE LEARNING. Abstraction. “Abstraction does not necessarily cause epistemic problems since in most cases it is a necessary step to take.” “Without mathematical abstraction, it would not be possible to establish any foundation of statistical learning.” Computational gains vs. representational issues.
  32. OBJECT REPRESENTATION IN MACHINE LEARNING. Idealization. It does not only act over the features but is also realized during the model construction. Remove irrelevant features to sort out the accidental attributes. Remove irrelevant features to alleviate computational issues, such as to reduce the dimensionality.
  33. OBJECT REPRESENTATION IN MACHINE LEARNING. Idealization: (Weisberg, 2007) identifies 3 kinds of idealization used in scientific models.
     • Multi-model idealization: boosting, voting (ensemble methods); used when no single model can characterize the underlying causal structure; small models with different sets of features.
     • Galilean idealization: performed against technical difficulties; deliberate distortions; a Bayesian learning model struggles with computational complexities without idealization.
     • Minimalist (Aristotelian) idealization: ‘stripping away’ all properties from a concrete object that we believe are not relevant to the problem at hand; focus on a limited set of properties in isolation.
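
The "voting" form of multi-model idealization can be sketched with a few deliberately small models, each looking at a different feature. Everything below (feature names, thresholds, class labels) is hypothetical and only illustrates the mechanism, not any model from the talk.

```python
from collections import Counter


def make_threshold_model(feature, threshold):
    """Return a tiny model that inspects a single feature."""
    def model(sample):
        return "A" if sample[feature] > threshold else "B"
    return model


def majority_vote(models, sample):
    """Combine the models' predictions by simple majority."""
    votes = Counter(model(sample) for model in models)
    return votes.most_common(1)[0][0]


# Three small models, each built on a different feature.
models = [
    make_threshold_model("height", 1.0),
    make_threshold_model("width", 2.0),
    make_threshold_model("weight", 5.0),
]

sample = {"height": 1.4, "width": 1.5, "weight": 7.0}
prediction = majority_vote(models, sample)  # votes: A, B, A
```

No single model here is claimed to capture the underlying structure; the ensemble's answer is a pragmatic aggregate, which is why the slide files it under idealization.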
  34. OBJECT REPRESENTATION IN MACHINE LEARNING. Theoretical Variables. Theoretical term is the negation of observability, i.e. entities that cannot be perceived directly without the aid of technical instruments or inferences. Example: “This object is in cluster C.” A theoretical/latent variable is any variable not included in the unprocessed feature set. Problematic in their semantics! Does it refer to any real object or property? What is its meaning?
  35. OBJECT REPRESENTATION IN MACHINE LEARNING. Latent Variables: hidden variables, not directly observed but inferred. Example: How old am I? (http://www.wikihow.com/Know-Your-Cat%27s-Age)
     • Based on teeth. Count them. Kittens will have 26 deciduous teeth and adult cats will have 30 teeth. Cats younger than 8 weeks will still be developing their deciduous, or "baby" teeth.
     • Based on fur. Like humans, cats will also develop grey hairs with age.
     • Based on paws, claws, and pads. As cats age, their nails will harden and become brittle and overgrown.
     • Based on eyes. Older cats will develop a cloudiness not present in kittens and younger cats, who have sharp, clear eyes.
     • Based on behaviour. Younger cats, like younger people, are generally more energetic and attracted to play.
  36. WHAT IS NEXT? Multiple successful applications of Machine Learning, not mainly rooted in our glorious technological advancements. Timeline: Theory of kernels (Aronszajn, 1950); SVM first version (Vapnik & Lerner, 1963); Statistical learning (Vapnik & Chervonenkis, 1974); SVM final version (Cortes & Vapnik, 1995). 30 years! Success is associated with strong foundations, not with the increasing size of computer memory.
  37. WHAT IS NEXT? First steps into the relationship between Philosophy and Machine Learning. Which one is better now?
  38. WHAT IS NEXT? What real entity corresponds to this?
  39. WHAT IS NEXT?
  40. HOW THIS IS RELATED TO MY PHD
     • RDF → method for conceptual description or modelling of information
     • Linked Data → method of publishing structured data
     • I want to apply ML techniques over Linked Data
     • What is the nature or structure of a Linked Data dataset?
     Thanks!