The Philosophical Aspects of Data
Modelling
Emir Muñoz
National University of Ireland Galway
Semantics of Object Representation in Machine
Learning
Birkan Tunç
Center for Biomedical Image Computing and Analytics,
University of Pennsylvania, Philadelphia, PA, USA
INTRODUCTION
Machine Learning
“Field of study that gives computers the ability to learn without being explicitly programmed”
(Arthur Samuel, 1959)
https://www.informatik.uni-hamburg.de/ML/
Contribution
Philosopher
ML APPLICATIONS
• Text recognition
• Recommender systems
• Face detection
• Self-driving cars
http://commons.wikimedia.org/
INTRODUCTION
Philosopher | Researcher/Engineer
Idealization
Abstraction
Latent variables
New conceptual development
New insights into the source of knowledge
New aspects of the scientific methodology
STATISTICAL LEARNING
• Regression: continuous labels
• Classification: discrete labels
• Clustering: densities
• Author’s proposal:
– Machine learning needs to be cultivated with the vocabulary of philosophy to extend the range of questions that are raised when evaluating various aspects of machine learning pertaining to data representation
STATISTICAL LEARNING
Real Entity
- Nature
- Structure
𝑋 → 𝑓(𝑋)
Mathematical Object
- Properties
Duck?
Beaver?
Otter?
A Platypus
WHO CARES?
• «The foundations of pattern recognition can
be traced to Plato, later extended by Aristotle,
who distinguished between an “essential
property” […] from an “accidental property”
[…]»
WHO CARES?
Pattern recognition → find such essential properties
[Diagram labels: Training Data, Test Data, Machine Learning Algorithm, Hypothesis, Performance, Feedback]
What is the justification to use this model and object representation?
WHO CARES?
• “No free lunch” (The Supervised Learning No-Free-Lunch Theorems, Wolpert, 2002)
Our model is a simplification of reality
Simplification is based on assumptions (model bias)
Assumptions fail in certain situations
“No one model works best for all possible situations.”
WHO CARES?
• What is the justification to use this model and object representation?
Absolute performance: quantified by probabilistic bounds of the generalization error
Examples:
• Confusion matrix
• Accuracy
• Misclassification rate
Relative performance: compared to related algorithms and other configurations
Examples:
• Mahalanobis distance
• Kolmogorov-Smirnov distance
• ROC curves and AUC
• Gini
Need for philosophical attention
WHO CARES?
(Varieties of Justification in Machine Learning, Corfield, 2010)
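Three of the measures named on this slide (confusion matrix, accuracy, misclassification rate) can be computed in a few lines. The sketch below is only an illustration: the labels and predictions are made-up toy data, and the helper names `confusion_matrix` and `accuracy` are hypothetical, not taken from the talk.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy data (assumed for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
acc = accuracy(cm)
misclassification_rate = 1.0 - acc
print(cm, acc, misclassification_rate)
```

These numbers describe how often the hypothesis is right on the given data; they say nothing about whether the chosen object representation is justified, which is the question the slide raises.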
WHO CARES?
Mental disorders
Vs.
Normality
f(X)
WHO CARES?
Which one is better now?
I told you, we need to look beyond
the accuracy, consistency, and
relative performance…
WHO CARES?
Kernel Trick
• Linear separation: with errors
• Non-linear separation: no errors
Non-linear surface corresponding to a linear surface in the feature space
We boost the performance of our model, regardless of the nonlinearity of the original features
WHO CARES?
f(X)
Output prediction is not the main goal; the goal is a more extensive comprehension of the interactions between the main players of the system.
INDUCTIVE INFERENCE
• Deductive reasoning (strong syllogism)
• Inductive inference (weak syllogism)
“if A is true then B is true;
A is true;
therefore B is true”
“if A is true then B is true;
B is true;
therefore A is plausible”
Truth
Preservation
Truth
Preservation
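One possible way to read "therefore A is plausible" quantitatively is through Bayes' rule; this probabilistic reading is an assumption added here, not something stated on the slide. If A implies B, then P(B | A) = 1, so P(A | B) = P(A) / P(B) ≥ P(A): observing B can only raise the plausibility of A. A minimal sketch with made-up probabilities (the function name is hypothetical):

```python
def posterior_given_b(prior_a: float, p_b_given_not_a: float) -> float:
    """P(A | B) when A implies B, i.e. P(B | A) = 1."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A).
    p_b = 1.0 * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return prior_a / p_b

prior = 0.2  # assumed prior plausibility of A
posterior = posterior_given_b(prior, p_b_given_not_a=0.5)
print(prior, posterior)
```

The conclusion is not guaranteed to be true, which is why the weak syllogism is not truth-preserving; it only becomes more credible.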
INDUCTIVE INFERENCE
• Statistical learning (weaker than weak syllogism)
“if A is true then B is plausible;
B is true;
therefore A is plausible”
Tools to evaluate the degree of plausibility that corresponds to our credence in the truth of conclusions
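As a hedged illustration, the weaker syllogism can be written down with Bayes' rule. The reading of "if A is true then B is plausible" as P(B | A) > P(B) is an assumption made here for the sketch, not a definition given on the slide.

```latex
% Assumption: "if A is true then B is plausible" means P(B \mid A) > P(B),
% with P(A) > 0 and P(B) > 0.
\begin{align*}
P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)} \\
            &> \frac{P(B)\, P(A)}{P(B)} = P(A)
\end{align*}
% Observing B therefore raises the plausibility of A, but by less than in the
% weak syllogism, where P(B \mid A) = 1.
```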
INDUCTIVE INFERENCE
Aristotelian Epistemology
(384-322 BC)
[Diagram: (1) observing facts → induction → (2) explanatory principles → deduction → (3) explanation of the observations]
Simplification in object representation
- Selecting primary/essential attributes
- Avoiding the use of accidental attributes
INDUCTIVE INFERENCE
Aristotelian Epistemology
(384-322 BC)
Example: linear discriminant
$g(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$, with $\mathbf{x} \in \mathbb{R}^n$ (observable) and $\mathbf{w} \in \mathbb{R}^n$ (hyperplane)
Most objects of class A reside on the side of the hyperplane where $g(\mathbf{x}) > 0.5$
Defining the vector $\mathbf{x}$ requires feature extraction and selection
“Most objects of class A reside on the side of the hyperplane where $g(\mathbf{x}) > 0.5$; $g(\mathbf{x}') > 0.5$ is true for an object $\mathbf{x}'$; therefore $\mathbf{x}'$ plausibly belongs to class A”
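The linear discriminant on this slide can be sketched in a few lines. The weight vector and the test objects below are made-up values, and the function names are hypothetical; only the rule "score above 0.5 suggests class A" comes from the slide.

```python
def g(w, x):
    """Linear discriminant g(x) = w^T x for equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

def plausibly_class_a(w, x, threshold=0.5):
    """True when x falls on the side of the hyperplane where g(x) > threshold."""
    return g(w, x) > threshold

w = [0.4, 0.3]        # assumed weights, e.g. learned from training data
x_new = [1.0, 1.0]    # assumed feature vector of a new object
x_other = [0.5, 0.5]  # a second assumed object

print(plausibly_class_a(w, x_new), plausibly_class_a(w, x_other))
```

The returned value is a plausibility judgement in the sense of the syllogism, not a proof of class membership: "most objects of class A" leaves room for exceptions.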
INDUCTIVE INFERENCE
Galilean Epistemology
(1564-1642)
Unlike heavenly bodies, the
mundane objects of the earth
were not suitable for
mathematical models, as they did
not manifest ideal behaviours.
Abstraction: representing an object with another object that is easier to handle
Example: 3D space to deal with the motion of particles
Idealization: simplifying properties of an object
Example: frictionless surface
of rocks falling
INDUCTIVE INFERENCE
Linear Algebra | Vector Space Model | Face Recognition
Example of abstraction
Example of idealization
Galilean idealization is pragmatic and aims to overcome computational limitations.
E.g., feature selection to facilitate the otherwise infeasible training of a classifier.
INDUCTIVE INFERENCE
Abstraction (a.k.a. Aristotelian idealization)
Given a class of individuals, an abstraction is a concept under which all of the individuals fall.
Idealization (a.k.a. Galilean idealization)
Given a class of individuals, an idealization is a concept under which all of the individuals almost fall (in some pragmatically relevant sense), while at least one individual is excluded by the idealization.
OBJECT REPRESENTATION IN MACHINE LEARNING
• Two main types of indeterminacy in learning problems:
– Unknown nature of data
– Unknown functional form between inputs and corresponding outputs
• → These complicate the selection of the hypothesis space and also hinder the identification of essential attributes!
• More problems: high degree of freedom in the configuration of learning algorithms
OBJECT REPRESENTATION IN MACHINE LEARNING
Researchers play with the original feature space, for example using Principal Component Analysis (PCA).
PCA is used for both:
- Dimensionality reduction; and
- Space transformation by identifying directions of maximum variance.
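To make "directions of maximum variance" concrete, here is a minimal sketch of PCA restricted to two-dimensional data, where the leading eigenvector of the covariance matrix has a closed form. The data points and the function name are assumptions for illustration; a real pipeline would normally rely on a numerical library.

```python
import math

def principal_direction_2d(points):
    """Return the unit direction of maximum variance and the 1-D projections."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # Dimensionality reduction: keep one coordinate per point instead of two.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projected

points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]  # toy data on a line
direction, projected = principal_direction_2d(points)
print(direction, projected)
```

The projected coordinate is a new variable that does not appear in the original feature set, which is where questions about its meaning start.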
OBJECT REPRESENTATION IN MACHINE LEARNING
• Abstraction
Kernel Trick
$x_1 = (f_1, f_2, \ldots, f_n)$
$x_2 = (f'_1, f'_2, \ldots, f'_n)$
Let $x \in V$, and a mapping $\phi : V \to W$
Real objects
$K(x_1, x_2) \equiv \langle \phi(x_1), \phi(x_2) \rangle$
The Kernel Trick (Rasmussen & Williams, 2005):
- Enables us to work in very complex vector spaces without even knowing the mapping itself.
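A small worked case may help show what "without even knowing the mapping" means. The sketch below uses the degree-2 homogeneous polynomial kernel on two-dimensional inputs, chosen here as an illustrative assumption (the slide does not name a specific kernel). Its explicit feature map is known, so both routes can be compared.

```python
import math

def dot(u, v):
    """Standard inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map V = R^2 -> W = R^3 for the kernel below."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """K(x, y) = (x . y)^2, computed without ever building phi(x)."""
    return dot(x, y) ** 2

x = (1.0, 2.0)  # assumed example inputs
y = (3.0, 4.0)

explicit = dot(phi(x), phi(y))  # inner product taken in the feature space W
implicit = poly2_kernel(x, y)   # same value, computed in the original space V
print(explicit, implicit)
```

Both routes give the same number up to floating-point rounding, but the kernel never constructs the vectors in W. For other kernels the feature space can be very high-dimensional, and then only the second route is practical.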
OBJECT REPRESENTATION IN MACHINE LEARNING
• Abstraction
“Abstraction does not necessarily cause epistemic problems since in most cases it is a necessary step to take.”
“Without mathematical abstraction, it would not be possible to establish any foundation of statistical learning.”
computational gains vs. representational issues
OBJECT REPRESENTATION IN MACHINE LEARNING
• Idealization
It does not only act over the features but is also realized during the model construction.
• Remove irrelevant features to sort out the accidental attributes
• Remove irrelevant features to alleviate computational issues, for example by reducing the dimensionality
OBJECT REPRESENTATION IN MACHINE LEARNING
• Idealization
– (Weisberg, 2007) identifies 3 kinds of idealization used in scientific models
Multi-model idealization
• Boosting, voting (ensemble methods)
• Used when no single model can characterize the underlying causal structure
• Small models with different sets of features
Galilean idealization
• Performed against technical difficulties
• Deliberate distortions
• Bayesian learning model struggles with computational complexities without idealization
Minimalist (Aristotelian) idealization
• ‘Stripping away’ all properties from a concrete object that we believe are not relevant to the problem at hand
• Focus on a limited set of properties in isolation
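The voting variant of multi-model idealization can be sketched as a majority vote over several small models, each looking at a different feature. Everything in the example (feature names, thresholds, class labels, the `majority_vote` helper) is hypothetical and only meant to show the mechanism.

```python
from collections import Counter

def majority_vote(models, x):
    """Return the label predicted by most of the models for input x."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical small models, each using a single, different feature.
models = [
    lambda x: "A" if x["height"] > 1.0 else "B",
    lambda x: "A" if x["weight"] > 5.0 else "B",
    lambda x: "A" if x["length"] > 2.0 else "B",
]

sample = {"height": 1.2, "weight": 4.0, "length": 2.5}  # assumed object
prediction = majority_vote(models, sample)
print(prediction)
```

No single model here is claimed to capture the underlying structure; the combined answer is what gets used, which is the point of this kind of idealization.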
OBJECT REPRESENTATION IN MACHINE LEARNING
• Theoretical Variables
A theoretical term is defined by the negation of observability, i.e. it refers to entities that cannot be perceived directly without the aid of technical instruments or inferences
Example: “This object is in cluster C”
A theoretical/latent variable is any variable not included in the unprocessed feature set
Problematic in their semantics!!
Does it refer to any real object or property?
What is its meaning?
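A cluster label is a simple case of such a variable: it is computed from the features but is not one of them. The sketch below assigns an object to the nearest of three centroids; the centroid positions, the object, and the function name are assumptions for illustration.

```python
def nearest_cluster(x, centroids):
    """Return the name of the centroid closest to x (squared Euclidean distance)."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(centroids, key=lambda name: sq_dist(x, centroids[name]))

# Hypothetical centroids, e.g. as produced by a clustering algorithm.
centroids = {"A": (0.0, 0.0), "B": (5.0, 5.0), "C": (10.0, 0.0)}
obj = (9.0, 1.0)  # observed (unprocessed) features of one object

label = nearest_cluster(obj, centroids)  # latent variable: inferred, not observed
print(label)
```

The code can state that the object is in cluster C, but it cannot say whether "cluster C" corresponds to any real entity or property, which is the semantic problem raised on this slide.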
How old am I?
Latent Variables
Based on teeth.
• Count them. Kittens will have 26 deciduous teeth and adult cats will have 30 teeth.
• Cats younger than 8 weeks will still be developing their deciduous, or "baby" teeth.
http://www.wikihow.com/Know-Your-Cat%27s-Age
Based on fur.
• Like humans, cats will also develop grey hairs with age.
Based on paws, claws, and pads.
• As cats age, their nails will harden and become brittle and overgrown.
Based on eyes.
• Older cats will develop a cloudiness not present in kittens and younger cats, who
have sharp, clear eyes.
Based on behaviour.
• Younger cats--like younger people--are generally more energetic and attracted to play.
Hidden variables
Not directly observed but inferred
OBJECT REPRESENTATION IN MACHINE LEARNING
• Multiple successful applications of Machine Learning
– Not mainly rooted in our glorious technological advancements
WHAT IS NEXT?
• Theory of kernels (Aronszajn, 1950)
• SVM first version (Vapnik & Lerner, 1963)
• Statistical learning (Vapnik & Chervonenkis, 1974)
• SVM final version (Cortes & Vapnik, 1995)
30 years!!!!
Success associated with strong foundations, not with increasing size of the computer memory
WHAT IS NEXT?
First steps into the relationship between Philosophy and Machine Learning
Which one is better now?
What real entity corresponds to this?
WHAT IS NEXT?
HOW THIS IS RELATED TO MY PHD
• RDF → method for conceptual description or modelling of information
• Linked Data → method of publishing structured data
• I want to apply ML techniques over Linked Data
• What is the nature or structure of a Linked Data dataset?
Thanks!

The Philosophical Aspects of Data Modelling

  • 1. The Philosophical Aspects of Data Modelling Emir Muñoz National University of Ireland Galway Semantics of Object Representation in Machine Learning Birkan Tunç Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, USA
  • 2. 2
  • 3. 3 Machine Learning Field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959) https://www.informatik.uni-hamburg.de/ML/ Contribution Philosopher INTRODUCTION “ ”
  • 4. 4 Text recognition Recommender Systems Face detection Self-driving Cars http://commons.wikimedia.org/ ML APPLICATIONS
  • 7. 7 INTRODUCTION Philosopher Researcher/ Engineer New conceptual development New insights into the source of knowledge New aspects of the scientific methodology
  • 8. 8 Regression Classification Clustering STATISTICAL LEARNING Continuous labels Discrete labels Densities
  • 9. • Author’s proposal: – Machine learning needs to be cultivated with the vocabulary of philosophy to extend the range of questions that raised when evaluating various aspects of machine learning, pertaining to data representation 9 STATISTICAL LEARNING Real Entity - Nature - Structure 𝑋 → 𝑓(𝑋) Mathematical Object - Properties
  • 11. WHO CARES? «The foundations of pattern recognition can be traced to Plato, later extended by Aristotle, who distinguished between an “essential property” […] from an “accidental property” […]» Pattern recognition → find such essential properties.
  • 12. WHO CARES? Training Data; Test Data; Machine Learning Algorithm; Hypothesis; Performance; Feedback. What is the justification for using this model and object representation?
  • 13. WHO CARES? “No free lunch” (The Supervised Learning No-Free-Lunch Theorems, Wolpert, 2002). Our model is a simplification of reality. Simplification is based on assumptions (model bias). Assumptions fail in certain situations. “No one model works best for all possible situations.”
  • 14. WHO CARES? What is the justification for using this model and object representation? Absolute performance: quantified by probabilistic bounds of the generalization error. Examples: confusion matrix, accuracy, misclassification rate. Relative performance: compared to the relative algorithms and other configurations. Examples: Mahalanobis distance, Kolmogorov-Smirnov distance, ROC curves and AUC, Gini. Need for philosophical attention (Varieties of Justification in Machine Learning, Corfield, 2010).
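The three "absolute performance" examples named on this slide can be sketched in a few lines. This is a minimal illustration in plain Python, not code from the talk; the function names and the toy labels `y_true` and `y_pred` are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Map each (true_label, predicted_label) pair to how often it occurs."""
    counts = Counter(zip(y_true, y_pred))
    return {(t, p): counts.get((t, p), 0) for t in labels for p in labels}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that do not match the true label."""
    return 1.0 - accuracy(y_true, y_pred)


# Hypothetical toy data: eight objects from two classes.
y_true = ["A", "A", "A", "B", "B", "B", "B", "A"]
y_pred = ["A", "A", "B", "B", "B", "A", "B", "A"]

cm = confusion_matrix(y_true, y_pred, labels=["A", "B"])
acc = accuracy(y_true, y_pred)
err = misclassification_rate(y_true, y_pred)
```

Note that two models with the same accuracy can have different confusion matrices, which is one reason a single number says little about the chosen object representation.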
  • 16. WHO CARES? Which one is better now? I told you, we need to look beyond the accuracy, consistency, and relative performance…
  • 17. WHO CARES? Kernel Trick. Linear separation: with errors. Non-linear separation: no errors. Non-linear surface corresponding to a linear surface in the feature space. We boost the performance of our model, regardless of the nonlinearity of the original features.
  • 18. WHO CARES? f(X). Output prediction is not the main goal; the goal is a more extensive comprehension of the interactions between the main players of the system.
  • 19. INDUCTIVE INFERENCE. Deductive reasoning (strong syllogism): “if A is true then B is true; A is true; therefore B is true”. Inductive inference (weak syllogism): “if A is true then B is true; B is true; therefore A is plausible”.
  • 20. INDUCTIVE INFERENCE. Deductive reasoning (strong syllogism): “if A is true then B is true; A is true; therefore B is true”. Inductive inference (weak syllogism): “if A is true then B is true; B is true; therefore A is plausible”. Each syllogism carries the label “Truth Preservation”.
  • 21. INDUCTIVE INFERENCE. Statistical learning (weaker than weak syllogism): “if A is true then B is plausible; B is true; therefore A is plausible”. Tools to evaluate the degree of plausibility that corresponds to our credence in the truth of conclusions.
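One standard tool for quantifying such a "degree of plausibility" is Bayes' theorem. The slide does not name it, so reading the syllogism this way is an assumption, and the probabilities in the worked example are invented for illustration.

```latex
% Bayes' theorem: observing B changes the plausibility of A.
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A)

% Hypothetical numbers: P(A) = 0.2, P(B \mid A) = 0.9, P(B \mid \neg A) = 0.3
P(B) = 0.9 \cdot 0.2 + 0.3 \cdot 0.8 = 0.42,
\qquad
P(A \mid B) = \frac{0.18}{0.42} \approx 0.43 > P(A) = 0.2
```

Under these numbers, B being true makes A more plausible without making it certain, which matches the "weaker than weak" form of the syllogism: A becomes more plausible exactly when P(B | A) > P(B).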
  • 22. INDUCTIVE INFERENCE: Aristotelian Epistemology (384-322 BC). (1) Observing facts (observations) → induction → (2) Explanatory principles → deduction → (3) Explanation of the observations. Simplification in object representation: selecting primary/essential attributes; avoiding the use of accidental attributes.
  • 23. INDUCTIVE INFERENCE: Aristotelian Epistemology (384-322 BC). Example: linear discriminant $g(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$, with $\mathbf{x} \in \mathbb{R}^n$ (observable) and $\mathbf{w} \in \mathbb{R}^n$ (hyperplane). Most objects of class A reside on the side of the hyperplane where $g(\mathbf{x}) > 0.5$. The definition of the vector $\mathbf{x}$ requires feature extraction and selection. “Most objects of class A reside on the side of the hyperplane where $g(\mathbf{x}) > 0.5$; $g(\mathbf{x}') > 0.5$ is true for an object $\mathbf{x}'$; therefore it is plausible that $\mathbf{x}'$ belongs to class A.”
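The linear discriminant on this slide can be written out directly. The sketch below is illustrative only: the weights `w` and the object `x_new` are hypothetical values, and the 0.5 threshold is the one stated on the slide.

```python
def g(w, x):
    """Linear discriminant g(x) = w^T x for two equal-length sequences."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    return sum(wi * xi for wi, xi in zip(w, x))


def plausible_class(w, x, threshold=0.5):
    """Return "A" when x lies on the side of the hyperplane where g(x) > threshold."""
    return "A" if g(w, x) > threshold else "not A"


# Hypothetical weights (the hyperplane) and a new observed object.
w = [0.4, 0.3]
x_new = [1.0, 0.5]

label = plausible_class(w, x_new)  # g(x_new) is about 0.55, above the threshold
```

The result is only a plausibility judgment: the rule says most objects of class A fall on that side, not that every object on that side is of class A.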
  • 24. INDUCTIVE INFERENCE: Galilean Epistemology (1564-1642). Unlike heavenly bodies, the mundane objects of the earth were not suitable for mathematical models, as they did not manifest ideal behaviours. Abstraction: representing an object with another object that is easier to handle, e.g., 3D space to deal with the motion of particles. Idealization: simplifying properties of an object, e.g., frictionless surface of rocks falling.
  • 25. INDUCTIVE INFERENCE. Linear Algebra; Vector Space Model; Face Recognition. Example of abstraction; example of idealization. Galilean idealization is pragmatic and aims to reduce computational limitations, e.g., feature selection to facilitate the otherwise infeasible training of a classifier.
  • 26. INDUCTIVE INFERENCE. Abstraction (a.k.a. Aristotelian idealization): given a class of individuals, an abstraction is a concept under which all of the individuals fall. Idealization (a.k.a. Galilean idealization): given a class of individuals, an idealization is a concept under which all of the individuals almost fall (in some pragmatically relevant sense), while at least one individual is excluded by the idealization.
  • 27. OBJECT REPRESENTATION IN MACHINE LEARNING. Two main types of indeterminacy in learning problems: the unknown nature of the data, and the unknown functional form between inputs and corresponding outputs. → These not only complicate the selection of the hypothesis space but also hinder the identification of essential attributes!!
  • 28. OBJECT REPRESENTATION IN MACHINE LEARNING. More problems: high degree of freedom in the configuration of learning algorithms. Researchers play with the original feature space, for example using Principal Component Analysis (PCA). PCA is used for both dimensionality reduction and space transformation, by identifying directions of maximum variance.
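To make "directions of maximum variance" concrete, here is a small sketch of PCA restricted to two features, where the principal axis of the 2x2 covariance matrix has a closed form. It is a hand-rolled illustration under that assumption, not a general PCA implementation; the function name `pca_2d` and the sample `points` are hypothetical.

```python
import math


def pca_2d(points):
    """First principal component of 2-D points (needs at least two points).

    Returns (direction, variance, scores): the unit vector of maximum
    variance, the variance along it, and each centred point projected
    onto that direction (a reduction from two features to one).
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centred = [(p[0] - mean_x, p[1] - mean_y) for p in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(cx * cx for cx, _ in centred) / (n - 1)
    syy = sum(cy * cy for _, cy in centred) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centred) / (n - 1)

    # Largest eigenvalue of the symmetric 2x2 matrix.
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    variance = trace / 2 + math.sqrt(max(trace * trace / 4 - det, 0.0))

    # Angle of the principal axis for a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    scores = [cx * direction[0] + cy * direction[1] for cx, cy in centred]
    return direction, variance, scores


# Hypothetical data lying exactly on the line y = x.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
direction, variance, scores = pca_2d(points)
```

The new coordinate in `scores` is a combination of the original features, which is where the representational question arises: it no longer corresponds to any single measured property.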
  • 29. OBJECT REPRESENTATION IN MACHINE LEARNING: Abstraction.
  • 30. OBJECT REPRESENTATION IN MACHINE LEARNING: Abstraction. Kernel Trick. Real objects: $x_1 = (f_1, f_2, \dots, f_n)$ and $x_2 = (f'_1, f'_2, \dots, f'_n)$. Let $x \in V$, and a mapping $\phi : V \to W$. Then $K(x_1, x_2) \equiv \langle \phi(x_1), \phi(x_2) \rangle$. The Kernel Trick (Rasmussen & Williams, 2005) enables us to work in very complex vector spaces without even knowing the mapping itself.
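The identity between the kernel and the inner product in the mapped space can be checked numerically for one well-known case. The sketch below assumes the homogeneous degree-2 polynomial kernel on two features, which is chosen here for illustration and is not mentioned in the slides; `phi`, `poly2_kernel`, `x1` and `x2` are hypothetical names and values.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map V = R^2 -> W = R^3 for the degree-2 polynomial kernel."""
    a, b = x
    return (a * a, math.sqrt(2) * a * b, b * b)


def poly2_kernel(x, y):
    """K(x, y) = <x, y>^2, computed in the original space without using phi."""
    return dot(x, y) ** 2


# Hypothetical objects with two features each.
x1 = (1.0, 2.0)
x2 = (3.0, 4.0)

via_kernel = poly2_kernel(x1, x2)      # never leaves the 2-D input space
via_mapping = dot(phi(x1), phi(x2))    # inner product in the 3-D space W
```

Both routes give the same number, but only the second one needs to know `phi`. For kernels whose feature space is very large or infinite-dimensional, the first route is the only practical one, and the features in that space have no obvious counterpart in the real object.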
  • 31. OBJECT REPRESENTATION IN MACHINE LEARNING: Abstraction. “Abstraction does not necessarily cause epistemic problems since in most cases it is a necessary step to take.” “Without mathematical abstraction, it would not be possible to establish any foundation of statistical learning.” Computational gains vs. representational issues.
  • 32. OBJECT REPRESENTATION IN MACHINE LEARNING: Idealization. It does not only act over the features but is also realized during model construction. Remove irrelevant features to sort out the accidental attributes. Remove irrelevant features to alleviate computational issues, for example by reducing the dimensionality.
  • 33. OBJECT REPRESENTATION IN MACHINE LEARNING: Idealization. Weisberg (2007) identifies 3 kinds of idealization used in scientific models. Multi-model idealization: boosting, voting (ensemble methods); used when no single model can characterize the underlying causal structure; small models with different sets of features. Galilean idealization: performed against technical difficulties; deliberate distortions; a Bayesian learning model struggles with computational complexities without idealization. Minimalist (Aristotelian) idealization: ‘stripping away’ all properties from a concrete object that we believe are not relevant to the problem at hand; focus on a limited set of properties in isolation.
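Multi-model idealization by voting can be sketched as several small models, each looking at a different feature, combined by majority vote. Everything in this example is hypothetical: the three one-feature rules, the feature names, and the animal labels are invented for illustration, echoing the platypus slide earlier in the deck.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by the largest number of models."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three deliberately small models, each using a different feature.
def by_bill(x):
    return "duck" if x["has_bill"] else "beaver"


def by_tail(x):
    return "beaver" if x["flat_tail"] else "duck"


def by_eggs(x):
    return "duck" if x["lays_eggs"] else "beaver"


models = [by_bill, by_tail, by_eggs]

# A hypothetical object that fits neither class cleanly.
platypus = {"has_bill": True, "flat_tail": True, "lays_eggs": True}
prediction = majority_vote(models, platypus)
```

The ensemble returns a confident label even though the individual models disagree, which illustrates why a vote among idealized models is not the same as a faithful representation of the object.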
  • 34. OBJECT REPRESENTATION IN MACHINE LEARNING: Theoretical Variables. A theoretical term is defined by the negation of observability, i.e., it refers to entities that cannot be perceived directly without the aid of technical instruments or inferences. Example: “This object is in cluster C”. A theoretical/latent variable is any variable not included in the unprocessed feature set. Problematic in their semantics!! Does it refer to any real object or property? What is its meaning?
  • 35. OBJECT REPRESENTATION IN MACHINE LEARNING: Latent Variables. How old am I? Based on teeth: count them. Kittens will have 26 deciduous teeth and adult cats will have 30 teeth. Cats younger than 8 weeks will still be developing their deciduous, or "baby", teeth. Based on fur: like humans, cats will also develop grey hairs with age. Based on paws, claws, and pads: as cats age, their nails will harden and become brittle and overgrown. Based on eyes: older cats will develop a cloudiness not present in kittens and younger cats, who have sharp, clear eyes. Based on behaviour: younger cats, like younger people, are generally more energetic and attracted to play. (http://www.wikihow.com/Know-Your-Cat%27s-Age) Hidden variables: not directly observed but inferred.
  • 36. WHAT IS NEXT? Multiple successful applications of Machine Learning, not mainly rooted in our glorious technological advancements. Theory of kernels (Aronszajn, 1950); SVM first version (Vapnik & Lerner, 1963); Statistical learning (Vapnik & Chervonenkis, 1974); SVM final version (Cortes & Vapnik, 1995). 30 years!!!! Success is associated with strong foundations, not with the increasing size of computer memory.
  • 37. WHAT IS NEXT? First steps into the relationship between Philosophy and Machine Learning. Which one is better now?
  • 38. WHAT IS NEXT? What real entity corresponds to this?
  • 40. HOW THIS IS RELATED TO MY PHD. RDF → method for conceptual description or modelling of information. Linked Data → method of publishing structured data. I want to apply ML techniques over Linked Data. What is the nature or structure of a Linked Data dataset? Thanks!