Modern statistics has been built on the fatally flawed foundations of the failed logical positivist philosophy. This philosophy rejects unobservables as a basis for human knowledge. Since probability and causality are inherently unobservable, conventional statistics is inherently incapable of providing a satisfactory approach to these concepts. These slides announce a new approach to the discipline, which rejects the past century of developments based on a positivist approach. This online course (register via HTTP://bit.ly/ocRSRA ) will rebuild the entire discipline on a realist philosophy, creating a radically different methodology from the one currently in use all over the world.
Online Course: Real Statistics: A Radical Approach
1. Real Statistics:
A Radical Approach
Dr. Asad Zaman
Director, Uloom ul Umran (Islamic Alternative to Social Science), Al-Nafi Online Educational Platform : http://alnafi.com
Announcing: Free Online Course
Mixed Mode: Recorded + Live Lectures.
First Live Lecture: Sunday 26th June 2022: 9:00AM EST USA
SIGNUP for Course on: http://bit.ly/RSRA000
2. “How To Lie With Statistics”
Wikipedia:
https://en.wikipedia.org/wiki/How_to_Lie_with_Statistics
In the 1960s and 1970s, it became a standard textbook
introduction to the subject of statistics for many college
students. It has become one of the best-selling statistics books
in history, with over one and a half million copies sold in the
English-language edition. It has also been widely translated.
This textbook has had larger sales than all other
statistics textbooks combined!!
WHY?
3. Three Major Defects in Modern Statistics
• Ontology: What exists
• Epistemology: What can we know about the
world (and what exists)
• Methodology: What are the correct methods
to acquire knowledge
All three were created in early 20th Century,
based on the philosophy of Logical Positivism.
This philosophy had a spectacular crash in mid
20th Century.
BUT: the foundations of statistics were not
revisited!!
Planned course rebuilds statistics by replacing
these foundations.
4. European Intellectual Developments
Europeans reconquer advanced but decaying
Islamic Spain Al-Andalus.
Translation of millions of books ends the dark
ages of Europe.
Influx of new knowledge creates a battle between
“science” and religion which lasts for two
centuries, and splits Christianity.
Bloody fratricidal battles between Christian
factions necessitate building knowledge on
secular grounds.
Key (false) Assumption: There exists secular
knowledge! That is, we can start from ZERO, and
build knowledge purely on the basis of
observations and logic.
5. Descartes: I think therefore I am!
• Reflection of Trauma of Loss of Faith, created
by religious wars
• What everyone believed with certainty for
centuries turned out to be wrong!
• How can we build knowledge which we can be
sure of?
• Start with ONTOLOGY: What exists?
• Argument is wrong. The word “I” already
presumes existence.
• Deductive arguments (reason) can NEVER
produce knowledge
• A => B means that B is derived from
knowledge contained in A.
6. Logical Positivism
Ontology: We can only be certain about existence
of what we can perceive with our five senses.
Epistemology: Knowledge comes solely from
observations and logic.
Crucial Question: How can we learn about the
future? How do we derive lessons from
experience?
Methodology: Knowledge comes from patterns in
the observations that we see. Understanding
comes from recognizing a pattern.
ALL of these principles are built into the
foundations of statistics.
ALL of these principles are WRONG!
7. LP makes it impossible to do
statistics!
Central Concepts of Statistics are unobservable:
• Probability: Correctly defined as being about what might have been.
• Causality: Necessary connection, not accidental correlation.
Neither is observable.
Modern Statistics has NO satisfactory definition of either of these
concepts.
Both frequency theory and Bayesian (subjective) probability are flawed.
There is no way to deduce causality from data. Correlations are routinely
used to deduce causation, because there is no alternative.
8. Probability
• The coin came up heads. BUT, it could have
come up tails.
• Both of these events were equally likely
Neither of these ideas is observable. According to
LP, these sentences have no meaning, because
there is no way to verify their truth or falsity! BUT,
positivists have no solution to this problem.
• The frequency theory definition of probability
is wrong – it is based on an impossible
experiment, which cannot be used to define
anything.
• The Bayesian or subjectivist definition is also
wrong. My opinions about probability of
events may or may not reflect external reality.
9. Causality
• Observation: Smoking Increased, Paved Roadways
Increased, Industrial Production Increased, Lung
Cancer Increased.
• All graphs run in parallel
• Is there any causal relationship between any of the
variables?
• This question CANNOT be answered by data. Not
even by BIG data.
• Causality is created by hidden real-world mechanisms.
• Only correlations can be observed in data.
• Logical Positivism makes it impossible to differentiate
between correlation and causation.
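The point about parallel graphs can be illustrated with a small simulation. This is only a sketch under stated assumptions: the three series below are made-up numbers (not real data on smoking, roads, or cancer), each generated as a steady upward trend plus its own independent noise, and the helper names `pearson` and `trending_series` are hypothetical. No series has any influence on another, yet the correlations come out high simply because all of them rise over time.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trending_series(n_years, slope, noise, rng):
    """A hypothetical yearly series: steady growth plus independent noise."""
    return [slope * t + rng.gauss(0, noise) for t in range(n_years)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up series, generated independently of one another.
smoking = trending_series(50, 2.0, 5.0, rng)
paved_roads = trending_series(50, 3.0, 5.0, rng)
lung_cancer = trending_series(50, 1.0, 5.0, rng)

r_smoking = pearson(smoking, lung_cancer)
r_roads = pearson(paved_roads, lung_cancer)
print(f"corr(smoking, lung cancer)     = {r_smoking:.2f}")
print(f"corr(paved roads, lung cancer) = {r_roads:.2f}")
```

Both correlations are large, and nothing in the numbers says which variable, if any, is connected to lung cancer by a real mechanism. That knowledge has to come from outside the data.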
10. Lord Kelvin’s Dictum
“When you can measure what you
are speaking about, and express it in
numbers, you know something about
it; but when you cannot measure it,
when you cannot express it in
numbers, your knowledge is of a
meagre and unsatisfactory kind.”
Positivist Business: If you cannot
measure it, you cannot manage it (or
improve it).
11. Positivist View of Statistics:
Concerned solely with numbers
Qualitative: Feelings,
Associations, Impressions
Quantitative: Temperature,
Volume, Shape Metrics,
Material Composition
Knowledge is only of accurate measurements.
Statistics is analysis of numbers, without
reference to where they come from.
12. Realist View:
Numbers are clues to hidden reality
Qualitative: Feelings,
Associations, Impressions
Quantitative: Temperature,
Volume, Shape Metrics,
Material Composition
Knowledge is mainly of hidden realities.
Observations, qualitative and quantitative
measures, all provide clues to hidden reality.
13. Methodological Differences
Realist View:
• Data provides imperfect and
remote clues about the real
world.
• Knowledge is primarily about the
unobservable real world.
• Numbers cannot measure
qualitative phenomena.
Positivist View:
• Statistics deals with analysis of
data sets.
• Numerical data is the best and
most accurate type of
knowledge we can have about
the real world.
• Numbers don’t lie.
14. Concrete Example
Goal: Improve Quality of Research at XYZ
University.
Problem: Quality of research is unobservable,
unmeasurable, unquantifiable.
Solution: Discuss & achieve consensus on what is
high quality research. For example, it should be
practical, and solve social problems. Do holistic
end-to-end research – no theory/practice divide.
Recognize and reward solutions to real world
problems. Mostly not quantifiable.
Positivist Solution: Impact Factor Publications.
Leads to massive amounts of garbage publications
in garbage journals.
15. Fisher’s Contribution: Obstacle to Progress
Early 20th Century: Primitive Computation
Capabilities. Impossible to examine, display,
analyze large data sets – large meaning 100 or
more!
Fisher’s ingenious solution: IMAGINE that data
arises as a random sample from a parent
population which is characterized by a small
number of parameters.
Use data to estimate parameters of parent
population. These few numbers capture all the
information in the data: Sufficient Statistics.
Fisher defines statistics as “reduction of data”.
Recognizes that his methodology is due to lack of
computational capabilities.
16. Consequences of Fisherian
Methodology
There are (a few) cases where data is actually a random sample
from a parent population. In these cases, methodology is valid.
In the vast majority of cases, this is not true. The assumption is being
made purely for computational convenience, with massively bad
results.
Standard Assumption: Data comes from Normal Distribution.
Then mean and standard deviation are sufficient statistics.
Imagined parent population (not data) completely characterized
by M and SD.
BUT, if there is no parent population, vastly distorted inference
results. In particular, M and SD are NOT good summary statistics.
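A small sketch of why M and SD can be poor summaries. The two data sets below are hypothetical and constructed for illustration only: one is symmetric and bell-shaped, the other has 90 small values and 10 very large ones. The helper `rescale` (an assumed name, not from the course) shifts and stretches each set so that both have exactly the same M and SD. Under the Normal assumption the two sets would be treated as equivalent, yet they look nothing alike.

```python
import statistics
from math import comb


def rescale(data, target_mean, target_sd):
    """Shift and stretch data linearly so it has the requested mean and SD."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)  # population SD of the data itself
    return [target_mean + target_sd * (x - m) / s for x in data]


# Hypothetical data set A: symmetric and bell-shaped (binomial counts).
bell = [k for k in range(11) for _ in range(comb(10, k))]
# Hypothetical data set B: 90 small values and 10 very large ones.
skewed = [1] * 90 + [100] * 10

a = rescale(bell, 50, 10)
b = rescale(skewed, 50, 10)

for name, data in [("bell-shaped", a), ("skewed", b)]:
    print(
        f"{name:12s} M={statistics.fmean(data):.1f} "
        f"SD={statistics.pstdev(data):.1f} "
        f"median={statistics.median(data):.1f} "
        f"min={min(data):.1f} max={max(data):.1f}"
    )
```

The M and SD columns agree, while the median, minimum, and maximum differ. Anyone given only M and SD cannot tell these two data sets apart.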
17. Logical Positivism => Fisherian
Methodology
Real Knowledge: Data measures observable aspects of hidden reality.
Positivist Knowledge: Data measures observables; hidden reality does not matter.
Measurements by themselves are an incoherent jumble. They
need to be organized, in order to be understood.
Treating data as a random sample from an unknown parent
population imposes a pattern on the data, and allows it to be
understood.
Data can have many different complex relationships with reality.
Our imaginary model CAN match reality, but rarely does. Trying to
match models to reality is not part of statistical training.
18. Nominalist Models not concerned with reality
Real Models attempt to match reality
We need to understand how data relates to
reality. We need to reconstruct hidden
reality from clues provided by data.
BOTH of these central projects of REAL
Statistics are absent from IMAGINARY
Statistics.
Instead, we MAKE UP a model for the data
in our imagination. NEVER test this model
for a match to unobservable reality.
Real Statistics also requires a model for the
data. BUT this is a hypothesis about how
reality generates the data – the actual
relationship between hidden reality and the
observed data.
[Diagram: Hidden Reality, Observable, Real Models, Imaginary Models, Match??]
19. Massive Increase in
Computational Capabilities!
• Fisher himself understood the artificiality of the parametric distribution
assumption, and knew that this would need to be changed if and when
more computational capabilities developed.
• Followers took his methodology as a foundational principle, and continue
to follow it to this day – without realizing that the basis for the assumption
(computational convenience) is NO LONGER VALID.
• TODAY, we can DIRECTLY analyze MILLIONS of data points without ANY
assumptions about distributional form.
• The ENTIRE methodology of statistical inference building on Fisherian
foundations can be SCRAPPED!
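As one illustration of direct analysis, the sketch below describes a data set by its own sorted values, using the empirical distribution function and empirical quantiles. It is an assumed example, not necessarily the method taught in the course. The data are made-up numbers produced by a random generator purely to have something to analyze; the analysis itself never uses the generating formula or any parametric family. The function names `ecdf` and `empirical_quantile` are hypothetical.

```python
import math
import random


def ecdf(data, x):
    """Fraction of observed values less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)


def empirical_quantile(sorted_data, q):
    """Smallest observed value whose ECDF is at least q (0 < q <= 1)."""
    n = len(sorted_data)
    index = max(0, math.ceil(q * n) - 1)
    return sorted_data[index]


rng = random.Random(1)  # fixed seed so the run is repeatable
# Made-up skewed data; any list of numbers could be used instead.
data = sorted(rng.lognormvariate(10, 1) for _ in range(200_000))

q25, q50, q75 = (empirical_quantile(data, q) for q in (0.25, 0.50, 0.75))
share_top = 1 - ecdf(data, 100_000)  # share of values above 100,000

print(f"quartiles: {q25:.0f}, {q50:.0f}, {q75:.0f}")
print(f"share of values above 100,000: {share_top:.3f}")
```

Sorting and counting 200,000 values takes a moment on an ordinary computer, and the same code handles millions. Every number reported is a statement about the data itself, with no parent population assumed.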
20. Key Takeaway
Central Project of REAL Statistics:
How does data connect to hidden reality?
How can we reconstruct hidden real structures from the data?
How can we verify our hypotheses about hidden reality?
Positivist (Nominal) Statistics: Hidden reality does not matter!
How can we build a simple model for the data – model needs to match
the DATA, not the hidden reality!!
ANY model which is a good match will do – no concern for matching
hidden reality!
21. Real Statistics: A Radical Approach
Ontology, Epistemology, and Methodology of modern statistics have
been built on flawed foundations of logical positivist philosophy.
Replacing all three leads to a radically new approach. The focus shifts
from the appearances, as captured by data, to the hidden reality
partially revealed by the data. Inference concerns learning about this
hidden reality (not about the data!)
To register for this upcoming online course (first lecture on Sunday, 26th
June 2022 at 9:00AM EST USA, New York Time), fill out the Google
Form at http://bit.ly/RSRA000. Those who register will receive
access to course materials, and further instructions about the course.
22. Links to Related Materials
Blog Posts which provide further
description of this course are:
Preface to Real Statistics:
http://bit.ly/PrefRSRA
Why an Islamic Approach to Statistics?:
http://bit.ly/WaIAtS
To register for the online course, starting on
Sunday 26th June, sign up on Google Form:
http://bit.ly/RSRA000