An introduction to the ideas involved in statistical thinking. We hear statistics every day in connection with political polls, the stock market and the economy, sports, and many other areas of life where performance metrics help us interpret what is going on.
2. There are three kinds of lies: lies,
damned lies, and statistics.
3. H.G. Wells
• Statistical thinking will
one day be as necessary
for efficient citizenship
as the ability to read
and write.
4. Artemus Ward
• It ain’t so much the
things we don’t know
that get us in trouble.
It’s the things we know
that just ain’t so.
Charles Farrar Browne
• This could be
considered unknown
knowns.
5. Evidence and Reality
• As we know, there are
known knowns, there are
things we know we know.
We also know there are
known unknowns, that is
to say we know there are
some things we do not
know. But there are also
unknown unknowns - the
ones we don’t know we
don’t know.
Donald Rumsfeld
6. Epistemology
• The branch of
philosophy concerned
with the nature and
scope of knowledge and
what distinguishes
justified belief from
opinion.
8. Sir Francis Galton
• I have a great subject
(statistics) to write
upon, but feel keenly
my literary incapacity to
make it easily
intelligible without
sacrificing accuracy and
thoroughness.
9. Sir Francis Galton
Cousin of Charles Darwin
English polymath: anthropologist, eugenicist, tropical explorer, geographer, inventor,
meteorologist, proto-geneticist, and statistician. He was knighted in 1909.
Galton produced over 340 papers and books. He also created the statistical concept
of correlation and widely promoted regression toward the mean. He was the first to apply
statistical methods to the study of human differences and inheritance of intelligence, and
introduced the use of questionnaires and surveys.
He was a pioneer in eugenics, coining the term itself and the phrase "nature versus nurture".
His book Hereditary Genius (1869) was the first social scientific attempt to
study genius and greatness.
As an investigator of the human mind, he founded psychometrics (the science of measuring
mental faculties). He devised a method for classifying fingerprints that proved useful
in forensic science.
As the initiator of scientific meteorology, he devised the first weather map, and was the first
to establish a complete record of short-term climatic phenomena on a European scale. He
also invented the Galton Whistle for testing differential hearing ability.
10. Statistics
Is the study of the collection, organization,
analysis, interpretation and presentation
of data. It deals with all aspects of data,
including the planning of data collection in
terms of the design of
surveys and experiments.
11. Statistics is closely related
to probability theory
• probability theory starts
from the given
parameters of a total
population
to deduce probabilities
that pertain to samples.
• Statistical inference,
however, moves in the
opposite direction—
inductively
inferring from samples
to the parameters of a
larger or total
population.
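The two directions described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population proportion `P_TRUE = 0.3`, the sample size of 1000, and the helper names `binomial_pmf` and `estimate_proportion` are hypothetical choices and do not come from the slides.

```python
import random
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent draws
    when the population proportion of successes is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def estimate_proportion(sample: list[int]) -> float:
    """Point estimate of the population proportion from a 0/1 sample."""
    return sum(sample) / len(sample)


# Probability theory: the population parameter is given (assumed here),
# and we deduce what samples should look like.
P_TRUE = 0.3  # hypothetical population proportion
prob_3_of_10 = binomial_pmf(3, 10, P_TRUE)

# Statistical inference: we only see a sample and reason back
# to the unknown parameter that produced it.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [1 if rng.random() < P_TRUE else 0 for _ in range(1000)]
p_hat = estimate_proportion(sample)

print(f"P(3 successes in 10 draws | p = {P_TRUE}) = {prob_3_of_10:.4f}")
print(f"Estimate of p from a sample of {len(sample)}: {p_hat:.3f}")
```

The first calculation is deductive: once p is fixed, the probability follows exactly. The second is inductive: the estimate will usually land near p but is not guaranteed to equal it.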
12. Reasoning
Deductive
• http://www.youtube.com/watch?v=ZTfVIMPV8KY
• Deductive reasoning, or,
informally, "top-down" logic, is the
process of reasoning from one or
more
general statements (premises) to
reach a logically certain conclusion.
• Deductive reasoning
links premises with conclusions. If
all premises are true, the terms
are clear, and the rules of
deductive logic are followed, then
the conclusion reached
is necessarily true.
Inductive
• http://www.youtube.com/watch?v=wg-5LvwBnc4
• Inductive reasoning is reasoning in which the
premises seek to supply strong evidence for
(not absolute proof of) the truth of the
conclusion. While the conclusion of a
deductive argument is supposed to be
certain, the truth of an inductive argument is
supposed to be probable, based upon the
evidence given.
• It is a common fallacy to state that inductive
arguments reason from the specific to the
general, while deductive arguments reason
from the general to the specific. This is
sometimes true (as in inductive
generalizations), but it is not the case in
general: other types of inductive inference
(e.g. statistical syllogisms and arguments by
analogy) do not follow this pattern.
13. Black Swan Theory
• The black swan theory or theory of black swan events is a metaphor that
describes an event that comes as a surprise, has a major effect, and is often
inappropriately rationalized after the fact with the benefit of hindsight.
• The theory was developed by Nassim Nicholas Taleb to explain:
• The disproportionate role of high-profile, hard-to-predict, and rare events that are
beyond the realm of normal expectations in history, science, finance, and
technology
• The non-computability of the probability of the consequential rare events using
scientific methods (owing to the very nature of small probabilities)
• The psychological biases that make people individually and collectively blind to
uncertainty and unaware of the massive role of the rare event in historical affairs
• Unlike the earlier philosophical "black swan problem," the "black swan theory"
refers only to unexpected events of large magnitude and consequence and their
dominant role in history. Such events, considered extreme outliers, collectively
play vastly larger roles than regular occurrences. More technically, in the scientific
monograph Lectures on Probability and Risk in the Real World: Fat Tails (Volume 1),
Taleb mathematically defines the black swan problem as "stemming from the use
of degenerate metaprobability."
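The claim that extreme outliers collectively play a much larger role than regular occurrences can be illustrated with a small simulation. This is a rough sketch, not Taleb's own method: the two distributions (a half-normal as the thin-tailed case, a Pareto with shape 1.1 as the fat-tailed case), the sample size, and the helper name `top_share` are all assumptions chosen for illustration.

```python
import random


def top_share(values, fraction=0.01):
    """Share of the total contributed by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
N = 100_000

# Thin-tailed case: magnitudes of standard normal draws.
thin = [abs(rng.gauss(0, 1)) for _ in range(N)]

# Fat-tailed case: Pareto draws with a small shape parameter (assumed 1.1).
fat = [rng.paretovariate(1.1) for _ in range(N)]

thin_share = top_share(thin)
fat_share = top_share(fat)

print(f"Top 1% share of the total, thin tails: {thin_share:.1%}")
print(f"Top 1% share of the total, fat tails:  {fat_share:.1%}")
```

In the thin-tailed sample the largest 1% of observations account for only a few percent of the total, while in the fat-tailed sample a handful of extreme draws can dominate it.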
14. Black Swan Theory
• http://www.npr.org/templates/story/story.php?storyId=10300687
15. From the Foot of Hercules
Ex pede Herculem, "from his foot, [we can measure] Hercules", is a maxim
of proportionality inspired by an experiment attributed to Pythagoras:
"The philosopher Pythagoras reasoned sagaciously and acutely in determining and
measuring the hero's superiority in size and stature. For since it was generally agreed that
Hercules paced off the racecourse of the stadium at Pisae, near the temple of Olympian Zeus,
and made it six hundred feet long, and since other courses in the land of Greece, constructed
later by other men, were indeed six hundred feet in length, but yet were somewhat shorter
than that at Olympia, he readily concluded by a process of comparison that the measured
length of Hercules' foot was greater than that of other men in the same proportion as the
course at Olympia was longer than the other stadia. Then, having ascertained the size of
Hercules' foot, he made a calculation of the bodily height suited to that measure, based upon
the natural proportion of all parts of the body, and thus arrived at the logical conclusion that
Hercules was as much taller than other men as the race course at Olympia exceeded the others
that had been constructed with the same number of feet." (translated by John C. Rolfe of the
University of Pennsylvania for the Loeb Classical Library, 1927)
In other words, one can extrapolate the whole from the part. Ex ungue leonem, "from its claw [we
can know] the lion," is a similar phrase.
The principle was raised to an axiom of biology; it has found dependable use in paleontology,
where the measurements of a fossil jawbone or a single vertebra offer a close approximation of
the size of a long-extinct animal, in cases where comparable animals are already known. The
studies of proportionality in biology are pursued in the fields
of morphogenesis, biophysics and biostatistics.
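The proportional reasoning attributed to Pythagoras can be written out as simple arithmetic. The sketch below uses made-up figures: the two stadium lengths and the ordinary man's height are hypothetical values chosen only to show the calculation, and `scale_by_proportion` is an illustrative helper name.

```python
def scale_by_proportion(reference_whole, reference_part, observed_part):
    """Estimate a whole from one of its parts, assuming the observed
    subject has the same part-to-whole ratio as the reference."""
    return reference_whole * (observed_part / reference_part)


# Hypothetical figures, for illustration only.
ORDINARY_STADIUM_M = 180.0  # 600 feet paced by an ordinary man
OLYMPIA_STADIUM_M = 192.0   # 600 feet paced by Hercules
ORDINARY_HEIGHT_M = 1.70    # height of an ordinary man

# Each course is 600 of its builder's feet, so foot length = course / 600.
ordinary_foot = ORDINARY_STADIUM_M / 600
hercules_foot = OLYMPIA_STADIUM_M / 600

# From the foot, extrapolate the whole body by the same proportion.
hercules_height = scale_by_proportion(ORDINARY_HEIGHT_M, ordinary_foot, hercules_foot)

print(f"Ordinary foot: {ordinary_foot:.3f} m, Hercules' foot: {hercules_foot:.3f} m")
print(f"Estimated height of Hercules: {hercules_height:.2f} m")
```

The same calculation underlies the paleontological use mentioned above: a known animal supplies the reference ratio, and a single measured bone supplies the observed part.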
16. Big Data
• Statistics has many ties
to machine
learning and data
mining.
"Lies, damned lies, and statistics" is a phrase describing the persuasive power of numbers, particularly the use of statistics to bolster weak arguments. It is also sometimes colloquially used to doubt statistics used to prove an opponent's point.
The term was popularised in the United States by Mark Twain (among others), who attributed it to the 19th-century British Prime Minister Benjamin Disraeli (1804–1881): "There are three kinds of lies: lies, damned lies, and statistics." However, the phrase is not found in any of Disraeli's works and the earliest known appearances were years after his death. Other coiners have therefore been proposed, and the phrase is often attributed to Twain himself.
Inductive categorical inference
Popper held that science could not be grounded on inductive inference, which is logically invalid. He proposed falsification as a solution to the problem of induction. Popper noticed that although a singular existential statement such as 'there is a white swan' cannot be used to affirm a universal statement, it can be used to show that one is false: the singular existential observation of a black swan serves to show that the universal statement 'all swans are white' is false; in logic this is called modus tollens. 'There is a black swan' implies 'there is a non-white swan,' which, in turn, implies 'there is something that is a swan and that is not white', hence 'all swans are white' is false, because that is the same as 'there is nothing that is a swan and that is not white'.
One notices a white swan. From this one can conclude:
At least one swan is white.
From this, one may wish to conjecture:
All swans are white.
It is impractical to observe all the swans in the world to verify that they are all white.
Even so, the statement 'all swans are white' is testable by being falsifiable. For, if in testing many swans, the researcher finds a single black swan, then the statement 'all swans are white' would be falsified by the counterexample of the single black swan.
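The asymmetry between verifying and falsifying a universal statement can be sketched as a search for a counterexample. This is a minimal illustration: the list of observed swan colours is invented, and `find_counterexample` is a hypothetical helper name.

```python
def find_counterexample(observations, predicate):
    """Return the first observation that violates the predicate,
    or None if every observation so far is consistent with it."""
    for obs in observations:
        if not predicate(obs):
            return obs
    return None


def is_white(colour):
    """The universal claim under test: every swan is white."""
    return colour == "white"


# Hypothetical field observations of swan colours.
swans = ["white", "white", "white", "black", "white"]

counterexample = find_counterexample(swans, is_white)

if counterexample is None:
    # No amount of agreeing observations proves the universal claim.
    print("'All swans are white' has not been falsified so far.")
else:
    # A single disagreeing observation is enough to refute it.
    print(f"'All swans are white' is falsified by a {counterexample} swan.")
```

Note that a result of `None` only means the claim has survived the observations made so far; it does not verify the claim for swans that have not been observed.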
Deductive falsification
Deductive falsification is different from an absence of verification. The falsification of statements occurs through modus tollens, via some observation. Suppose some universal statement U forbids some observation O:
$U \rightarrow \neg O$
Observation O, however, is made:
$O$
So by modus tollens,
$\neg U$
Although the logic of naïve falsification is valid, it is rather limited. Nearly any statement can be made to fit the data, so long as one makes the requisite 'compensatory adjustments'. Popper drew attention to these limitations in The Logic of Scientific Discovery in response to criticism from Pierre Duhem. W. V. Quine expounded this argument in detail, calling it confirmation holism. To logically falsify a universal, one must find a true falsifying singular statement. But Popper pointed out that it is always possible to change the universal statement or the existential statement so that falsification does not occur. On hearing that a black swan has been observed in Australia, one might introduce the ad hoc hypothesis, 'all swans are white except those found in Australia'; or one might adopt another, more cynical view about some observers, 'Australian bird watchers are incompetent'.
Thus, naïve falsification ought to, but does not, supply a way of handling competing hypotheses for many subject controversies (for instance conspiracy theories and urban legends). People arguing that there is no support for such an observation may argue that there is nothing to see, that all is normal, or that the differences or appearances are too small to be statistically significant. On the other side are those who concede that an observation has occurred and that a universal statement has been falsified as a consequence. Therefore, naïve falsification does not enable scientists, who rely on objective criteria, to present a definitive falsification of universal statements.