Leonardo Auslender – Ch. 1 Copyright 2004 – 9/17/2019
The univariate approach is based on the analysis of:
1. Central tendency: mean, median, mode, counts of missingness and non-missingness.
2. Dispersion: standard deviation, inter-quartile range, range.
3. Distribution: histogram (or density estimate), quantile plot, cumulative distribution function, table of relative frequencies.
4. Exploring variables one by one.
5. Variables can be continuous (interval based) or nominal (we will concentrate on binaries).
6. Statistical inference (part of all of EDA and modeling).
Quick definitions that you should know even in your sleep.

Variable X, n observations.

Mean: Mean(X) = (1/n) * Σ X_i

50th percentile (median):
If n even, Med(X) = average of the two central sorted values of X.
If n odd, Med(X) = central value of the sorted values of X.

Variance: Var(X) = (1/(n−1)) * Σ (X_i − Mean(X))²

Standard Deviation: Std(X) = sqrt(Var(X))

Standard Error of the mean: SE(Mean(X)) = Std(X) / sqrt(n)

Range: Max(X) − Min(X)

Median Absolute Deviation: MAD = med(| x_i − med(x_i) |)

Inter-Quartile Range: IQR = 75th percentile − 25th percentile

Mode: most frequent value (more useful for nominal variables).

With so many measures of central tendency and dispersion, and with a variable distributed along many values (usually graphed) ==> are there distributions that usually resemble or describe them?
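The measures defined above can be computed directly; a minimal illustrative Python sketch (the deck's own code is SAS) on a small hypothetical sample:

```python
# Illustrative Python: the summary measures defined above, on made-up data.
import statistics

x = [2, 4, 4, 5, 7, 9, 23]  # hypothetical sample

n = len(x)
mean = sum(x) / n
med = statistics.median(x)
var = sum((xi - mean) ** 2 for xi in x) / (n - 1)   # note the n-1 divisor
std = var ** 0.5
se_mean = std / n ** 0.5
rng = max(x) - min(x)
mad = statistics.median(abs(xi - med) for xi in x)  # Median Absolute Deviation

print(mean, med, var, std, se_mean, rng, mad)
```

Note how the single large value 23 pulls the mean (about 7.7) above the median (5), and inflates the variance, while the MAD stays small: the resistant measures barely react to the outlier.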
Additional Measures
Harmonic mean, used to average rates. Let x_1, ..., x_n be positive numbers.
(H tends strongly toward min(x_i): it mitigates the effect of large outliers and enhances the effect of small outliers. Used in finance for time-series data, e.g., P/E data.)

H(x_1, ..., x_n) = n / Σ_{i=1}^{n} (1 / x_i) = 1 / avg(1/x_1, ..., 1/x_n),  x_1, ..., x_n > 0.
Additional Measures
Geometric mean, for positive x_1, ..., x_n. Used to average growth rates.

G(x_1, ..., x_n) = (x_1 · x_2 · ... · x_n)^(1/n)

In general, H <= G <= Avg.

Example: let {x} = (1, 2, 3). H = 1.636 < G = 1.817 < mean = 2.
Example: let {x} = (1, 1/2, 1/3). H = 0.5 < G = 0.55 < mean = 0.61.
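The H <= G <= Avg ordering on the first example can be checked with a short illustrative Python sketch:

```python
# Check H <= G <= Avg on the slide's example {1, 2, 3} (illustrative Python).
import math

def harmonic(xs):
    """Harmonic mean: n divided by the sum of reciprocals."""
    return len(xs) / sum(1 / x for x in xs)

def geometric(xs):
    """Geometric mean: n-th root of the product."""
    return math.prod(xs) ** (1 / len(xs))

x = [1, 2, 3]
h, g, a = harmonic(x), geometric(x), sum(x) / len(x)
print(round(h, 3), round(g, 3), a)  # 1.636 1.817 2.0
```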
Some Basic Definitions: Univariate Distributions.
Skewness: opposite of symmetry.
Measures direction and degree of asymmetry: zero indicates symmetrical
distribution. Positive value indicates right skewness (long-tailedness to the
right) while negative value indicates left skewness.
Perfectly symmetrical,
non-skewed, distribution: mean, median and mode are equal. Positive
skewness: mean > median ➔ most values < mean. And opposite for negative
skewness.
For instance, store sales are typically skewed.
Positive Skewness.
Symmetry and measures of central tendency
If the data distribution is symmetric: mean = median = mode.
If positively skewed: mode < median < mean.
If negatively skewed: mean < median < mode.
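These orderings are easy to observe on data; an illustrative Python sketch on a made-up right-skewed sample:

```python
# For a right-skewed (positively skewed) sample, expect mode < median < mean.
# Illustrative Python on hypothetical data with a long right tail.
import statistics

x = [1, 1, 2, 2, 2, 3, 3, 4, 5, 12]  # the 12 creates the right tail

mode = statistics.mode(x)
med = statistics.median(x)
mean = statistics.mean(x)
print(mode, med, mean)  # 2 2.5 3.5
```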
Kurtosis: heaviness of the tails of a distribution (often described as peakedness). The two most frequently used measures are Pearson's b2 and Fisher's g2:

b2 = m4 / m2²   (m_k: k-th sample central moment)

g2 = [(n + 1)(n − 1) / ((n − 2)(n − 3))] · [b2 − 3(n − 1) / (n + 1)]

The usual reference point is the normal distribution. If b2 = 3 (g2 = 0) and skewness = 0, the distribution is normal. Uni-modal distributions with kurtosis > 3 have heavier tails than the normal; these same distributions also tend to have higher peaks at the center. Uni-modal distributions whose tails are lighter than the normal's tend to have kurtosis < 3; in this case, the peak tends to be broader than the normal's.
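The two kurtosis measures above can be coded directly from the moment definitions; an illustrative Python sketch (data is made up; a flat sample should show b2 < 3, i.e., lighter tails than the normal):

```python
# Pearson's b2 and Fisher's g2 from the moment formulas above (illustrative).
def moments(xs):
    """Return n and the 2nd and 4th sample central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return n, m2, m4

def pearson_b2(xs):
    n, m2, m4 = moments(xs)
    return m4 / m2 ** 2

def fisher_g2(xs):
    n, m2, m4 = moments(xs)
    b2 = m4 / m2 ** 2
    return (n + 1) * (n - 1) / ((n - 2) * (n - 3)) * (b2 - 3 * (n - 1) / (n + 1))

x = list(range(1, 11))                  # flat, uniform-like sample
print(pearson_b2(x), fisher_g2(x))      # b2 < 3 and g2 < 0: light tails
```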
Some basic definitions. Kurtosis: peakedness.
Excess kurtosis > 0 = leptokurtic: e.g., stock price returns (Bollerslev-Hodrick, 1992).
(Figure: leptokurtic density vs. the Normal.)
Homework and Interview Questions.
Since the mean estimation divides by n, why is the variance divisor (n − 1), and not n, (n − 2), (n + 1) or sqrt(5)? Do some reading; no need for a mathematical proof.
Why do we work with squares (e.g., the variance) and not with straight absolute values, for instance?
DS1: Study Measures of Central Tendency and dispersion.
Note median = 0 and MAD = 0 for No_claims. Can you explain it?
Basics and measures of centrality

Variable           # Nonmiss Obs   % Missing        Mean      Median        Mode
DOCTOR_VISITS              5,960        0.00       8.941       8.000       9.000
MEMBER_DURATION            5,960        0.00     179.615     178.000     180.000
NO_CLAIMS                  5,960        0.00       0.406       0.000       0.000
NUM_MEMBERS                5,960        0.00       1.986       2.000       1.000
OPTOM_PRESC                5,960        0.00       1.170       1.000       0.000
TOTAL_SPEND                5,960        0.00  18,607.970  16,300.000  15,000.000

Measures of dispersion

Variable                 Variance  Std Deviation  Std of Mean  Median Abs Dev  Nrmlzd MAD
DOCTOR_VISITS               52.31           7.23         0.09            5.00        7.41
MEMBER_DURATION          6,736.56          82.08         1.06           57.00       84.51
NO_CLAIMS                    1.16           1.08         0.01            0.00        0.00
NUM_MEMBERS                  0.99           1.00         0.01            1.00        1.48
OPTOM_PRESC                  2.74           1.65         0.02            1.00        1.48
TOTAL_SPEND        125,607,617.29      11,207.48       145.17        6,000.00    8,895.60
A too-quick note on missing values.
In the previous slide, no variable has missing values, A VERY RARE EVENT. Typically, all large databases have missing values, even if in small percentages.
Since most software operates on 'full' rows, i.e., a row is dropped if even one variable in it is missing, missingness propagates quickly (full detail in MEDA under missing values).
Thus, UEDA can proceed to obtain measures of central tendency and variation except when missingness is 100% for a specific variable. But BEDA can already suffer tremendously.
ADVICE: find out missings by UEDA first. Then decide whether to impute (see MEDA section later on) or delete observations (try hard not to). Then continue your analysis and even modeling.
proc univariate data = &indata. (keep = &vars.) CIBASIC CIPCTLNORMAL ROBUSTSCALE normal;
   histogram &vars. / normal (color = black w = 7 l = 25)
                      kernel (k = normal c = 0.2 0.5 0.8 1 color = green w = 5 l = 1);
   inset nmiss = "# missing" (5.0)
         n = "N"
         min = "Min" mean = "Mean"
         median = "Median" mode = "Mode"
         max = "Max" (6.3)
         normal kernel (type);
run;
quit;
SAS code for next slide of variable distributions: Histograms.
Important note on nominal variables.
Also called categorical variables. Categories may denote values we impose rather than realities. Defining races as Black, White, Hispanic, Asian, etc. implies considering Chinese and Indonesian to be the same. By contrast, red and blue are distinct physical realities ➔ in the race example we use our own values to create the categories.
The problem is that we tend to create hypotheses, variables, conditions, testing environments, and of course conclusions from these constructs, which may be arbitrary. E.g., a marketing segment assignment places you in the Hispanic group because you learned Spanish in high school and speak it somewhat, when you are not Hispanic.
➔ Given our creation of intended hypotheses, categories, and conditions of data gathering, we match representatives of different 'races' to reach conclusions that may be plagued with errors due to category construction.
Not all Vars shown.
Why transform? Because transformed variables may be skew free (skewness obviously affects variance estimation), closer to 'normality' (if needed or wanted), or may provide rank information. All these issues are strongly related to statistical inference and modeling. Transformations can be multi-variate (e.g., principal components) as well.
More importantly, if the data is used in actual practice in transformed manner, then transformation may be advisable. E.g., many medical decisions (e.g., prostate cancer and PSA) are based on thresholds on PSA counts. Do not transform while modeling; do transform when reporting and presenting results.
Continuous variables differ in range (min, max) and spread: it is often convenient to homogenize them to the same units for comparison.
Centering: given variable x, for each observation subtract the mean value. The resulting variable has mean 0. Note that the distribution is not made more symmetrical unless mean = mode.
Standardization: given variable x, for each observation subtract the overall mean and divide by the standard deviation. Mean removal is called 'centering', and the standardized variable measures how far, in units of std, each observation is from 0, because the new mean is 0 and the new std is 1.
NOTE: standardization is not equivalent to normalization.
Log transformation: apply log(X), for X > 0. If X has negative values, one could instead use log(X − min(X) + 0.001). PROBLEM: if the modeling situation requires implementing the method on future data sets, min(X) of the original data set may differ from min(X) of a future data set.
Binning: many different methods. A popular one divides the range into a number of equal-sized sub-ranges; the number could be 5 or 10 but is application dependent. The method is seriously abused.
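The four transformations above can be sketched in a few lines of illustrative Python (made-up data; note the shift in the log transform keeps its argument positive, per the caveat above):

```python
# Centering, standardization, shifted log, and equal-width binning (illustrative).
import math
import statistics

x = [3.0, -1.0, 7.0, 2.0, 5.0]  # hypothetical data with a negative value

mean = statistics.mean(x)
std = statistics.stdev(x)                        # n-1 divisor

centered = [xi - mean for xi in x]               # new mean = 0
standardized = [(xi - mean) / std for xi in x]   # new mean 0, new std 1

# Shift so log's argument stays positive; min(x) may differ on future data.
shift = -min(x) + 0.001
logged = [math.log(xi + shift) for xi in x]

# Binning into 5 equal-width sub-ranges (bin labels 0..4).
lo, hi = min(x), max(x)
width = (hi - lo) / 5
bins = [min(int((xi - lo) / width), 4) for xi in x]
print(bins)  # [2, 0, 4, 1, 3]
```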
Comments on transformations.
The transformations shown are monotonic. The reciprocal transformation (1/x, x ≠ 0) reverses order (it is monotonically decreasing for x > 0), as is −x.
HW question:
Consider the transformations of the variable Doctor_visits in the previous slides. Compare them and report.
Probability
Classical (a priori) definition: if event E can occur in k ways out of a total of n equally likely possible ways, then its probability is defined as p = k / n.
Frequency (a posteriori) definition: if event E happens k times in n repetitions, then p = k / n. This implicitly assumes equally likely and independent repetitions.
Both definitions have serious flaws because they are circular: "equally likely" is the same as "with equal probability", and probability has not yet been defined. The a posteriori definition is also deficient because it does not specify the required number of repetitions. Still, it has intuitive appeal for experiments that are repeatable and for which the number of possible outcomes is available.
Besides, the property of symmetry is usually hiding: each outcome is assumed equi-probable, which of course raises the issue of not having defined probability while we are already using the notion of equi-probability.
The subjectivist definition of probability: the degree of belief in the occurrence of event E by an individual, which of course can differ from others' beliefs.
Axioms of Probability ***
1) For any event E, P(E) >= 0.
2) For the certain event S, P(S) = 1.
3) For any two disjoint events E1 and E2, P(E1 ∪ E2) = P(E1) + P(E2), and similarly for an infinite sequence of disjoint events.
Theorems on probability spaces ***
1) The impossible event, the empty set ∅, has probability 0.
2) For any event A, the probability of its complement is P(A complement) = 1 − P(A).
3) For any event A, 0 <= P(A) <= 1.
4) If A ⊆ B, then P(A) <= P(B).
5) For any two events A and B, P(A − B) = P(A) − P(A ∩ B).
6) For any two events, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
7) For any events A, B, C:
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
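Theorem 7) (inclusion-exclusion) can be verified mechanically on a small finite probability space with equally likely outcomes; an illustrative Python sketch with made-up events:

```python
# Check inclusion-exclusion for three events on a 12-outcome space (illustrative).
from fractions import Fraction

omega = set(range(1, 13))            # 12 equally likely outcomes

def P(event):
    """Probability of an event = favorable outcomes / total outcomes."""
    return Fraction(len(event), len(omega))

A = {1, 2, 3, 4, 5, 6}
B = {4, 5, 6, 7, 8}
C = {6, 8, 10, 12}

lhs = P(A | B | C)
rhs = P(A) + P(B) + P(C) - P(A & B) - P(A & C) - P(B & C) + P(A & B & C)
print(lhs, lhs == rhs)  # 5/6 True
```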
Beware of unconditional versus conditional probability interpretations.
Example of conditional versus unconditional statements.
(from https://articles.mercola.com/sites/articles/archive/2018/07/31/
effects-of-cellphone-radiation.aspx?utm_source=dnl&utm_medium=email&utm_content=artTest_B1&utm_campaign=
20180725Z1_UCM&et_cid=DM223789&et_rid=375100020)
“Of 326 cellphone safety studies, 56 percent found a biological effect
from cellphone radiation while 44 percent did not. When funding
was analyzed, it was discovered that 67 percent of the funded
studies found a biological effect, compared to just 28 percent
of the industry-funded studies. This funding bias creates a perceived
lack of scientific consensus.”
That is, Pr(Effect) = 56%.
Pr(Effect | independently funded) = 67%.
Pr(Effect | industry funded) = 28%.
Notice that while conditional effects involve at least two variables (e.g., effect and funding), the resulting analysis is univariate on Effect.
Application: conditional probability is the basis of association analysis, heavily used in marketing.
Probability obfuscation: the O. J. Simpson case.
By law, Pr(Simpson not guilty of killing his wife, without further evidence to the contrary) = 100% (later on called the Null Hypothesis).
The prosecutor argues that the defendant battered his wife.
Defense lawyer Dershowitz provides the following counterargument about the relevance of battering:
4 million women are battered annually by male companions in the US. In 1992, 1,432 of them, or 1 in 2,500, were killed by those companions ➔ women batterers seldom murder their women.
But the probability that a man who batters his wife will go on to kill her is not the relevant information. Rather, it is the probability that a battered wife who was murdered was murdered by her abuser. And that number is 90%. The fact that she was already murdered should be part of the probability computation.
(Diagram: of all battered women, 1 in 2,500 are murdered; among murdered battered women, most were murdered by their male companions.)
Data Distributions.
Univariate distributions take many forms. We focus on some visual attributes, analyzed in reference to central points and 'tails' of the distribution.
The Normal distribution is the most used and abused, with the familiar bell shape:
(https://www.kdnuggets.com/2018/07/explaining-68-95-99-7-rule-normal-distribution.html)
68% of observations lie within 1 std, 95% within 2 std, 99.7% within 3 std. Most observations hover around the mean (= median = mode). The odds of deviating from this average (the chances of a value being far from the mean) decline at an increasingly faster rate (exponentially) as we move away from the average. Tails are thin.
The mean and variance fully define the Normal distribution.
Normal Distribution.
When mean = 0 and variance = 1, N becomes standard normal
distribution, denoted N (0, 1).
LAW OF LARGE NUMBERS (LLN, Bernoulli)
As sample size (n) increases, the sample mean converges to the population mean, if the latter exists. (For instance, sample means from the Cauchy distribution do not converge to anything because the Cauchy does not have a mean. However, Cauchy sample medians converge to the population median (Blume & Royall 2003).)
In a typical casino setting, the LLN assures the casino that it will not lose money as long as the bets keep coming. It does not assure that at every specific bet the casino will make money.
Conclusion: either skip the casino, or own one.
Roulette illustration.
There are 36 colored slots, plus 2 non-colored, 0 and 00. If you bet $1 on slot 20 and win, you get back a total of $36, for $35 of profit. Else, you lose your $1.
Prob(winning) = 1/38, prob(losing) = 37/38.
Expected value: (1/38)(35) − (37/38)(1) = −0.0526. On average, every time you bet $1 on roulette, you lose a bit more than 5 cents.
➔ Casinos know that the LLN is on their side, and over a large number of bets they make about 5% on average.
If you play 38 times, on average you win once. That is, you pocket $36 but paid out $38. Loss = 2/38 of total spent = 0.0526.
If 1,000 people each bet this way 35 times, about 610 will have won at least once (61%), but the 'winners'' earnings will be less than the losers' losses.
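The LLN at work can be seen in a quick Monte Carlo sketch of the arithmetic above (illustrative Python; the seed and bet count are arbitrary choices):

```python
# Simulate many $1 straight-up roulette bets; the average profit per bet
# should settle near the expected value of -2/38 = -0.0526 (illustrative).
import random

random.seed(1)
n_bets = 200_000
profit = 0
for _ in range(n_bets):
    # One slot in 38 wins $35; the other 37 lose the $1 stake.
    profit += 35 if random.randrange(38) == 0 else -1

avg = profit / n_bets
print(round(avg, 4))  # close to -0.0526
```

Any single bettor may be up after a few spins; the casino, aggregating hundreds of thousands of bets, is essentially guaranteed its 5.26%.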
CENTRAL LIMIT THEOREM, CLT (simplest form).
The distribution of a sum of independent random variables with a common distribution with finite mean and variance approaches the normal distribution ➔ the sample mean tends to normality.
Note: convergence is to a distribution, not to a specific number.
(from bwgriffin.com/gsu/courses/edur8131/content/EDUR_8131_CLT_illustrated_one_page.pdf)
SAT data: Figures 2 through 9 show histograms, not of raw SAT scores, but of means from samples of differing sizes. Figure 2, for example, shows means taken from samples of size 2: to construct it, a total of 5,000 samples (n = 2 each) of SAT scores were taken from those displayed in Figure 1. For Figure 3, another set of 5,000 samples was taken, but with a sample size of 3. Each successive figure shows the distribution of sample means for a larger sample size.
Thus, Figures 2 through 9 are histograms of sampling distributions of the mean. Note that as sample sizes increase, the shape of the sampling distribution of means approaches a normal curve and looks less and less like the bimodal distribution of raw SAT scores. This is exactly what the central limit theorem predicts.
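The same experiment can be sketched in illustrative Python (the bimodal "SAT-like" population below is made up, not the bwgriffin data; the check is that the spread of the sample means shrinks roughly like 1/sqrt(n)):

```python
# CLT sketch: means of samples from a bimodal population; larger n gives
# a tighter (and more normal-looking) sampling distribution (illustrative).
import random
import statistics

random.seed(0)
population = ([random.gauss(400, 50) for _ in range(5000)] +
              [random.gauss(600, 50) for _ in range(5000)])  # bimodal scores

def sampling_std(n, reps=2000):
    """Std of the distribution of means of 'reps' samples of size n."""
    return statistics.stdev(
        statistics.mean(random.choices(population, k=n)) for _ in range(reps)
    )

s2, s8, s32 = sampling_std(2), sampling_std(8), sampling_std(32)
print(s2, s8, s32)  # shrinking spread as n grows
```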
Importance of the Normal Distribution.
Many statistical inferential tests (to decide whether product A is different from B, whether patient A benefits more from drug A than drug B, etc.) rely on the normal distribution, mostly via the CLT.
This does not imply that all data sets or individual or groups of variables are normally distributed. In fact, most variables, individually or viewed multi-variately, are not normally distributed. For instance, when modeling linear regressions, the usual assumption is that the error component is N(0, σ²), and we test this hypothesis on the residuals. In many other areas, tests are available for many hypotheses.
But in many circumstances it is possible to transform the variables toward normality, and sometimes desirable (not reviewed in this class).
The height distribution of humans is NOT normal but bimodal (due to gender), no matter the sample size.
Bird’s eye view of probability distributions
datasciencecentral.com/profiles/blogs/common-probability-distributions-the-data-scientist-s-crib-sheet
Counting and Probability Homework
1) Toss 5 fair quarters once, H and T. How many possible
outcomes are there?
2) You flip a fair coin 8 times and obtain H H H H H H H T. What’s
the probability of obtaining 1 T and 7 H in 8 flips?
3) Flip a fair coin thrice without noting the result. A friend insists
that the second flip was T. What’s the probability that the first
flip was H?
Statistical Inference – Some Basic Definitions. Sample (n) and population (N) sizes ***
The sampling distribution is determined by n, NOT by n/N (the sampling fraction). Population observations must be independent of each other. Inference is always on samples; sample determination is a huge topic. N = population size, n = sample size.
1) For given N, there are (N choose n) different samples. Two cases: N = ∞ (e.g., the universe of coin flips), and N < ∞ (attendees at the present lecture).
2) For given N, sampling with increasing n generates 'thinner' distributions, i.e., smaller spread.
3) For constant n/N, with n and N increasing in the same proportion, sample spreads depend on n.
4) For given n, increasing N does not change the sampling distribution much.
5) Most statistical formulae divide by n, not by N.
6) For multivariate samples, e.g., giga bases, n is related to p (the number of variables) via heuristics based on computer resources, the specifics of the analytics, and rules of thumb.
Statistical Inference – Some Basic Definitions.
Curse of Dimensionality in high dimensions ***
Consider variables X, Y, Z, each uniformly distributed U(0, 1). If we choose a 10% sample along X alone, the expected edge length is .1; i.e., a 10% X sample covers 10% of the range. But to still capture 10% of the data when sampling along X and Y jointly, each edge needs a proportion of .1 ** (1/2) = .32. For X, Y, Z and a 10% range, each edge needs .46 (formula: .1 ** (1/p)).
➔ Adding dimensions (i.e., variables) ➔ a higher sampling proportion per axis is needed to avoid sparseness in the data to analyze. In fact, studying further dimensions for a given sample size ➔ true aspects of the data (especially in modeling) are not represented in the sampled data.
➔ If n suffices for 1 dimension, n ** 2 is required for 2, n ** 3 for 3, etc. But there are not enough data points in the universe for very large p.
Sampling needs grow exponentially with dimensions.
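The .1 ** (1/p) formula above is a one-liner to tabulate (illustrative Python):

```python
# Edge length per axis needed to capture 10% of a p-dimensional unit
# hypercube: 0.1 ** (1/p) (illustrative).
for p in (1, 2, 3, 10):
    print(p, round(0.1 ** (1 / p), 2))
```

By p = 10 each axis must cover about 79% of its range just to retain 10% of the data, which is the sparseness problem in a nutshell.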
Statistical Inference – Some Basic Definitions
Probability Intervals for the Sample Mean.
Let Z ~ N(0, 1), with known mean and variance, and sample repeatedly from this distribution. The 95% probability range of sample means likely to be observed is given by (1.96 is the 97.5th percentile of the normal distribution with μ = 0, σ² = 1):

P(|Z| <= 1.96) = P( |X̄ − μ| / (σ/√n) <= 1.96 ) = P( |X̄ − μ| <= 1.96 σ/√n ) = .95

The expression inside the absolute-value brackets defines the probability interval. For fixed n, and assuming known μ and σ, the sample mean will be in that interval with very high probability. Note: this assumes repeated samples of size n and Z ~ N(0, 1).
For non-normal Z and large n, the probability is approximately .95 thanks to the CLT, and the sample mean is approximately normal.
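The .95 coverage can be checked by simulation (illustrative Python; μ, σ, n, and the number of repetitions are arbitrary choices):

```python
# Repeatedly draw samples of size n from N(mu, sigma) and count how often
# the sample mean lands inside mu +/- 1.96*sigma/sqrt(n) (illustrative).
import random
import statistics

random.seed(42)
mu, sigma, n, reps = 0.0, 1.0, 25, 4000
half_width = 1.96 * sigma / n ** 0.5

inside = sum(
    abs(statistics.mean(random.gauss(mu, sigma) for _ in range(n)) - mu)
    <= half_width
    for _ in range(reps)
)
print(inside / reps)  # close to 0.95
```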
***
In a more realistic setting with unknown μ (assume known σ for ease), the probability interval method is turned around to obtain a confidence interval for the unknown parameter. The emphasis is on the method, which does not depend on having observed any data. The interval X̄ ± 1.96 σ/√n is random because X̄ is random. The interval is fixed, however, for a given sample. Once calculated, the interval is called a "confidence interval" (CI) at the (1 − α) level and is not random ➔ any other sample derived from the same population will generate a mean that will lie within the just-found confidence interval or not.
➔ The phrase "the confidence interval contains μ 95% of the time" is incorrect. Instead, the CI is the "set of parameter values best supported by the data at the 95% level", denoted as 1 − α, α = .05.
➔ Bayesian 'credible intervals' do answer that question.
The most basic test: the one-sample test.
We want to test whether a parameter β1, usually a mean, is equal to a specific value, for instance 0. Take a sample, obtain the mean estimate (and CLT, etc.). Is the mean estimate close enough to 0 that we cannot deny that β1 = 0?
Formally: H0 (null hypothesis): β1 = 0; H1 (alternative hypothesis): β1 ≠ 0, and the test is:

(β̂1 − β1) / s_β̂1 ~ t(n − 1),   β̂1: parameter estimate, s_β̂1: its estimated standard error.

Since the std is estimated, the distribution is NOT normal but Student's t with n − 1 degrees of freedom (from a table). t approaches the Normal distribution asymptotically, very quickly.
Nomenclature of hypothesis testing.
The level of significance of a test, called alpha (α), is used to make a statistical decision about the null hypothesis. It represents the probability of incorrectly rejecting H0 when it is true. Typically 5%, but not mandatory. Also called Type I error.
Type II error, or β: the probability of not rejecting H0 when it is false. In general, little effort is made to find the Type II error; doing so is called power analysis, which works with 1 − β, the probability of correctly rejecting the null hypothesis.
p-value: the probability, assuming H0 is true, of observing a result equal to or more extreme than what was actually observed. If p < α, this suggests the data is inconsistent with H0 and it may be rejected.
Some clarifying notes
If β̂1 ~ N(μ, σ²) (appealing usually to the CLT; the distribution of the observations does not need to be normal), then standardizing ➔

(β̂1 − μ) / σ ~ N(0, 1),   or, if σ² is estimated,   (β̂1 − μ) / (s / √(n − 1)) ~ t(n − 1).

If the standardized β̂1 ~ N(0, 1), then 95% of likely values are in the [−1.96, 1.96] interval ➔

β̂1 ± 1.96 · σ / √(n − 1)   is a 95% confidence interval.

There is full equivalence between the CI and the p-value.
Next: testing the null β1 = 0 vs. alternatives. If the data yields β̂1 = 1, we cannot reject the null. If β̂2 = 2.5, the null is rejected: p-value < 0.05.
(Figure: p-value region for β̂2.)
Examples of inferential tests:
1) IQ change differences between 1st- and 3rd-grade elementary school children (i.e., is the change = 0?).
2) Are Republican voters more affluent than Democratic ones?
3) Is fertilizer A better than fertilizer B?
4) Is a treatment (drug, vaccine, etc.) good at treating a disease? (extremely heavy use of statistical inference)
5) DNA testing for crimes, paternity tests, etc.
Used in almost every aspect of science.
Which one is the Null Hypothesis?
There is a strong tendency to nominate as the Null whichever scheme is statistically easier to compute. E.g., in linear models (next chapters), the Null is lack of predictor effect, and the focus is on inference on the estimated parameter value.
But suppose the question is: is there life on other planets? Should the Null be NO and the Alternative YES? Or the reverse?
In drug testing, should the Null be NO ADVERSE EFFECT and the reverse for the Alternative? But this puts the burden of proof on the Alternative. Why not reverse it: the Null is "there are adverse effects", and the burden is on disproving it.
E.g.: testing a screw on a bridge. In case A, if it breaks, not much happens. But in case B, a break-up ➔ a serious accident. In A, the Null is "the screw is safe"; in B, "unsafe".
P-values.
The notion of p-values is central to statistical inference. The p-value is the probability of seeing the outcome (from data or an experiment) under the assumption that it originated from the starting hypothesis (called the null), which could be in dispute but is still the prevailing view. E.g., flip an assumed 'fair' coin 4 times (50% probability of either Heads or Tails) and obtain 4 tails (T). Question: is the coin fair (i.e., prob(H) = .5)?
Under the assumption p = .5, prob(4 tails | p = .5) = .5^4 = .0625, i.e., the p-value is .0625. Is this probability small enough to indicate that we do not believe that p = .5? For instance, we could flip the coin 100 times in sets of 4, record how many sets contained 4 tails, and compare that proportion to .0625.
The typical threshold to reject the null hypothesis is 5%. In this example, we fail to reject the null, and that is all that can be said. If we had set the threshold at 10% instead, we would judge the coin to be biased.
➔ p-value: the probability of obtaining the value of the experiment, assuming that the null hypothesis holds, i.e., the probability of obtaining a value this extreme just by chance. If the p-value is low (<= 5%?) ➔ the null hypothesis should be abandoned, but we don't know in favor of what specific alternative.
More on the p-value
It is NOT "prob(H0 is false)", because H0 is assumed 100% true in the computation. Remember, the p-value is conditional on H0: it is Pr(data at least this extreme | H0), NOT 1 − Pr(H0 | data), so it does not give us Pr(H0 | data).
If the p-value is low, the method cannot tell whether the NULL is true and the sample was unusual, or whether the NULL is false.
Assume a comparison of a new marketing campaign's results against the established one, with p-value = 3%. Interpretation: assuming the campaigns' results are similar, a difference this large or larger would be obtained in 3% of studies due to sampling error alone. It is INCORRECT to say that if we reject the NULL, there is a 3% chance of making a mistake: p-value ≠ [Prob(error rate) = Prob(reject true H0)].
For the relationship between p-values and error rates see Sellke et al. (2001): "Calibration of p Values for Testing Precise Null Hypotheses", The American Statistician, February 2001, Vol. 55, No. 1.
T-tests: one-way, paired, 2 groups.
A test can be one-sample, e.g., comparing a variable's mean to a specific value to decide on the POPULATION mean μ.
Example (DS1): is the mean of member_duration = 150? H0: mean = 150, H1: mean ≠ 150. Since the p-value < 0.05 (our threshold) ➔ the mean is not 150.
Paired: could be testing math and reading scores on the same students, so the data is dependent within each subject. Or front left and right tires: the data is paired within each car. (No example provided.)

H0: μ = x0, 2-sided test; x̄: estimated mean. The test is:

t = | x̄ − x0 | / (s / √n), compared to the critical value of t(n − 1);
if |t| > the critical t-value, reject H0 ("significant finding"), else fail to reject H0.
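The one-sample t statistic above is a few lines of code; an illustrative Python sketch (the DS1 data set is not available here, so a made-up sample standing in for member_duration is used):

```python
# One-sample t statistic: t = (mean - x0) / (s / sqrt(n)) (illustrative).
def one_sample_t(xs, x0):
    n = len(xs)
    mean = sum(xs) / n
    s = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return (mean - x0) / (s / n ** 0.5)

x = [151, 148, 162, 155, 149, 158, 160, 153]  # hypothetical durations
t = one_sample_t(x, 150)
print(round(t, 3))  # compare |t| to the t(n-1) critical value
```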
T-tests cont.: testing across 2 groups.
Compare the same variable across two groups, say male vs. female, young vs. old, treated patients vs. non-treated, etc. H0: μ1 − μ2 = 0 (and the alternative: the difference is not zero). We have 2 variances now.
Issue: whether the variances of the two groups are equal to each other. If equal, use a weighted average of the two (pooled variances). Otherwise (unpooled), Satterthwaite's method provides an asymptotic degrees-of-freedom computation.

Pooled:   t = (x̄1 − x̄2) / ( s_p · √(1/n1 + 1/n2) )

Unpooled: Std Error = √( s1²/n1 + s2²/n2 )

DS1 example: compare means of member duration by fraud level.
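Both statistics above follow directly from the formulas; an illustrative Python sketch with made-up groups (not the DS1 data):

```python
# Pooled and unpooled (Welch/Satterthwaite-style) two-group t statistics
# from the formulas above (illustrative, hypothetical groups).
def two_group_t(x1, x2):
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    t_pooled = (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    t_unpooled = (m1 - m2) / (v1 / n1 + v2 / n2) ** 0.5     # unpooled std error
    return t_pooled, t_unpooled

a = [180, 175, 190, 185, 178, 183]
b = [150, 160, 148, 155, 152, 158]
print(two_group_t(a, b))
```

With equal group sizes, as here, the pooled and unpooled statistics coincide; they diverge when the n's (and variances) differ, which is when Satterthwaite's degrees-of-freedom correction matters.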
Inference Tests. Reject H0 in all cases.

Variable          model_name      Method          t Value  P-value       DF
member_duration   TTEST_2GROUPS   Pooled            13.49     0.00  5958.00
member_duration   TTEST_2GROUPS   Satterthwaite     14.36     0.00  1982.49
member_duration   TTEST_ONEWAY                     154.84     0.00  5959.00
SAS CODE
ods listing close;
ods output ttests = ttests statistics = statistics;
proc ttest data = &INDATA.;
   var &groupvar.;
   class &classvar.;
run;
ods listing;
Important comments on NHST (Null Hypothesis Statistical Testing)
Is a "significant finding" important, or substantial? That is not given by statistical testing but by expert knowledge of the event at hand. Notice that by increasing sample size n, the t-test denominator decreases ➔ higher chances of finding 'significance', but the finding may be irrelevant.
But more than this: we are testing Pr(β̂1 | H0: β1 = 0) when we want Pr(H0 is true | β̂1). NHST fails to reject H0 because H0 itself is not being tested.
Example:
Valid reasoning: 1) All persons are mortal. 2) Socrates is a person. 3) ➔ Socrates is mortal.
Invalid reasoning: 1) Most women don't like soccer. 2) Mia Hamm played soccer. 3) ➔ Mia Hamm is not a woman.
The invalid conclusion 3) is a false negative due to 'most' in 1), which plays the role of H0. The test part is given by 2) and, accepting that 1) is true, concludes 3), which is wrong.
An important comment on the present practice of NHST
Strong tendency to become priests and adorers of the <= 5% p-value, regardless of whether findings are of practical importance.
Strong tendency to search, transform, and manipulate to obtain the sacred <= 5% number.
Strong tendency to underreport results that don't reach the 5% p-value.
Strong tendency to disregard Type I vs. Type II error (see next slide); NHST mostly focuses on TYPE I error.
The 5% p-value is not part of the Constitution, the Magna Carta, the Ten Commandments, the Koran or any other important document.
And worse, there is the present practice of p-hacking (also known as data dredging, snooping, fishing, cherry-picking, ...).
Type I vs Type II errors.
From https://heartbeat.fritz.ai/classification-model-evaluation-90d743883106
Common p-hacking practices:

1) Focusing the analysis only on the data subset where an interesting pattern was found.
2) Disregarding multiple-comparison adjustments, and not reporting non-significant results.
3) Using different tests (e.g., parametric vs. non-parametric) on the same hypothesis and reporting only the significant results.
4) Removing "outliers," or hand-picking data points, to attain significance; or dropping variables because of real or imagined problems (e.g., collinearity in linear models).
5) Transforming data, especially when modeling, to obtain significant discoveries.

How bad is it in applied work?
P-hacking prevalence: almost universal except for saints (see Berman, 2018, for p-hacking in marketing).

Note that p-hacking is based on using statistical inference; methods that do not use statistical inference are not affected. Also, p-hacking can be understood as inferential EDA, and thus requires additional data sets to validate the newly found hypotheses.

Data Science typically works by splitting the (enormous) data set into at least 3 data sets (typically at random):
- Training: find hypotheses, models.
- Validation: search hypotheses without using inference or model search, and obtain estimates.
- Testing: verify the validated hypotheses; use prior estimates to obtain final model results.
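The three-way split can be sketched in a few lines of Python; the 60/20/20 proportions below are illustrative, not mandated by the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
idx = rng.permutation(n)                     # shuffle row indices once

train_idx = idx[: int(0.6 * n)]              # 60%: find hypotheses, models
valid_idx = idx[int(0.6 * n): int(0.8 * n)]  # 20%: validate them
test_idx  = idx[int(0.8 * n):]               # 20%: held out for final results
```

Each row lands in exactly one of the three sets, so information from the training data cannot leak into the final test assessment through the split itself.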
But...

Just validating and testing does not ensure full verification. Further techniques:
- Cross-validation (see reference at end of presentation).
- Bootstrapping / jackknifing (Efron, 1979).
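A minimal bootstrap sketch in Python, under assumed synthetic data: resample with replacement, recompute the statistic each time, and read off a standard error and a percentile interval.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)   # a deliberately skewed sample

# Bootstrap: resample with replacement many times, recompute the statistic.
boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                       for _ in range(2000)])

se_boot = boot_means.std(ddof=1)                       # bootstrap std. error
ci_lo, ci_hi = np.percentile(boot_means, [2.5, 97.5])  # percentile interval
```

For the mean, the bootstrap standard error should track the analytic s/√n closely; the payoff of the method is that the same recipe works for statistics with no closed-form standard error.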
Multiple comparisons ***
Assume not one but 20 hypotheses are tested on the same data set with α = 5%. While the probability of rejecting H0 when H0 is true is 5% for each single test, Pr(at least one significant result out of 20) = 1 − Pr(no significant result) = 1 − (1 − 0.05)^20 ≈ 64%.

Different methods handle this: Bonferroni, False Discovery Rate (FDR), positive FDR (pFDR), etc.
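The 64% figure and the effect of the Bonferroni adjustment can be verified with direct arithmetic; a minimal sketch assuming the 20 tests are independent:

```python
alpha, m = 0.05, 20

# Family-wise error rate with 20 independent, uncorrected tests:
fwer = 1 - (1 - alpha) ** m          # ~0.64: at least one false "discovery" is likely

# Bonferroni: run each test at alpha / m instead.
fwer_bonf = 1 - (1 - alpha / m) ** m   # back below the nominal 5%
```

Bonferroni is conservative (it controls the family-wise rate at the cost of power), which is why FDR-type procedures are often preferred with many tests.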
Present unfortunate practices:
1) Very common to focus only on significant findings and to omit all information on the non-significant ones. Example: a study on opera singing and sleeping posture finds that singers tend to sleep in a specific position, but omits that most comparisons (diet, sleep-medication usage, gender, etc.) were non-significant.
2) Very common to report only what is convenient. A company tests a new drug's efficacy on 2 different groups of people and the results are positive for one but negative for the other. The company reports just the positive result.
Sampling practices

Sampling is seldom done with replacement ➔ the sample size should be large enough to:
1) Diminish the probability of spurious dependence across observations.
2) Ensure that rare categories in nominal variables are still represented. Essentially, sampling w/o replacement in small samples ➔ proportions are distorted.
3) Provide enough observations so that the overall cardinality of the data set is still representative of the population.

Under these conditions, a rule of thumb is to sample at least 10% of the population. Bear in mind that for very large p relative to n, the percentage should be higher.
Sampling.

Often the entire population is too large or too expensive to obtain. E.g., population data on US males' average weight in December 2016.

In other cases, we study a sample to analyze results expected in future populations. E.g., the population of credit card payers in future years; the expectation is that future population properties are represented in the present sample data.

Two big types of sampling:
a) probabilistic (random) sampling;
b) non-probabilistic sampling.
Random Sampling

Simple random sampling: equal probability of selection for each observation.
Stratified sampling: random sampling within strata of one (or a few) variable/s, typically categorical. Heavily used in classification methods.
Systematic sampling: observations chosen at regular data intervals.
Cluster sampling: sampling within clusters of observations.
Multi-stage sampling: a mixture of the previous methods.
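Stratified sampling is easy to sketch with pandas (the deck's own code is SAS; the fraud column and its 90/10 mix below are made up for illustration): draw within each stratum so the rare class keeps its population proportion.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "fraud": rng.choice([0, 1], size=10_000, p=[0.9, 0.1]),  # rare class ~10%
    "x": rng.normal(size=10_000),
})

# 10% stratified sample: sample within each fraud level separately, so the
# rare category keeps (approximately) its population proportion.
sample = df.groupby("fraud").sample(frac=0.10, random_state=7)
```

A simple random 10% sample would usually come close to the right class mix too, but the stratified draw guarantees it, which matters when the rare class drives the analysis.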
Non-random sampling

Convenience sampling: based on data availability.
Purposive sampling: sample only from those elements of interest. E.g., sample those who responded to a credit card solicitation.
Quota sampling: sample until an exact proportion of certain characteristics is achieved.
Referral/snowball sampling: the sample is created by referrals from participant to participant. E.g., sampling of HIV patients, pickpockets... The method is subject to high bias and variance, and cannot claim the statistical-significance status of random sampling.
Issues in sampling

Assume research on bus quality of service, for which you ask questions of people waiting at the bus stop. One main concern for quality of service is late buses. You obtain many answers and report your results.

Problem: because you sample at the bus stop, the people who respond are more likely to have waited a long time than those already on the bus. Thus, the answers will be biased toward poor service.
Normality Check: QQ Plots

The plot compares the ordered values of the variable of interest (Y axis) with the quantiles of the normal distribution (X axis). If the pattern is linear ➔ the variable is approximately normally distributed. An overlaid reference straight line indicates perfect normality.
Description of Point Pattern                            Possible Interpretation
all but a few points fall on a line                     outliers in the data (reviewed later)
left end of pattern below the line; right end above    long tails at both ends of the distribution
left end of pattern above the line; right end below    short tails at both ends of the distribution
curved pattern with slope increasing left to right     distribution skewed to the right
curved pattern with slope decreasing left to right     distribution skewed to the left
staircase pattern (plateaus and gaps)                  data rounded or discrete
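The plotting points behind a QQ plot can be generated with scipy's `probplot`; a Python sketch (the deck uses SAS below) with a deliberately right-skewed sample, so the pattern should curve rather than follow the reference line:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.lognormal(size=300)   # right-skewed: slope should increase left to right

# probplot returns the (theoretical normal quantile, ordered value) pairs and a
# least-squares reference line; passing plot=<matplotlib axes> would draw it.
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
```

The fit correlation `r` is close to 1 for normal data and noticeably smaller for skewed data, which gives a rough numeric companion to the visual diagnosis.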
Additional QQ-plot examples (plots omitted): outlier, right skew, left skew.
data hello2;
   length varname $ 32;
   set training (in = in1 keep = DOCTOR_VISITS   rename = (DOCTOR_VISITS   = varvalue))
       training (in = in2 keep = FRAUD           rename = (FRAUD           = varvalue))
       training (in = in3 keep = MEMBER_DURATION rename = (MEMBER_DURATION = varvalue))
       training (in = in4 keep = NO_CLAIMS       rename = (NO_CLAIMS       = varvalue))
       training (in = in5 keep = NUM_MEMBERS     rename = (NUM_MEMBERS     = varvalue))
       training (in = in6 keep = OPTOM_PRESC     rename = (OPTOM_PRESC     = varvalue))
       training (in = in7 keep = TOTAL_SPEND     rename = (TOTAL_SPEND     = varvalue));
   if in1 then varname = "DOCTOR_VISITS";
   if in2 then varname = "FRAUD";
   if in3 then varname = "MEMBER_DURATION";
   if in4 then varname = "NO_CLAIMS";
   if in5 then varname = "NUM_MEMBERS";
   if in6 then varname = "OPTOM_PRESC";
   if in7 then varname = "TOTAL_SPEND";
   label varname = "Var Name" varvalue = "Variable";
run;

proc univariate data = hello2 noprint;
   class varname;
   var varvalue;
   qqplot / ncol = 3 nrow = 1;
run;
Homework

Select your data set and software. Obtain QQ plots for some variable(s) and diagnose them.
The Dow Jones index fell 22.61% in one day in October 1987. It was a "25-standard-deviation event" according to the prevailing calculations. That is an occurrence so rare that if the stock market had been open every single day since the Big Bang... it still shouldn't have happened. And yet it did.

Comment on how the standard deviation was possibly derived and how the 25-std event was used to justify that it shouldn't have happened. And let's not think about 2007-2009 for the time being.
TV game show. The host (Monty Hall) shows 3 doors to a contestant. Behind two of the doors there are goats; behind the other one, a car. If the contestant chooses the right door, he/she wins the car.

Twist: once the participant has chosen a door and before Monty opens it (and Monty knows behind which door the car is located), Monty opens one of the other doors, one that he knows has a goat behind it.

At this point, Monty offers the participant the chance to switch his/her choice to the remaining door.

Should the participant switch? Yes? No? Why?
The immediate reaction is that the initial probability of success is 1/3 and, after Monty opens the goat door, the probability becomes 1/2, so there is no gain in switching or staying with the original choice.

Wrong. Call the doors 1, 2 and 3. The following table summarizes all cases when door 1 is chosen initially (https://en.wikipedia.org/wiki/Monty_Hall_problem):

Car behind   Monty opens    Result of staying   Result of switching
door 1       door 2 or 3    wins car            loses
door 2       door 3         loses               wins car
door 3       door 2         loses               wins car

Thus, the probability of winning if the participant switches is 2/3.
Note: This puzzle raised intense and acrimonious debates among statisticians.
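The 2/3 result is easy to confirm by simulation; a short Python sketch (the `monty` helper and its seed are illustrative, not from the deck):

```python
import random

def monty(switch, trials=100_000, seed=0):
    """Simulate the Monty Hall game; return the empirical win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car, pick = rng.randrange(3), rng.randrange(3)
        # Monty opens a goat door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d not in (pick, car))
        if switch:  # move to the one remaining closed door
            pick = next(d for d in range(3) if d not in (pick, opened))
        wins += (pick == car)
    return wins / trials
```

Running `monty(switch=True)` gives roughly 2/3 and `monty(switch=False)` roughly 1/3, matching the table.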
3. Tea and/or coffee?

In a large group of people, it is known that 70% of them drink coffee and 80% drink tea. What are the lower and upper bounds on the proportion who drink both? (In obvious notation, C and T.)

Since total probability cannot exceed 1:

P(T ∪ C) = P(T) + P(C) − P(T ∩ C) ≤ 1
⇒ P(T ∩ C) ≥ 0.7 + 0.8 − 1 = 0.5,

with the lower bound attained when everybody drinks something, i.e., P(T ∪ C) = 1.

For the upper bound, the intersection cannot exceed the smaller of the two events:

P(T ∩ C) ≤ min(P(T), P(C)) = 0.7,

attained when every coffee drinker also drinks tea.
Go, Bayes
Go!!
(Under construction)
***
Bayesian inference (BI).

Example: coin tossing. An important part of Bayesian inference is the establishment of parameters and models.
Models: mathematical formulations of observed events.
Parameters: factors in models affecting the observed data.
In coin tossing, fairness is the parameter of the coin, denoted by θ; the outcome is denoted by D.

Q.: What is Pr(4 heads out of 9 tosses), i.e., P(D | θ), given the coin's fairness θ? This is the frequentist way of looking at the problem.

We are more interested in knowing: given D (4 heads out of 9 tosses), what is Pr(θ = 0.5)? By Bayes' Theorem:
p(θ | D) = P(D | θ) · p(θ) / p(D)
If we knew θ, the likelihood P(D | θ) would give the probability of any outcome D. p(D) is the 'evidence', obtained by averaging the likelihood over all possible values of θ (there could be more than one parameter).
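The coin-tossing posterior can be computed on a grid; a minimal Python sketch assuming the flat prior and the 4-heads-in-9-tosses data from the example:

```python
import numpy as np

heads, n = 4, 9
theta = np.linspace(0, 1, 1001)                        # candidate fairness values
prior = np.ones_like(theta)                            # flat prior p(theta)
likelihood = theta**heads * (1 - theta)**(n - heads)   # P(D | theta) kernel

posterior = prior * likelihood
posterior /= posterior.sum() * (theta[1] - theta[0])   # normalize by the evidence p(D)

theta_map = theta[np.argmax(posterior)]                # posterior mode
```

With a flat prior the posterior mode coincides with the maximum-likelihood estimate, heads/n = 4/9; an informative prior would pull it elsewhere.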
Significance Tests

P-value
Frequentist: calculate the t-score from a fixed-size sample and obtain the p-value. If the p-value is 3% under H: mean = 50, there is a 3% probability of observing a sample mean at least as extreme as the one obtained, assuming H is true.
Problem: when n changes across sample sizes, t-scores change too ➔ a fixed 5% threshold is meaningless as a basis for deciding whether or not to reject H.
p(θ | D): posterior belief about θ after observing D.
CIs

CIs are p-value cousins ➔ the same problem with changing n. Since a CI is not a probability distribution, it is impossible to know which values are most probable.

Bayes Factor

The Bayesian equivalent of the p-value.
The Bayesian null hypothesis places the entire probability mass at one particular parameter value (θ = 0.5 for a fair coin) and zero probability elsewhere (M1).
The alternative hypothesis allows all values of θ ➔ a flat prior distribution (M2).
The next slide shows the prior situation (left panel) and the posterior situation (right panel).
Given 4 heads out of 9 tosses, the probability has shifted from 0.5 to something larger, as shown in panel B. In panel A, the left and right bars indicate the prior probabilities of H and A; in panel B, the posterior probability of H is now < 0.5 and that of A is > 0.5.
Bayes factor: ratio of posterior odds to prior odds. Reject H if BF < 0.1 (suggested).
Benefit of using the Bayes factor instead of p-values: no sample-size effect.
BF = [ p(H | D) / p(A | D) ] ÷ [ p(H) / p(A) ]
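For the coin example, the Bayes factor of M1 (point mass at θ = 0.5) against M2 (flat prior over θ) can be computed directly; a sketch assuming equal prior model probabilities, in which case BF reduces to the ratio of the two evidences:

```python
from math import comb
from scipy.integrate import quad

heads, n = 4, 9

# Evidence under H (M1): all prior mass at theta = 0.5.
p_d_h = comb(n, heads) * 0.5**n

# Evidence under A (M2): flat prior, so integrate the likelihood over theta.
p_d_a, _ = quad(lambda t: comb(n, heads) * t**heads * (1 - t)**(n - heads), 0, 1)

bf = p_d_h / p_d_a   # with equal prior odds this equals the posterior odds
```

Here BF is about 2.5, mildly favoring the fair coin: well above the suggested 0.1 rejection threshold, and unaffected by tinkering with a significance level.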
Credible interval

The Bayesian solution yields a 'posterior' probability density function (pdf). Assume it is continuous. Then for any two points a < b in the range of the pdf, it is possible to assert

Pr(a <= X <= b) = c.

If we invert the problem and ask which values a and b yield probability c for X under the pdf, then [a, b] is a c% credible interval.

There is an infinite number of intervals that yield c% probability. If the distribution is symmetric, the range is usually taken such that (1 − c)/2 is left out in each tail.

With credible intervals, it is possible to assert that the interval contains the true value with c% probability, which is not possible with traditional CIs.
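For the coin example (4 heads in 9 tosses with a flat Beta(1, 1) prior), the posterior is Beta(5, 6) and an equal-tailed 95% credible interval comes straight from the quantile function; a scipy sketch under that assumed setup:

```python
from scipy.stats import beta

heads, n = 4, 9
# Flat Beta(1,1) prior + binomial data -> Beta(1 + heads, 1 + tails) posterior.
posterior = beta(1 + heads, 1 + (n - heads))

# Equal-tailed 95% credible interval: leave (1 - 0.95)/2 in each tail.
lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
```

By construction, θ lies in [lo, hi] with 95% posterior probability, which is the direct statement a frequentist CI cannot make.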
Random events along a discrete time index t (months, steps, number of tests, etc.), measuring the cumulative outcome effect. The time index theoretically goes to infinity.

Example: roll a fair die; if even, walk 100 meters north, else 100 meters south, and repeat 100 times.
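The die-rolling walk can be simulated in a few lines; a Python sketch (seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100)            # 100 rolls of a fair die
steps = np.where(rolls % 2 == 0, 100, -100)     # even -> 100 m north, odd -> 100 m south
position = steps.cumsum()                       # cumulative position after each step
final = position[-1]
```

Each run ends at a multiple of 200 meters from the start (an even split of ±100 steps cancels in pairs), and re-running with different seeds shows how widely the cumulative outcome wanders even though each step is fair.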
References

Berman, R., et al. (2018): "p-Hacking and False Discovery in A/B Testing," working paper (web).
Cross-validation: https://www.cs.cmu.edu/~schneide/tut5/node42.html
Efron, B. (1979): "Bootstrap Methods: Another Look at the Jackknife," The Annals of Statistics, 7(1): 1-26.
Bollerslev, T., and Hodrick, R. J. (1992): "Financial Market Efficiency Tests," NBER Working Paper 4108, National Bureau of Economic Research.
Boston Institute of Analytics
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 

Recently uploaded (20)

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 

2 ueda

  • 1. Leonardo Auslender – Ch. 1, Copyright 2004 (9/17/2019).
  • 2. The univariate approach is based on the analysis of:
    1. Central tendency: mean, median, mode, counts of missing and non-missing values.
    2. Dispersion: standard deviation, inter-quartile range, range.
    3. Distribution: histogram (or density estimate), quantile plot, cumulative distribution function, table of relative frequencies.
    4. Exploring variables one by one.
    5. Variables can be continuous (interval-based) or nominal (we will concentrate on binaries).
    6. Statistical inference (part of all of EDA and modeling).
  • 3. Quick definitions that you should know even in your sleep. Variable X, n observations.
    Mean: Mean(X) = (1/n) Σᵢ Xᵢ = X̄.
    50th percentile (median): if n is even, Med(X) = average of the two central sorted values of X; if n is odd, Med(X) = central value of the sorted values of X.
    Variance: Var(X) = (1/(n − 1)) Σᵢ (Xᵢ − X̄)².
    Standard deviation: Std(X) = √Var(X).
    Standard error of the mean: SE(X̄) = Std(X) / √n.
    Range = Max(X) − Min(X).
    Median absolute deviation: MAD = med |xᵢ − med(x)|.
    Inter-quartile range = 75th percentile − 25th percentile.
    Mode: most frequent value (more useful for nominal variables).
    With so many measures of central tendency and dispersion, a variable is distributed along many values, usually graphed ==> are there distributions that usually resemble or describe them?
  • 4. Additional measures. Harmonic mean, used to average rates. Let x₁, …, xₙ be positive numbers. (H tends strongly toward min(xᵢ): it mitigates the effect of large outliers and enhances the effect of small outliers.) Used in finance for time-series data, e.g., P/E data.
    H(x₁, …, xₙ) = n / Σᵢ (1/xᵢ) = 1 / avg(1/x₁, …, 1/xₙ),  x₁, …, xₙ > 0.
  • 5. Additional measures. Geometric mean, for nonnegative x₁, …, xₙ; used to average growth rates.
    G(x₁, …, xₙ) = (x₁ · … · xₙ)^(1/n),  with H ≤ G ≤ Avg.
    Example: let {x} = (1, 2, 3). H = 1.636 < G = 1.817 < mean = 2.
    Example: let {x} = (1, 1/2, 1/3). H = 0.5 < G = 0.55 < mean = 0.61.
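The first worked example, and the ordering H ≤ G ≤ arithmetic mean, can be reproduced directly with the Python standard library (a sketch, not part of the deck, which uses SAS):

```python
import statistics as st

x = [1, 2, 3]
h = st.harmonic_mean(x)     # n / sum(1/xi) = 3 / (1 + 1/2 + 1/3)
g = st.geometric_mean(x)    # (1 * 2 * 3) ** (1/3)
a = st.fmean(x)             # arithmetic mean

# H <= G <= arithmetic mean, matching the slide: 1.636 < 1.817 < 2
print(round(h, 3), round(g, 3), a)
```

Running the same three calls on (1, 1/2, 1/3) reproduces the second example: 0.5 < 0.55 < 0.61.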
  • 6. Some basic definitions: univariate distributions. Skewness: the opposite of symmetry. It measures the direction and degree of asymmetry: zero indicates a symmetrical distribution; a positive value indicates right skewness (long-tailedness to the right), while a negative value indicates left skewness. In a perfectly symmetrical, non-skewed distribution, mean, median and mode are equal. Positive skewness: mean > median ➔ most values < mean; the opposite holds for negative skewness. For instance, store sales are typically skewed. [Figure: positive skewness.]
  • 7. Symmetry and measures of central tendency. If the data distribution is symmetric: mean = median = mode. If positively skewed: mode < median < mean. If negatively skewed: mean < median < mode.
  • 8. Kurtosis = peakedness. Heaviness of the tails of a distribution; the usual reference point is the normal distribution. The two most frequently used measures are Pearson's b₂ and Fisher's g₂:
    b₂ = m₄ / m₂²  (m₄, m₂: fourth and second central moments),
    g₂ = [(n − 1) / ((n − 2)(n − 3))] · [(n + 1) b₂ − 3(n − 1)].
    If b₂ = 3 (g₂ zero) and skewness = 0, the distribution is normal. Uni-modal distributions with kurtosis > 3 have heavier tails than the normal; these same distributions also tend to have higher peaks in the center. Uni-modal distributions whose tails are lighter than the normal's tend to have kurtosis < 3; in this case the peak of the distribution tends to be broader than the normal's.
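The two kurtosis measures are easy to compute from the central moments. A minimal sketch (illustrative function name; moments use the divisor n, as in Pearson's b₂):

```python
def kurtosis_measures(x):
    """Pearson's b2 = m4 / m2**2 and Fisher's bias-corrected g2,
    following the slide's formulas (m_k = k-th central moment)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    b2 = m4 / m2 ** 2
    g2 = ((n - 1) / ((n - 2) * (n - 3))) * ((n + 1) * b2 - 3 * (n - 1))
    return b2, g2

# Flat, uniform-like data has light tails: b2 < 3, g2 < 0.
b2, g2 = kurtosis_measures([1, 2, 3, 4, 5])
print(round(b2, 3), round(g2, 3))   # 1.7 -1.2
```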
  • 9. Some basic definitions. Kurtosis: peakedness. Excess kurtosis = leptokurtic, e.g., stock-price returns (Bollerslev-Hodrick, 1992). [Figure: leptokurtic density vs. normal.]
  • 10. Homework and interview questions. Since the mean estimate divides by n, why is the variance divisor (n − 1), and not n, (n − 2), (n + 1), or sqrt(5)? Do some reading; no need for a mathematical proof. Why do we work with squares (e.g., variance) and not with straight absolute values, for instance?
  • 12. DS1: study measures of central tendency and dispersion. Note median = 0 and MAD = 0 for No_claims. Can you explain it?

    Basics and measures of centrality
    | Variable        | # Nonmiss Obs | % Missing | Mean       | Median     | Mode       |
    | DOCTOR_VISITS   | 5,960         | 0.00      | 8.941      | 8.000      | 9.000      |
    | MEMBER_DURATION | 5,960         | 0.00      | 179.615    | 178.000    | 180.000    |
    | NO_CLAIMS       | 5,960         | 0.00      | 0.406      | 0.000      | 0.000      |
    | NUM_MEMBERS     | 5,960         | 0.00      | 1.986      | 2.000      | 1.000      |
    | OPTOM_PRESC     | 5,960         | 0.00      | 1.170      | 1.000      | 0.000      |
    | TOTAL_SPEND     | 5,960         | 0.00      | 18,607.970 | 16,300.000 | 15,000.000 |

    Measures of dispersion
    | Variable        | Variance       | Std Deviation | Std of Mean | Median Abs Dev | Nrmlzd MAD |
    | DOCTOR_VISITS   | 52.31          | 7.23          | 0.09        | 5.00           | 7.41       |
    | MEMBER_DURATION | 6,736.56       | 82.08         | 1.06        | 57.00          | 84.51      |
    | NO_CLAIMS       | 1.16           | 1.08          | 0.01        | 0.00           | 0.00       |
    | NUM_MEMBERS     | 0.99           | 1.00          | 0.01        | 1.00           | 1.48       |
    | OPTOM_PRESC     | 2.74           | 1.65          | 0.02        | 1.00           | 1.48       |
    | TOTAL_SPEND     | 125,607,617.29 | 11,207.48     | 145.17      | 6,000.00       | 8,895.60   |
  • 13. Too quick a note on missing values. In the previous slide, no variable has missing values, A VERY RARE EVENT. Typically, all large databases have missing values, even if in small percentages. Since most software operates on 'full' rows (i.e., just one missing point in any variable drops the row), missingness propagates quickly (full detail in MEDA under missing values). Thus, UEDA can proceed to obtain measures of central tendency and variation except when missingness is 100% for a specific variable; but BEDA can already suffer tremendously. ADVICE: find out missings by UEDA first. Then decide whether to impute (see the MEDA section later on) or delete observations (try hard not to). Then continue your analysis and even modeling.
  • 14. SAS code for the next slides of variable distributions: histograms.
    proc univariate data = &indata. (keep = &vars.)
         CIBASIC CIPCTLNORMAL ROBUSTSCALE normal;
      %PUT;
      histogram &vars. / normal (color = black w = 7 l = 25)
                         kernel (k = normal c = 0.2 0.5 0.8 1 color = green w = 5 l = 1);
      inset nmiss = "# missing" (5.0) n = "N" min = "Min" mean = "Mean"
            median = "Median" mode = "Mode" max = "Max" (6.3)
            Normal kernel (type);
    run; quit;
  • 15–18. [Histograms of variable distributions produced by the preceding SAS code; no extractable text.]
  • 19. Important note on nominal variables (also called categorical variables). Categories may reflect our own constructs rather than realities. Defining races as Black, White, Hispanic, Asian, etc. implies considering Chinese and Indonesian to be the same. By contrast, red and blue are distinct physical realities ➔ we use our own values to create these categories. The problem is that we tend to create hypotheses, variables, conditions, testing environments, and of course conclusions from these constructs, which may be arbitrary. E.g., a marketing segment assignment places you in the Hispanic group because you learned Spanish in high school and speak it somewhat, when you are not Hispanic. ➔ Given our creation of intended hypotheses, categories, and conditions of data gathering, matching representatives of different 'races' can reach conclusions plagued with errors due to category construction.
  • 20. [Table slide; not all vars shown.]
  • 22. Why transform? Because transformed variables may be skew-free (skewness obviously affects variance estimation), closer to 'normality' (if needed or wanted), or may provide rank information. All these issues are strongly related to statistical inference and modeling. Transformations can be multivariate as well (e.g., principal components). More importantly, if the data are used in actual practice in transformed form, then transformation may be advisable: e.g., many medical decisions (prostate cancer and PSA) are based on thresholds on PSA counts. Do not transform while modeling; do transform when reporting and presenting results. Continuous variables differ in range (min, max) and spread: it is often convenient to homogenize them to the same units for comparison. Centering: given variable x, subtract the mean value from each observation. The resulting variable has mean 0. Note that the distribution does not become more symmetrical unless mean = mode.
  • 23. Standardization: given variable x, for each observation subtract the overall mean and divide by the standard deviation. Mean removal alone is called 'centering'; the standardized variable measures how far, in units of std, each observation is from 0, because the new mean is 0 and the new std is 1. NOTE: standardization is not equivalent to normalization. Log transformation: apply log(X), for X > 0. If X has negative values, one could instead use log(X − min(X) + 0.001). PROBLEM: if the modeling method will be applied to future data sets, min(X) of the original data set may differ from min(X) of a future data set. Binning: many different methods. A popular one divides the range into a number of equal-sized sub-ranges; the number could be 5 or 10 but is application-dependent. The method is seriously abused.
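The three univariate transformations above can be sketched in a few lines of Python (illustrative helper names; the shift in the log transform follows the slide's min(X)-based adjustment, with the same caveat that min(X) may differ on future data):

```python
import math

def center(x):
    """Subtract the mean: the result has mean 0."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def standardize(x):
    """Subtract the mean, divide by the (sample) std: result has mean 0, std 1."""
    m = sum(x) / len(x)
    s = (sum((xi - m) ** 2 for xi in x) / (len(x) - 1)) ** 0.5
    return [(xi - m) / s for xi in x]

def log_transform(x):
    """log(X), shifting by -min(X) + 0.001 when non-positive values exist."""
    shift = -min(x) + 0.001 if min(x) <= 0 else 0.0
    return [math.log(xi + shift) for xi in x]

print(center([1, 2, 3, 4]))   # [-1.5, -0.5, 0.5, 1.5]
```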
  • 24–28. [Plots of transformed versions of selected variables, referenced by slide 29; no extractable text.]
  • 29. Comments on transformations. The transformations shown are monotonic. A reciprocal transformation (1/x, x ≠ 0) reverses the ordering (for positive x), as does −x. HW question: consider the transformations of the variable Doctor_visits in the previous slides; compare them and report.
  • 31. Probability. Classical (a priori) definition: if event E can occur in k ways out of a total of n equally likely possible ways, then the probability p is defined as p = k / n. Frequency (a posteriori) definition: if event E happens k times in n repetitions, then p = k / n; this implicitly assumes equally likely and independent repetitions. Both definitions have serious flaws because they are circular: 'equally likely' is the same as 'with equal probability', and probability has not yet been defined. The a-posteriori definition is also deficient because it does not specify the required number of repetitions. Still, the definition has intuitive appeal for experiments that are repeatable and for which the number of possible outcomes is available. Besides, a property of symmetry is usually hiding: each outcome is equi-probable, which of course raises the issue of not having defined probability when we are already using the notion of equi-probability. The subjectivist definition of probability: the degree of belief in the occurrence of event E by an individual, which of course can differ from others' beliefs.
  • 32. Axioms of probability:
    1) For any event E, P(E) ≥ 0.
    2) For the certain event S, P(S) = 1.
    3) For any two disjoint events E and F, P(E ∪ F) = P(E) + P(F), and similarly for an infinite sequence of disjoint events.
    Theorems on probability spaces:
    1) The impossible event, the empty set ∅, has probability 0.
    2) For any event A, P(A complement) = 1 − P(A).
    3) For any event A, 0 ≤ P(A) ≤ 1.
    4) If A ⊆ B, then P(A) ≤ P(B).
    5) For any two events A and B, P(A − B) = P(A) − P(A ∩ B).
    6) For any two events, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
    7) For any events A, B, C: P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
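Theorem 6 (inclusion-exclusion) is easy to verify on a small finite uniform probability space; here two fair dice, with illustrative events chosen for this sketch:

```python
from fractions import Fraction
from itertools import product

# Uniform space: all 36 outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a predicate on outcomes) as an exact fraction."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == 6            # first die shows 6:   P = 1/6
B = lambda w: w[0] + w[1] == 7     # sum is 7:            P = 1/6

# P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/6 + 1/6 − 1/36 = 11/36
union = P(lambda w: A(w) or B(w))
print(union, P(A) + P(B) - P(lambda w: A(w) and B(w)))   # 11/36 11/36
```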
  • 33. Beware of unconditional versus conditional probability interpretations.
  • 34. Example of conditional versus unconditional statements (from https://articles.mercola.com/sites/articles/archive/2018/07/31/effects-of-cellphone-radiation.aspx?utm_source=dnl&utm_medium=email&utm_content=artTest_B1&utm_campaign=20180725Z1_UCM&et_cid=DM223789&et_rid=375100020): "Of 326 cellphone safety studies, 56 percent found a biological effect from cellphone radiation while 44 percent did not. When funding was analyzed, it was discovered that 67 percent of the funded studies found a biological effect, compared to just 28 percent of the industry-funded studies. This funding bias creates a perceived lack of scientific consensus." That is, Pr(Effect) = 56%, Pr(Effect | no industry funding) = 67%, Pr(Effect | industry funding) = 28%. Notice that while conditional statements involve at least two variables (e.g., effect and funding), the resulting analysis is univariate on Effect. Application: conditional probability is the basis of associations analysis, heavily used in marketing.
  • 35. Probability obfuscation: O. J. Simpson's case. By law, Pr(Simpson not guilty of killing his wife, without further evidence to the contrary) = 100% (later on called the null hypothesis). The prosecutor argues that the defendant battered his wife. Defense lawyer Dershowitz provides the following counterargument about the relevance of battering: 4 million women are battered annually by male companions in the US; in 1992, 1,432 of them, or 1 in 2,500, were killed by those companions ➔ women batterers seldom murder their partners. But the probability that a man who batters his wife will go on to kill her is not the relevant information. Relevant is the probability that a battered wife who was murdered was murdered by her abuser, and that number is 90%. The fact that she was already murdered should be part of the probability computation.
  • 36. [Diagram: of battered women, 1 in 2,500 were murdered; murdered by male companions.]
  • 37. Data distributions. Univariate distributions take many forms. We focus on some visual attributes, analyzed in reference to central points and 'tails' of the distribution. The normal distribution is the most used and abused (https://www.kdnuggets.com/2018/07/explaining-68-95-99-7-rule-normal-distribution.html): 68% of observations lie within 1 std, 95% within 2 std, 99.7% within 3 std. Most observations hover around the mean (= median = mode). The odds of deviating from this average (the chance of a value being different from the mean) decline at an increasingly faster (exponential) rate as we move away from the average; the tails are thin. Mean and variance fully define the normal distribution.
  • 38. Normal distribution. When mean = 0 and variance = 1, the normal becomes the standard normal distribution, denoted N(0, 1).
  • 39. LAW OF LARGE NUMBERS (LLN, Bernoulli). As the sample size n increases, the sample mean converges to the population mean, if the latter exists. (For instance, sampling from the Cauchy distribution does not converge to anything because the Cauchy does not have a mean; however, Cauchy sample medians converge to the population median (Blume & Royall 2003).) In a typical casino setting, the LLN assures the casino that it will not lose money as long as the bets keep coming; it does not assure that on every specific bet the casino makes money. Conclusion: either skip the casino, or own one.
  • 40. Roulette illustration. There are 36 colored slots, plus 2 non-colored, 0 and 00. If you bet $1 on slot 20 and win, you get back a total of $36, for $35 of profit; else, you lose your $1. Prob(winning) = 1/38, prob(losing) = 37/38. Expected value: (1/38)(35) − (37/38)(1) = −0.0526. On average, every time you bet $1 on roulette, you lose a bit more than $0.05. ➔ Casinos know that the LLN is on their side: over a large number of bets, they make 5% on average. If you play 38 times, on average you win once; that is, you pocket $36 but paid out $38. Loss = 2 / total spent = 0.0526. If 1,000 people each bet this way 35 times, about 610 will have won at least once (61%), but the 'winners'' earnings will be less than the losers' losses.
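The slide's arithmetic can be reproduced exactly with fractions (a sketch; the 35-spin figure is the probability of at least one win for a single player):

```python
from fractions import Fraction

# Single-number bet on American roulette: 38 slots, $35 profit on a win.
p_win = Fraction(1, 38)
ev = p_win * 35 - (1 - p_win) * 1          # expected profit per $1 bet
print(float(ev))                            # -0.0526... (i.e., -2/38)

# Probability of winning at least once in 35 independent spins:
# 1 - (37/38)**35, about 0.61, matching the slide's "about 610 of 1,000".
p_at_least_once = 1 - Fraction(37, 38) ** 35
print(round(float(p_at_least_once), 3))
```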
  • 41. CENTRAL LIMIT THEOREM, CLT (simplest form). The distribution of a sum of independent random variables with a common distribution with finite mean and variance approaches the normal distribution ➔ the sample mean tends to N. Note: convergence is to a distribution, not to a specific number. (From bwgriffin.com/gsu/courses/edur8131/content/EDUR_8131_CLT_illustrated_one_page.pdf.) SAT data: Figures 2 through 9 show histograms, not of raw SAT scores, but of means from samples of differing sizes. Figure 2, for example, shows means taken from samples of size 2: to construct it, a total of 5,000 samples (n = 2 each) of SAT scores were taken from the SAT scores displayed in Figure 1. For Figure 3, another set of 5,000 samples was taken, but with a sample size of 3. Each successive figure shows the distribution of sample means for varying sample sizes; thus, Figures 2 through 9 are histograms of sampling distributions for the mean. Note that as sample sizes increase, the shape of the sampling distribution of means approaches a normal curve and looks less and less like the bimodal distribution of raw SAT scores. This is exactly what the central limit theorem predicts.
  • 42. [Figures 1–9: histograms of raw SAT scores and of sample means for increasing sample sizes; no extractable text.]
  • 43. Importance of the normal distribution. Many statistical inferential tests (to decide whether product A is different from B, whether patients benefit more from drug A than drug B, etc.) rely on the normal distribution, mostly via the CLT. This does not imply that all data sets, or individual or groups of variables, are normally distributed; in fact most variables, individually or viewed multivariably, are not. For instance, when modeling linear regressions, the usual assumption is that the error component is normal with mean 0, and we test this hypothesis on the residuals. In many other areas, tests are available for many hypotheses. But in many circumstances it is possible to transform variables to normality, and sometimes desirable (not reviewed in this class). The height distribution of humans is NOT normal but bimodal (a mixture of two distributions, due to sex), no matter the sample size.
  • 44. Bird's-eye view of probability distributions: datasciencecentral.com/profiles/blogs/common-probability-distributions-the-data-scientist-s-crib-sheet
  • 45. Counting and probability homework. 1) Toss 5 fair quarters once (H and T). How many possible outcomes are there? 2) You flip a fair coin 8 times and obtain H H H H H H H T. What's the probability of obtaining 1 T and 7 H in 8 flips? 3) Flip a fair coin thrice without noting the results. A friend insists that the second flip was T. What's the probability that the first flip was H?
  • 47. Statistical inference – some basic definitions. Sample (n) and population (N) sizes. The sample distribution is determined by n, NOT by n/N (the sampling fraction). Population observations must be independent of each other. Inference is always on samples; sample determination is a huge topic.
    1) For a given N, there are (N choose n) different samples. Two cases: N = ∞ (e.g., the universe of coin flips) and N < ∞ (attendees at the present lecture).
    2) For a given N, sampling with increasing n generates 'thinner' distributions, i.e., smaller spread.
    3) For constant n/N, with n and N increasing in the same proportion, sample spreads vary depending on n.
    4) For a given n, increasing N does not change the sampling distribution much.
    5) Most statistical formulae divide by n, not by N.
    6) For multivariate samples, e.g., giga-bases, n is related to p (the number of variables) via heuristics based on computer resources, specifics of the analytics, and rules of thumb.
  • 48. Statistical inference – some basic definitions. Curse of dimensionality in high dimensions. Consider variables X, Y, Z uniformly distributed U(0, 1). If we choose a 10% sample based on X alone, the expected edge length is .1; i.e., a 10% X-sample covers 10% of the range. But to still cover 10% of each range when sampling on X and Y, we need a sample proportion of .1 ** (1/2) = .32; for X, Y, Z and a 10% range, a 46% sample proportion (formula: .1 ** (1/p)). ➔ Adding dimensions (i.e., variables) ➔ a higher sampling proportion is needed to avoid sparseness in the data to analyze. In fact, studying further dimensions for a given sample size ➔ true aspects of the data (especially in modeling) are not represented in the sampled data. ➔ If n suffices for 1 dimension, n ** 2 is required for 2, n ** 3 for 3, etc. But there are not enough data points in the universe for very large p: sampling needs grow exponentially with dimensions.
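The slide's .1 ** (1/p) formula is a one-liner; evaluating it for a few dimensions shows how quickly the required sampling fraction grows (illustrative function name):

```python
# Sampling fraction needed to cover fraction r of each axis range
# when p variables are uniform on (0, 1): fraction = r ** (1/p).
def needed_fraction(r, p):
    return r ** (1 / p)

for p in (1, 2, 3, 10):
    print(p, round(needed_fraction(0.10, p), 3))
# 1 -> 0.1, 2 -> 0.316, 3 -> 0.464, 10 -> 0.794
```

Already at p = 10, covering 10% of each axis requires sampling about 79% of the data.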
  • 49. Statistical inference – some basic definitions. Probability intervals for the sample mean. Let Z ~ N(0, 1); with known mean and variance, sample repeatedly from this distribution. The 95% probability range of sample means likely to be observed is:
    P(|Z| ≤ 1.96) = P(|X̄ − μ| / (σ/√n) ≤ 1.96) = P(|X̄ − μ| ≤ 1.96 σ/√n) = .95
    (1.96 is the 97.5th percentile of the normal distribution with μ = 0, σ² = 1). The expression inside the absolute-value brackets defines the probability interval. For fixed n, and assuming known μ and σ, the sample mean will be in that interval with very high probability. Note: this refers to repeated samples of size n with Z ~ N(0, 1). For non-normal Z and large n, the probability is approximately .95 thanks to the CLT, and the sample mean is approximately normal.
  • 50. In a more realistic setting with unknown μ (assume known σ for ease), we derive from the probability-interval method a confidence interval for the unknown parameter. The emphasis is on the method, which does not depend on having observed any data. The interval is random because X̄ is random; the interval is fixed, however, for a given sample. Once calculated, the interval is called a "confidence interval" (CI) at the (1 − α)% level and is not random ➔ any other sample derived from the same population will generate a mean that will lie within the just-found confidence interval, or not. ➔ The phrase "the confidence interval contains μ 95% of the time" is incorrect. Instead, the CI is the "set of parameter values best supported by the data at the 95% level", denoted as 1 − α, α = .05. ➔ Bayesian 'credible intervals' do answer that question.
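The frequentist meaning of "95%" is a property of the method over repeated samples, which a small simulation makes concrete (a sketch with a fixed seed; known-σ intervals as on the previous slide):

```python
import random
import statistics as st

random.seed(1)
mu, sigma, n, reps = 0.0, 1.0, 30, 2000
half = 1.96 * sigma / n ** 0.5            # known-sigma half-width

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = st.fmean(sample)
    # Each sample's interval either contains mu or it does not.
    if xbar - half <= mu <= xbar + half:
        covered += 1

print(covered / reps)   # close to 0.95: about 95% of the intervals cover mu
```

What is random here is the interval, not μ: each realized interval simply does or does not contain the fixed μ.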
  • 51. Most basic test: the one-sample test. We want to test whether a parameter β₁, usually a mean, is equal to a specific value, for instance 0. Take a sample, obtain the mean estimate (and CLT, etc.). Is the mean estimate close enough to 0 that we cannot deny that β₁ = 0? Formally: H₀ (null hypothesis): β₁ = 0; H₁ (alternative hypothesis): β₁ ≠ 0. The test is:
    (β̂₁ − β₁) / s(β̂₁) ~ t(n − 1),  β̂₁: parameter estimate.
    Since the std is estimated, the distribution is NOT normal but Student's t with n − 1 degrees of freedom (from a table); t approaches the normal distribution asymptotically, very quickly.
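For the mean, the test statistic above is just (estimate − hypothesized value) divided by the standard error. A stdlib sketch (illustrative function name; sample values are made up):

```python
import statistics as st

def one_sample_t(x, beta0=0.0):
    """One-sample t statistic and degrees of freedom: t = (xbar - beta0) / SE."""
    n = len(x)
    xbar = st.fmean(x)
    se = st.stdev(x) / n ** 0.5   # estimated std of the mean
    return (xbar - beta0) / se, n - 1

# Test H0: mean = 2.0 on a tiny sample.
t, df = one_sample_t([2.1, 1.9, 2.4, 2.0, 2.2], beta0=2.0)
print(round(t, 3), df)   # 1.395 4
```

The resulting t would then be compared against the t(n − 1) table at the chosen α.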
  • 52. Nomenclature of hypothesis testing. The level of significance of a test, called alpha (α), is used to make a statistical decision about the null hypothesis. It represents the probability of incorrectly rejecting H₀ when it is true; typically 5%, but not mandatorily. Also called type I error. Type II error, or β: the probability of not rejecting H₀ when it is false. In general, no effort is made to find the type II error; power analysis works with 1 − β, the probability of correctly rejecting the null hypothesis. p-value: the probability, assuming H₀ is true, of observing a result equal to or more extreme than what was actually observed. If p < α, this suggests the data are inconsistent with H₀, and it may be rejected.
  • 53. Some clarifying notes. If β̂₁ ~ N(μ, σ²) (appealing usually to the CLT; the distribution of the observations need not be normal), then standardizing gives (β̂₁ − μ)/σ ~ N(0, 1); if σ² is estimated, (β̂₁ − μ)/(σ̂/√n) ~ t(n − 1). If the standardized β̂₁ ~ N(0, 1), then 95% of likely values are in the [−1.96, 1.96] interval ➔ β̂₁ ± 1.96 σ̂/√n is a 95% confidence interval. Full equivalence between CI and p-value. Next: testing the null β₁ = 0 against alternatives. If the data yield β̂₁ = 1, we cannot reject the null; if β̂₂ = 2.5, the null is rejected, p-value < 0.05.
  • 54. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-549/17/2019 P-value for beta2.
  • 55. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-559/17/2019 Examples of inferential tests: 1) IQ change differences between 1st- and 3rd-grade elementary school children? (i.e., change = 0?) 2) Are Republican voters more affluent than Democrat ones? 3) Is fertilizer A better than fertilizer B? 4) Is a treatment (drug, vaccine, etc.) good at treating a disease? (extremely heavy use of statistical inference) 5) DNA testing for crimes, paternity tests, etc. Used in almost every aspect of science.
  • 56. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-569/17/2019 Which one is the null hypothesis? Strong tendency to nominate as null whichever scheme is statistically easier to compute. E.g., in linear models (next chapters), the null is lack of predictor effect, and the focus is on inference on the estimated parameter value. But suppose the question: Is there life on other planets? Should the null be NO and the alternative YES, or the reverse? In drug testing, should the null be NO ADVERSE EFFECT and the reverse for the alternative? But this puts the burden of proof on the alternative. Why not reverse it: the null is "there are adverse effects," and the burden is on disproving it. E.g., testing a screw on a bridge. In scenario A, if the screw breaks, not much happens; in scenario B, a break-up ➔ serious accident. In A the null is "the screw is safe"; in B, "unsafe."
  • 57. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-579/17/2019 P-values. The notion of p-values is central to statistical inference. The p-value is the probability of seeing an outcome (from data or an experiment) under the assumption that it originated from the starting hypothesis (called the null), which could be in dispute but is still the prevailing view. E.g., flip an assumed "fair" coin 4 times (50% probability of either heads or tails) and obtain 4 tails (T). Question: is the coin fair (i.e., prob(H) = .5)? Under the assumption p = .5, prob(4 tails | p = .5) = .5^4 = .0625, i.e., the p-value is .0625. Is that probability small enough to indicate that we do not believe p = .5? For instance, we could flip the coin 100 times in sets of 4, record how many sets contained 4 tails, and compare that proportion to .0625. The typical threshold to reject the null hypothesis is 5%. In this example, we fail to reject the null, and that is all that can be said. If we had set the threshold at 10% instead, we would judge the coin to be biased. ➔ p-value: probability of obtaining the value of the experiment, assuming the null hypothesis holds, i.e., the probability of obtaining a value at least this extreme just by chance. If the p-value is low (<= 5%?) ➔ the null hypothesis should be abandoned, but we don't know in favor of what specific alternative.
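The coin example above can be checked numerically. This is a minimal sketch in Python (the deck's own code examples use SAS); it computes the exact p-value of 4 tails in 4 flips and then confirms it by the Monte Carlo scheme the slide suggests, flipping the coin in many sets of 4.

```python
import random

random.seed(1)

# Exact p-value: probability of 4 tails in 4 flips of a fair coin.
p_exact = 0.5 ** 4  # 0.0625

# Monte Carlo check: flip the coin in sets of 4 many times and count
# how often all four flips come up tails.
n_sets = 100_000
all_tails = sum(
    1 for _ in range(n_sets)
    if all(random.random() < 0.5 for _ in range(4))
)
p_sim = all_tails / n_sets

print(p_exact)   # 0.0625
print(p_sim)     # close to 0.0625
```

Since .0625 > .05, under the usual 5% threshold we fail to reject the fair-coin null, exactly as the slide concludes.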
  • 58. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-589/17/2019 More on the p-value. It is NOT "prob(H0 is false)", because under the p-value's own assumption H0 is 100% true. Remember: the p-value is conditional on the data at hand under H0; it is NOT 1 − Pr(H0 | data), so we never obtain Pr(H0 | data) from it. If the p-value is low, the method cannot evaluate whether the null is true and the sample was unusual, or the null might be false. Assume a comparison of a new marketing campaign's results against the established one, and a p-value of 3%. Interpretation: assuming the campaigns' results are similar, a difference this large or larger would be obtained in 3% of studies due to sampling error. It is INCORRECT to say that if we reject the null there is a 3% chance of making a mistake: p-value ≠ [prob(error rate) = prob(reject true H)]. For the relationship between p-values and error rates see Sellke et al. (2001): Calibration of p Values for Testing Precise Null Hypotheses, The American Statistician, February 2001, Vol. 55, No. 1.
  • 59. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-599/17/2019 T-tests: one-way, paired, 2 groups. The test can be one-sample, e.g., comparing a variable's mean to a specific value to decide on the POPULATION mean μ. Example (DS1): Is the mean of member_duration 150? H0: mean = 150; H1: mean ≠ 150. Since the p-value < 0.05 (our threshold) ➔ the mean is not 150. Paired: could be testing math and reading scores on the same students, so the data is dependent within each subject; or front left and right tires, where data is paired within car (no example provided). Formally: H0: μ = μ0, H1: μ ≠ μ0 (2-sided test). The t-score is t = (x̄ − μ0) / (s / √n), with x̄ the estimated mean; if |t| > the t critical value, reject H0 ("significant finding"), else fail to reject H0.
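The one-sample t-score above is easy to compute by hand. A Python sketch (the deck uses SAS; the member_duration values below are hypothetical, not the actual DS1 data) tests H0: mean = 150 on a small sample:

```python
import math
import statistics

# Hypothetical small sample of member_duration values (illustrative only;
# the DS1 data set from the slides is not reproduced here).
x = [120, 135, 148, 152, 160, 141, 133, 155, 162, 149]
mu0 = 150  # H0: population mean = 150

n = len(x)
xbar = statistics.mean(x)      # 145.5
s = statistics.stdev(x)        # sample std dev, (n - 1) in the denominator
t = (xbar - mu0) / (s / math.sqrt(n))

# Here |t| ~ 1.08, below the t(9) critical value ~2.262 at alpha = 5%,
# so on this toy sample we would fail to reject H0.
print(round(t, 3))
```

With the full DS1 sample size the denominator shrinks and the same mean difference would yield a much larger |t|, which is the sample-size effect criticized on a later slide.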
  • 60. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-609/17/2019 T-tests cont. Testing across 2 groups: compare the same variable across two groups, say male vs. female, young vs. old, treated vs. non-treated patients, etc. H0: μ1 − μ2 = 0 (alternative: the difference is not zero). We have 2 variances now. Issue: whether the variance of each group is equal to the other's. If equal, use a weighted average of the two (pooled variances): t = (x̄1 − x̄2) / (s_p √(1/n1 + 1/n2)). Otherwise (unpooled), the standard error is √(s1²/n1 + s2²/n2), and Satterthwaite's method provides an asymptotic degrees-of-freedom computation. DS1 example: compare means of member_duration by fraud level.
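The pooled and unpooled formulas can be sketched directly. This Python fragment (hypothetical member_duration samples for two fraud levels, not the actual DS1 data) computes both versions of the two-sample t statistic:

```python
import math
import statistics

# Hypothetical member_duration samples for two fraud levels (illustrative;
# not the actual DS1 data).
g1 = [180.0, 195.0, 170.0, 210.0, 188.0, 202.0]
g2 = [150.0, 142.0, 165.0, 158.0, 149.0, 161.0, 155.0, 147.0]

n1, n2 = len(g1), len(g2)
m1, m2 = statistics.mean(g1), statistics.mean(g2)
v1, v2 = statistics.variance(g1), statistics.variance(g2)

# Pooled: weighted average of the two sample variances.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Unpooled (Welch / Satterthwaite): keep the variances separate.
se_unpooled = math.sqrt(v1 / n1 + v2 / n2)
t_welch = (m1 - m2) / se_unpooled

print(round(t_pooled, 3), round(t_welch, 3))
```

When the group variances differ noticeably, the two statistics diverge; Satterthwaite's approximation then also adjusts the degrees of freedom, which is what produces the fractional DF (1982.49) in the DS1 output on the next slide.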
  • 61. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-619/17/2019 One way 2 groups
  • 62. Leonardo Auslender –Ch. 1 Copyright 2004 Reject H0 in all cases. Inference tests (variable: member_duration):

model_name      Method          t Value   p-value   DF
TTEST_2GROUPS   Pooled          13.49     0.00      5958.00
TTEST_2GROUPS   Satterthwaite   14.36     0.00      1982.49
TTEST_ONEWAY                    154.84    0.00      5959.00

SAS CODE
ODS LISTING CLOSE;
ods output ttests = ttests statistics = statistics;
proc ttest data = &INDATA.;
   class &classvar.;
   var &groupvar.;
run;
ods listing;
  • 63. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-639/17/2019 Important comments on NHST (Null Hypothesis Statistical Testing). Is a "significant finding" important, or substantial? That is not given by statistical testing but by expert knowledge of the event at hand. Notice that by increasing the sample size n, the t-test denominator decreases ➔ higher chances of finding "significance," but the finding may be irrelevant. But more than this: we are testing Pr(β̂1 | H0: β1 = 0) when what we want is Pr(H0 is true | β̂1); NHST fails to reject H0 because H0 itself is not being tested. Example. Valid reasoning: 1) All persons are mortal. 2) Socrates is a person. 3) ➔ Socrates is mortal. Invalid reasoning: 1) Most women don't like soccer. 2) Mia Hamm played soccer. 3) ➔ Mia Hamm is not a woman. The invalid conclusion 3) is a false negative due to "most" in 1), which plays the role of H0; the test part is given by 2) and, accepting that 1) is true, concludes 3), which is wrong.
  • 64. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-649/17/2019 Important comment on the present practice of NHST. Strong tendency to become priests and adorers of the <= 5% p-value, regardless of whether findings are of practical importance. Strong tendency to search, transform, and manipulate to obtain the sacred <= 5% number. Strong tendency to underreport results that don't reach the 5% p-value. Strong tendency to disregard type I vs. type II error (see next slide); NHST mostly focuses on TYPE I error. The 5% p-value is not part of the Constitution, Magna Carta, Ten Commandments, Koran, or any other important document. And worse: present practice, p-hacking (or data dredging, snooping, fishing, cherry-picking…).
  • 65. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-659/17/2019 Type I vs Type II errors. From https://heartbeat.fritz.ai/classification-model-evaluation-90d743883106
  • 67. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-679/17/2019 How bad is it in applied work? 1) Focusing the analysis just on the data subset where an interesting pattern was found. 2) Disregarding multiple-comparisons adjustments, and not reporting non-significant results. 3) Using different tests (e.g., parametric vs. non-parametric) on the same hypothesis and only reporting significant results. 4) Removing "outliers" to prove a given hypothesis, or choosing data points to obtain significance; or dropping variables because of problems, imaginary or real (e.g., collinearity in linear models). 5) Transforming data, especially when modeling, to obtain significant discoveries.
  • 68. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-689/17/2019 P-hacking prevalence: almost universal, except for saints (see Berman, 2018, for p-hacking in marketing). Note that p-hacking is based on using statistical inference; methods that do not use statistical inference are not affected. Also, p-hacking can be understood as inferential EDA and thus requires additional data sets to validate the just-found hypotheses. Data science typically works by splitting the (enormous) data set into at least 3 data sets (typically randomly): Training: find hypotheses, models. Validation: evaluate the hypotheses without using inference or model search, and obtain estimates. Testing: verify the hypotheses if validated; use the prior estimates to obtain final model results.
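The random three-way split described above can be sketched in a few lines. This Python fragment (the deck's own examples use SAS; the 60/20/20 proportions and the placeholder data are illustrative assumptions) shuffles the observations once and slices:

```python
import random

# Sketch of a random training/validation/testing split (proportions are
# illustrative; 'data' stands in for the full data set's observations).
random.seed(42)
data = list(range(1000))  # placeholder observation ids
random.shuffle(data)

n = len(data)
n_train = int(0.6 * n)
n_valid = int(0.2 * n)

training = data[:n_train]
validation = data[n_train:n_train + n_valid]
testing = data[n_train + n_valid:]

# The three sets partition the data: no observation appears twice.
assert len(training) + len(validation) + len(testing) == n
```

Shuffling before slicing is what makes the split random rather than systematic; with a response variable present, a stratified split (sampled within each response level) is often preferred.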
  • 69. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-699/17/2019 But… just validating and testing does not ensure full verification. Further techniques: cross-validation (see reference at end of presentation), bootstrapping and jackknifing (Efron, 1979).
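The bootstrap mentioned above is simple to sketch: resample the data with replacement many times and use the spread of the resampled statistic as an estimate of its sampling variability. A minimal Python illustration (the data values are hypothetical) for the mean:

```python
import random
import statistics

# Minimal bootstrap sketch (Efron, 1979): resample with replacement and
# use the spread of the resampled means as an estimate of the standard error.
random.seed(0)
x = [12.0, 15.5, 9.8, 14.1, 11.3, 16.2, 10.7, 13.9, 12.8, 15.0]

B = 5000
boot_means = []
for _ in range(B):
    resample = [random.choice(x) for _ in x]  # same size, with replacement
    boot_means.append(statistics.mean(resample))

se_boot = statistics.stdev(boot_means)        # bootstrap standard error

# Percentile 95% interval from the sorted bootstrap means.
boot_means.sort()
ci = (boot_means[int(0.025 * B)], boot_means[int(0.975 * B)])
print(round(se_boot, 3), ci)
```

The appeal is that the same recipe works for statistics (medians, ratios, model coefficients) whose standard errors have no simple closed form.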
  • 71. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-719/17/2019 Multiple comparisons *** Assume not one but 20 hypotheses tested on the same data set with α = 5%. While the prob. of rejecting H0 when H0 is true is 5% for each test, prob(at least one significant result out of 20) = 1 − P(no significance anywhere) = 1 − (1 − 0.05)^20 ≈ 64%. Different methods handle this: Bonferroni, false discovery rate (FDR), positive FDR (pFDR), etc. Present unfortunate practices: 1) Very common to focus only on significant findings and disregard presenting information on the non-significant ones. Example: a study on opera singing and sleeping posture finds that singers tend to sleep in a specific position, but omits that most comparisons (diet differences, sleep-medication usage, gender, etc.) were insignificant. 2) Very common to report only what is convenient. A company tests a new drug's efficacy on 2 different groups of people and results are positive for one but negative for the other; the company reports just the positive result.
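The 64% figure above, and the simplest fix (Bonferroni), can be checked with a few lines of arithmetic. A Python sketch:

```python
# Family-wise error rate for m independent tests at level alpha,
# and the Bonferroni-corrected per-test level (the slide's numbers).
alpha, m = 0.05, 20

fwer = 1 - (1 - alpha) ** m              # ~0.64: prob. of >= 1 false positive
alpha_bonf = alpha / m                   # Bonferroni: test each at alpha / m
fwer_bonf = 1 - (1 - alpha_bonf) ** m    # back under ~5% family-wise

print(round(fwer, 2))       # 0.64
print(round(fwer_bonf, 3))  # 0.049
```

Bonferroni controls the family-wise error rate but is conservative with many tests; FDR-type methods trade that strictness for power by controlling the expected proportion of false discoveries instead.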
  • 73. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-739/17/2019 Sampling practices. Sampling is seldom done with replacement ➔ the sample size should be large enough to: 1) Diminish the probability of spurious dependence across observations. 2) Ensure that rare categories in nominal variables are still represented; essentially, sampling w/o replacement in small samples ➔ proportions are distorted. 3) Provide enough observations so that the overall cardinality of the data set is still representative of the population. Under these conditions, a rule of thumb is to sample at least 10% of the population. Bear in mind that for very large p / n (undefined), the percentage should be higher.
  • 74. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-749/17/2019 Sampling. Often the entire population's data is too large or expensive to obtain, e.g., population data on US males' average weight in December 2016. In other cases, we study a sample to analyze results expected in future populations, e.g., the population of credit card payers in future years; the expectation is that future population properties are represented in the present sample data. Two big types of sampling: a) probabilistic (random) sampling; b) non-probabilistic.
  • 75. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-759/17/2019 Random sampling. Simple random sampling: equal probability of selection for each observation. Stratified sampling: random sampling within the strata levels of one (or a few) variable/s, typically categorical (heavily used in classification methods). Systematic sampling: observations chosen at regular data intervals. Cluster sampling: sampling within clusters of observations. Multi-stage sampling: mixture of the previous methods.
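Stratified sampling is the variant that protects rare categories, the concern raised two slides earlier. A Python sketch (synthetic data with a rare "fraud" stratum; the labels and 10% fraction are assumptions for illustration) samples the same fraction within each stratum:

```python
import random
from collections import defaultdict

# Stratified sampling sketch: sample the same fraction within each stratum
# so rare levels keep their population share (data here is synthetic).
random.seed(7)
population = [("fraud" if i % 20 == 0 else "no_fraud", i) for i in range(1000)]

by_stratum = defaultdict(list)
for label, obs in population:
    by_stratum[label].append(obs)

frac = 0.10
sample = {
    label: random.sample(obs_list, int(frac * len(obs_list)))
    for label, obs_list in by_stratum.items()
}

# 5% fraud in the population -> exactly 5% fraud in the sample.
print(len(sample["fraud"]), len(sample["no_fraud"]))  # 5 95
```

A simple random sample of the same total size would only hit the 5% fraud share on average, and could easily miss the rare class entirely in small samples.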
  • 76. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-769/17/2019 Non-random sampling. Convenience sampling: based on data availability. Purposive sampling: sample only from those elements of interest, e.g., those who responded to a credit card solicitation. Quota sampling: sample until an exact proportion of certain characteristics is achieved. Referral/snowball sampling: the sample is created by referrals from participants to participants, e.g., sampling of HIV patients, pickpockets… This method is subject to high bias and variance, and cannot claim the statistical-significance status of random sampling.
  • 77. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-779/17/2019 Issues in sampling. Assume research on bus quality of service, for which you ask questions of people waiting at the bus stop. One main concern for quality of service is late buses. You obtain many answers and report your results. Problem: because you sample at the bus stop, the people who respond are more likely to have waited a long time than those already on the bus. Thus, the answers will be biased toward poor service.
  • 79. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-799/17/2019 Normality check: QQ plots. The plot compares the ordered values of the variable of interest (Y axis) with the quantiles of the normal distribution (X axis). If the pattern is linear ➔ the variable is normally distributed. An overlaid reference straight line indicates perfect normality.

Description of point pattern ➔ possible interpretation:
- All but a few points fall on a line ➔ outliers in the data (reviewed later).
- Left end of pattern below the line, right end above ➔ long tails at both ends of the data distribution.
- Left end of pattern above the line, right end below ➔ short tails at both ends of the data distribution.
- Curved pattern with slope increasing from left to right ➔ distribution skewed to the right.
- Curved pattern with slope decreasing from left to right ➔ distribution skewed to the left.
- Staircase pattern (plateaus and gaps) ➔ data have been rounded or are discrete.
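The QQ plot coordinates can be computed by hand: sort the data for the Y axis and take standard normal quantiles at plotting positions (i − 0.5)/n for the X axis. A Python sketch (the data values are hypothetical; `statistics.NormalDist` requires Python 3.8+):

```python
import statistics

# QQ plot coordinates by hand: sorted data (Y) vs. standard normal
# quantiles (X) at plotting positions (i - 0.5) / n.
x = [4.8, 5.1, 5.3, 5.6, 4.9, 5.0, 5.4, 5.2, 5.5, 4.7]
xs = sorted(x)
n = len(xs)

std_normal = statistics.NormalDist()  # mean 0, sd 1
theo = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Plot these pairs; a roughly straight line suggests normality.
pairs = list(zip(theo, xs))
```

The (i − 0.5)/n positions avoid asking for the 0th and 100th percentiles, which are infinite for the normal; other plotting-position conventions (e.g., i/(n + 1)) differ only slightly.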
  • 81. Leonardo Auslender –Ch. 1 Copyright 2004 Ch. 1.1-819/17/2019 Outlier. Right Skew. Addl. Examples
  • 82. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-829/17/2019 Left Skew
  • 83. Leonardo Auslender –Ch. 1 Copyright 2004 Ch. 1.1-839/17/2019

data hello2;
   length varname $ 32;
   set training (in = in1 keep = DOCTOR_VISITS   rename = (DOCTOR_VISITS   = varvalue))
       training (in = in2 keep = FRAUD           rename = (FRAUD           = varvalue))
       training (in = in3 keep = MEMBER_DURATION rename = (MEMBER_DURATION = varvalue))
       training (in = in4 keep = NO_CLAIMS       rename = (NO_CLAIMS       = varvalue))
       training (in = in5 keep = NUM_MEMBERS     rename = (NUM_MEMBERS     = varvalue))
       training (in = in6 keep = OPTOM_PRESC     rename = (OPTOM_PRESC     = varvalue))
       training (in = in7 keep = TOTAL_SPEND     rename = (TOTAL_SPEND     = varvalue));
   if in1 then varname = "DOCTOR_VISITS";
   if in2 then varname = "FRAUD";
   if in3 then varname = "MEMBER_DURATION";
   if in4 then varname = "NO_CLAIMS";
   if in5 then varname = "NUM_MEMBERS";
   if in6 then varname = "OPTOM_PRESC";
   if in7 then varname = "TOTAL_SPEND";
   label varname = "Var Name" varvalue = "Variable";
run;

proc univariate data = hello2 noprint;
   class varname;
   var varvalue;
   qqplot / ncol = 3 nrow = 1;
run;
  • 84. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-849/17/2019 Homework. Select your data set and software. Obtain QQ plots for some variable/s and diagnose them.
  • 86. Leonardo Auslender –Ch. 1 Copyright 2004 Ch. 1.1-869/17/2019 The Dow Jones index fell 22.61% in one day in October of 1987. It was a "25-standard-deviation event" according to the prevailing calculations, an occurrence so rare that if the stock market had been open every single day since the Big Bang… it still shouldn't have happened. And yet it did. Comment on how the standard deviation was possibly derived and how the 25-std event was used to justify that it shouldn't have happened. And let's not think about 2007-2009 for the time being.
  • 87. Leonardo Auslender –Ch. 1 Copyright 2004 Ch. 1.1-879/17/2019 TV game show. The host (Monty Hall) shows 3 doors to the contestant. Behind two of the doors there are goats; behind the other one, a car. If the contestant chooses the right door, he/she wins the car. Twist: once the participant has chosen a door, and before Monty opens it (Monty knows behind which door the car is located), Monty opens one door that he knows has a goat behind it. At this point, Monty offers the participant the chance to switch his/her selection to the remaining door. Should the participant switch? Yes? No? Why?
  • 88. Leonardo Auslender –Ch. 1 Copyright 2004 Ch. 1.1-889/17/2019 The immediate reaction is that the initial probability of success is 1/3 and that, after Monty opens the goat door, the probability becomes 1/2, so there is no gain in switching or staying with the original choice. Wrong. Call the doors 1, 2, and 3. The following table summarizes all cases when door 1 is chosen initially (https://en.wikipedia.org/wiki/Monty_Hall_problem). Thus, the probability of winning if the participant switches is 2/3. Note: this puzzle raised intense and acrimonious debates among statisticians.
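The 2/3 result is also easy to confirm by simulation. A Python sketch that plays the game many times under both strategies:

```python
import random

# Monte Carlo check of the Monty Hall result: switching wins ~2/3 of games.
random.seed(123)

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Monty opens a goat door that is neither the pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

n = 100_000
win_switch = sum(play(True) for _ in range(n)) / n
win_stay = sum(play(False) for _ in range(n)) / n

print(round(win_switch, 2), round(win_stay, 2))  # ~0.67 ~0.33
```

The intuition: switching loses only when the first pick was the car (probability 1/3), so it wins with probability 2/3.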
  • 89. Leonardo Auslender –Ch. 1 Copyright 20049/17/2019 3. Tea and/or coffee? In a large group of people, it is known that 70% drink coffee and 80% drink tea. What are the lower and upper bounds on the proportion who drink both? (In obvious notation, C and T.) Since total probability cannot exceed 1: P(T ∪ C) = P(T) + P(C) − P(T ∩ C) <= 1 ➔ P(T ∩ C) >= .7 + .8 − 1 = .5. If P(T ∪ C) = 1 (everybody drinks something), then P(T ∩ C) = .5 exactly, so the lower bound is .5. Since T ∩ C is contained in both T and C, the upper bound is P(T ∩ C) <= min(P(C), P(T)) = .7.
  • 90. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-909/17/2019 Go, Bayes Go!! (Under construction) ***
  • 91. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-919/17/2019 Bayesian inference (BI): example of coin tossing. An important part of Bayesian inference is the establishment of parameters and models. Models: mathematical formulations of observed events. Parameters: factors in models affecting the observed data. For example, in coin tossing, fairness is the parameter of the coin, denoted by θ; the outcome is denoted by D. Q: What is Pr(4 heads out of 9 tosses) (D) given the coin's fairness, i.e., P(D | θ)? This is the frequentist way of looking at the problem. We are more interested in knowing: given D (4 heads out of 9 tosses), what is Pr(θ = 0.5)? By Bayes' theorem: p(θ | D) = p(D | θ) p(θ) / p(D).
  • 92. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-929/17/2019 p(θ | D): posterior belief, or probability after observing D. If we know θ, the likelihood gives P(D) for any event; p(D) is the "evidence," estimated over all possible values of θ (there could be more than one). Significance tests, p-value, frequentist: calculate the t-score from a fixed-size sample and obtain the p-value. If the p-value is 3% for mean = 50 ➔ there is a 3% probability of observing such a sample mean, assuming some H. Problem: when n changes for different sample sizes, t-scores change also ➔ the 5% threshold is meaningless for deciding on whether to reject H or not.
  • 93. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-939/17/2019 CIs. CIs are p-value cousins ➔ same problem with changing n. Since a CI is not a probability distribution, it is impossible to know which values are most probable. Bayes factor: the Bayesian equivalent of the p-value. The Bayesian null hypothesis puts all probability mass at a particular value of the parameter (θ = 0.5 for a fair coin) and zero probability elsewhere (M1). Alternative hypothesis: all values of θ are possible ➔ a flat distribution (M2). The next slide shows the prior situation (left panel) and the posterior situation (right panel).
  • 94. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-949/17/2019 Given 4 heads out of 9, the probability has shifted from 0.5 to what panel B shows. In A, the left and right bars indicate the prior probability values of H and A; in panel B, H is now < 0.5 and A > 0.5. Bayes factor: ratio of posterior odds to prior odds: BF = [p(H | D) / p(A | D)] / [p(H) / p(A)]. Reject H if BF < .1 (suggested). Benefit of using the Bayes factor instead of p-values: no sample-size effect.
  • 95. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-959/17/2019 Credible interval. The Bayesian solution yields a "posterior" probability density function (pdf); assume it is continuous. Then for any two points a < b in the range of the pdf, it is possible to assert Pr(a <= X <= b) = c. If we invert the problem and ask which values a and b yield probability c for X under the pdf, then [a; b] is a c% credible interval. There is an infinite number of intervals that can yield c% probability; if the distribution is symmetric, the range is taken such that (1 − c)/2 % is left out in each tail. With credible intervals it is possible to assert that the interval contains the true value with c% probability, which is not possible with traditional CIs.
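For the coin example (4 heads in 9 tosses, flat prior), the posterior and a central 95% credible interval can be sketched with a simple grid approximation; no special libraries are needed. The grid size and the central-interval choice are assumptions for illustration:

```python
# Grid approximation of the posterior for the coin example: 4 heads in
# 9 tosses with a flat prior, then a central 95% credible interval.
N = 10_000
grid = [(i + 0.5) / N for i in range(N)]          # theta values in (0, 1)
post = [t ** 4 * (1 - t) ** 5 for t in grid]      # likelihood x flat prior
total = sum(post)
post = [p / total for p in post]                  # normalize to a pmf on the grid

# Walk the CDF to find the 2.5% and 97.5% quantiles.
cdf, lo, hi = 0.0, None, None
for t, p in zip(grid, post):
    cdf += p
    if lo is None and cdf >= 0.025:
        lo = t
    if hi is None and cdf >= 0.975:
        hi = t

print(lo, hi)  # central 95% credible interval for theta
```

The exact posterior here is Beta(5, 6); the grid approach is shown because it generalizes to posteriors with no closed form.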
  • 98. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-989/17/2019 Random events along a discrete time index t (months, steps, number of tests, etc.), measuring a cumulative outcome effect. The time index theoretically goes to infinity. Example: roll a fair die; if even, walk 100 meters north, else 100 meters south, and repeat 100 times.
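The walk above is straightforward to simulate. A Python sketch that tracks the cumulative position over the 100 steps:

```python
import random

# The slide's walk: 100 steps; an even die roll -> +100 m north,
# odd -> -100 m south; track the cumulative position.
random.seed(2019)

position = 0
path = []
for _ in range(100):
    step = 100 if random.random() < 0.5 else -100  # even/odd is a fair 50/50
    position += step
    path.append(position)

# Net displacement = 100 * (evens - odds); since evens + odds = 100,
# the final position is always a multiple of 200.
print(position)
```

Re-running with different seeds shows the key property of such walks: the expected position stays at 0, but the typical spread grows like the square root of the number of steps.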
  • 99. Leonardo Auslender –Ch. 1 Copyright 2004 1.1-999/17/2019 References
Berman, R., et al. (2018): p-Hacking and False Discovery in A/B Testing, web.
Cross-validation: https://www.cs.cmu.edu/~schneide/tut5/node42.html
Efron, B. (1979): Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, 7(1): 1-26.
Bollerslev, T., and Hodrick, R. J. (1992): Financial Market Efficiency Tests, NBER Working Papers 4108, National Bureau of Economic Research.