SlideShare a Scribd company logo
1 of 7
Download to read offline
AVERAGES AND MEASURES OF SPREAD
· Mode : the most common or most popular data value
the only average that can be used for qualitative data
not suitable if the data values are very varied
· Mean : important as it uses all the data values
Disadvantage – affected by extreme values
· Median : the middle value when the data are in order
For n data values the median is the n + 1th value
Not affected by extreme values
· Range – biggest value – smallest value
- greatly affected by extreme values
· Interquartile Range – Upper quartile – Lower quartile
- measures the spread of the middle 50% of the data and is not affected by
extreme values
IQR = 9 - 4 = 5
IQR = 7.5 – 3.5 = 4
· Standard Deviation
Deviation from the mean is the difference from a value from the mean value
The standard deviation is the average of all of these deviations
Formulas to work out standard deviation are given in the
q SCALING DATA
Addition if you add a to each number in the list of data :
New mean = old mean + a
New median = old median + a
New mode = old mode + a
Standard Deviation is UNCHANGED
Multiplication - If you multiply each number in the list of data by b :
New mean = old mean × b
New median = old median × b
New mode = old mode × b
New Standard Deviation = old standard deviation × b
If the data is grouped – use the mid-point of each group as your x
2
For 10 values the median
will be the 5½ th value –
halfway between the 5th
and the 6th
values
3 4 5 7 8 9 9
LQ M UQ
2 3 4 4 5 7 8 9
LQ M UQ
AQA STATISTICS 1 REVISION NOTES
www.mathsbox.org.uk
PROBABILITY
Outcome : each thing that can happen in an experiment
Sample Space : list of all the possible values
q NOTATION
A and B both happen
either A or B or both happen
C doesn’t happen
§ P(C’) = 1 – P(C)
§
Mutually Exclusive Events two or more events that cannot happen at the same time
Independent Events the outcome of one event does not affect the outcome of another
P( B ) = P(A) P(B)
Conditional Probability : when the outcome of the first event affects the outcome of a
second event, the probability of the second event depends on what has happened
§ P(B/A) means the conditional probability of B given A
P(B/A) = (A B) so P( B )= P(A) P(B/A)
q If the question states that the events are independent then a tree diagram might be a
good idea – multiply along the branches then add the appropriate combinations together
q Probability of ‘at least 1’ = 1 – Probability of ‘none’
q If you are asked to find probabilities using data in a table – work out the row/column
totals before you start
A Ç B
A È B
C' C
P(A È B) = P(A) + P(B) – P(A Ç B)
P(A È B) = P(A) + P(B)
P Ç
P(A)
A Ç
A Ç
www.mathsbox.org.uk
BINOMIAL DISTRIBUTION
q A question is binomial if:
· Probability of an event happening is given (p)
· Number of people/trials/objects chosen given (n)
q EQUALS or EXACTLY use the formula P(x=r)
Make sure you write it out with the values substituted in
q MORE/LESS THAN/AT LEAST use tables
Remember tables give less than or equal to
Make sure you list the numbers and identify which ones you need to include
P(X>5)  0  1  2  3  4  5  6  7  8  9  = 1 – P(X≤5)
P(X<5)  0  1  2  3  4  5  6  7  8 = P(X ≤ 4)
q MEAN and VARIANCE
Mean = np Variance = np(1-p) standard deviation =
q ASSUMPTIONS
Independent events with a fixed probability of success
Randomly selected
q COMPARISONS
You may be asked to calculate the mean and standard deviation of a binomial distribution
and compare them to the mean and standard deviation of a sample (table of results) - if
both the means are approximately the same AND both standard deviations (or variances) are
approximately equal then you can say that binomial model appears to fit the data and that it
must be independent, random observations.
n C r p
r
(1 – p)
n – r
from your calculator make sure
you write down this value
Check that your ‘powers’ add
to make n (number of trials)
np(1 – p)
www.mathsbox.org.uk
NORMAL DISTRIBUTION
FINDING PROBABILITIES
q State the mean and variance (standard deviation)
q Standardise to find the z value
q Sketch a graph and shade in the area to represent the probability
q Use the table to find the probability –
- take care with negative z-values –
P(z > -0.8) = P(z < 0.8)
P(z < -0.8) = 1 – P(z < 0.8)
q ALWAYS CHECK YOUR ANSWER WITH YOUR GRAPH – if your shaded area is more that ½
and your answer is 0.4 (for example) – you know you have gone wrong somewhere!!
WORKING BACKWARDS – to find the mean/standard deviation – or both (simultaneous equations)
q State the probability you know and sketch a graph
P(X < 34) = 0.95
q Use tables to find the appropriate z value
q Write down the equation used to standardise with all of the known values substituted
q Rearrange to find the mean
1.2
z = x – mean
standard deviation
P(z <1.2) P(z >-0.8)
-0.8
-0.8 0.8
-0.8 0.8
0.05
34
Standard deviation = 8
1.6445 = 34 –
8
mean
www.mathsbox.org.uk
SAMPLES and PROBABILITIES
If you are asked to calculate probabilities involving samples remember to divide the
(population) standard deviation by the square root of the sample size when you standardise.
ESTIMATION
– estimating the population mean from a sample mean and finding a confidence interval
q CASE 1 : Standard deviation or Variance of the population is stated in the question
- from the data you only need to calculate the mean
- use your tables to find the appropriate ‘z value’
- write out the above expressions for the confidence intervals with all your values substituted in
- calculate the two values for your confidence intervals and state clearly (3 sf)
CHECK your answer – add the two answers together and divide by 2
– this should be the sample mean!!!
q CASE 2 : Standard deviation or Variance of the population is UNKNOWN
- you will need to use the data to calculate the variance of the sample and then an unbiased
estimate of the population variance
- your calculator will give you value of the standard deviation for the sample you have entered
sxn - square this to calculate the sample variance
An unbiased estimate of the population variance is
Use this as the value of in your confidence interval when substituting the values in
q INTERPRETATION – a 95% confidence interval tells us that if we took the same size
sample 100 times then 95 of the confidence intervals we would calculate should
contain the TRUE population mean.
z =
x – mean
s
n
or z =
x – mean
variance
n
If a random sample of size n is taken from a normal population and the sample
mean x is found, then the 95% confidence interval of the population mean is
given by
where s is either the population variance (if given) or an unbiased estimate
of the population variance (found from the sample see below)
x – 1.96 ´ s
2
n
, x + 1.96 ´ s
2
n
2
Z value
n
n – 1
´ Sample variance
s
2
www.mathsbox.org.uk
CENTRAL LIMIT THEOREM
– you only need to use the central limit theorem if you are not told that the sample is selected
from a population which is Normally Distributed
The theorem concerns the distribution of the sample means and as long as the sample
size is large enough (greater than 30) then the sample means will be normally
distributed – and so we can calculate confidence intervals
REGRESSION – finding the equation of the line of best fit – least squares
y = a + bx
q If you are asked to interpret the values of a and b, make sure you discuss it in the context of the
question NOT in terms of x and y (see examples above)
q If you are give a table of values – use your calculator to find a and b
In your workings state a = …. b =…..
and show your values substituted into y = a + bx
q If you are calculating a and b using the formulae – make sure you use the formula book –
showing how you substituted the values in
Always work out ‘b’ first
Use ‘b’ and the means of x and y to work out a a = (mean of y) - b(mean of x)
q TO PLOT THE REGRESSION LINE – choose 2 different values of x – use your equation y=a + bx
to work out the predicted y-values - plot the two points and join with a straight line
q RESIDUAL
= OBSERVED(actual value) – PREDICTED(using equation y= a + bx)
- the smaller the residuals the greater the accuracy of the line of best fit in predicting values
- sometimes an ‘average’ residual can be used to make predictions using the line of best fit –e.g if
an individual has an average residual of 5 – then to predict for this particular person using the line, 5
should be added to the value predicted using the equation.
q RELIABILITY OF PREDICTIONS
- Interpolation
predicting using an x-value within the range of x-values used to calculate the a and b
considered to be a reliable prediction
- Extrapolation
predicting using an x-value outside of the range of x-values used to calculate a and b
UNRELIABLE estimate
- because you are assuming that the linear trend continues indefinitely – use your
common sense to explain why this may be incorrect
- watch out for NEGATIVE (or unrealistic) y values which may result for the x values
suggested – again use your common sense to explain why this is unrealistic
Intercept - the value of ‘y’ when ‘x’ is zero.
e.g. When the temperature is 0°C ice cream sales
are £10
Gradient – the change in y for each unit change in x
e.g for every 1 degree rise in temperature sales increase by £b
www.mathsbox.org.uk
q SCALING either the x or the y values will change the equation
e.g y = 0.5 – 1.2x
If the x values are doubled then the equation becomes y = 0.5 – 1.2(2x)
y = 0.5 – 2.4x
If 5 is added on to each of the y values then the equation becomes
y + 5 = 0.5 – 1.2x
y = - 4.5 – 1.2 x (always rearrange to get y = ‘a’ + ‘b’x)
CORRELATION
The Product Moment Correlation Coefficient : is a numerical measures of the strength and type of
correlation – denoted by r and will lie in the range -1 £ r £ 1
q indicates how well the data, when plotted in a scatter graph, fits a straight line pattern
q NOT APPROPRIATE if the data does not follow a linear pattern when plotted (straight line) – so
scatter graph is needed to check this
q If you are give a table of values – use your calculator to find r
(If you have time it’s a good idea to check that you have entered your values correctly)
q If you are calculating using summary values -make sure you use the formula book – showing
how you substituted the values into the formula
INTERPRETING r – make sure you do this in the context of the question (not just positive correlation)
e.g. There appears to be a fairly strong relationship between temperature and ice cream sales, higher
temperatures appear to correspond to higher values of ice-cream sales and vice versa.
q Scaling data – a linear transformation or scaling of one or both of the variables will not affect the
correlation coefficient – all of the points will stay in the same position RELATIVE to each other
TAKE CARE
q Not all correlation will be linear
For this data, the correlation coefficient is close to 0
This does not mean that there is no correlation but simply mean that there is no linear
correlation (pattern appears to be quadratic)
q Spurious Correlation
A strong correlation between 2 variables does not mean that one thing causes the other –high
marks in a maths exam do not necessarily cause high marks in a Statistics exam, they are likely
to both be dependent on a common third variable : the students mathematical ability
q Outliers
One or two outliers can have a dramatic effect on a correlation coefficient
www.mathsbox.org.uk

More Related Content

What's hot

Two Dimensional Motion and Vectors
Two Dimensional Motion and VectorsTwo Dimensional Motion and Vectors
Two Dimensional Motion and Vectors
ZBTHS
 
6.2 newtons law of gravitation
6.2 newtons law of gravitation6.2 newtons law of gravitation
6.2 newtons law of gravitation
Paula Mills
 
05 perfect square, difference of two squares
05   perfect square, difference of two squares05   perfect square, difference of two squares
05 perfect square, difference of two squares
majapamaya
 
6.2 solve quadratic equations by graphing
6.2 solve quadratic equations by graphing6.2 solve quadratic equations by graphing
6.2 solve quadratic equations by graphing
Jessica Garcia
 
Linear equation in two variables
Linear equation in two variablesLinear equation in two variables
Linear equation in two variables
MERBGOI
 

What's hot (20)

Center of mass ppt.
Center of mass ppt.Center of mass ppt.
Center of mass ppt.
 
AS LEVEL QUADRATIC (CIE) EXPLAINED WITH EXAMPLE AND DIAGRAMS
AS LEVEL QUADRATIC (CIE) EXPLAINED WITH EXAMPLE AND DIAGRAMSAS LEVEL QUADRATIC (CIE) EXPLAINED WITH EXAMPLE AND DIAGRAMS
AS LEVEL QUADRATIC (CIE) EXPLAINED WITH EXAMPLE AND DIAGRAMS
 
AS LEVEL COORDINATE GEOMETRY EXPLAINED
AS LEVEL COORDINATE GEOMETRY EXPLAINEDAS LEVEL COORDINATE GEOMETRY EXPLAINED
AS LEVEL COORDINATE GEOMETRY EXPLAINED
 
Square of trinomial
Square of trinomialSquare of trinomial
Square of trinomial
 
Elimination method Ch 7
Elimination method Ch 7Elimination method Ch 7
Elimination method Ch 7
 
Commutative and Associative Laws
Commutative and Associative LawsCommutative and Associative Laws
Commutative and Associative Laws
 
CLASS IX MATHS Coordinate geometry ppt
CLASS IX MATHS Coordinate geometry pptCLASS IX MATHS Coordinate geometry ppt
CLASS IX MATHS Coordinate geometry ppt
 
Unit 6, Lesson 3 - Vectors
Unit 6, Lesson 3 - VectorsUnit 6, Lesson 3 - Vectors
Unit 6, Lesson 3 - Vectors
 
Motion
MotionMotion
Motion
 
Two Dimensional Motion and Vectors
Two Dimensional Motion and VectorsTwo Dimensional Motion and Vectors
Two Dimensional Motion and Vectors
 
6.2 newtons law of gravitation
6.2 newtons law of gravitation6.2 newtons law of gravitation
6.2 newtons law of gravitation
 
1.1 vectors
1.1   vectors1.1   vectors
1.1 vectors
 
Linear equations in two variables
Linear equations in two variablesLinear equations in two variables
Linear equations in two variables
 
05 perfect square, difference of two squares
05   perfect square, difference of two squares05   perfect square, difference of two squares
05 perfect square, difference of two squares
 
6.2 solve quadratic equations by graphing
6.2 solve quadratic equations by graphing6.2 solve quadratic equations by graphing
6.2 solve quadratic equations by graphing
 
Lesson 3: The Limit of a Function
Lesson 3: The Limit of a FunctionLesson 3: The Limit of a Function
Lesson 3: The Limit of a Function
 
Systems of linear equations
Systems of linear equationsSystems of linear equations
Systems of linear equations
 
Lecture 07 graphing linear equations
Lecture 07 graphing linear equationsLecture 07 graphing linear equations
Lecture 07 graphing linear equations
 
Linear equation in two variables
Linear equation in two variablesLinear equation in two variables
Linear equation in two variables
 
Limit of functions
Limit of functionsLimit of functions
Limit of functions
 

Viewers also liked

Unit 2 higher_mark_scheme_march_2012
 Unit 2 higher_mark_scheme_march_2012 Unit 2 higher_mark_scheme_march_2012
Unit 2 higher_mark_scheme_march_2012
claire meadows-smith
 
Unit 3 foundation spec paper_mark_scheme
Unit 3 foundation  spec paper_mark_schemeUnit 3 foundation  spec paper_mark_scheme
Unit 3 foundation spec paper_mark_scheme
claire meadows-smith
 
04a 5 mb2f_unit_2_november_2011_mark_scheme
04a 5 mb2f_unit_2_november_2011_mark_scheme04a 5 mb2f_unit_2_november_2011_mark_scheme
04a 5 mb2f_unit_2_november_2011_mark_scheme
claire meadows-smith
 
06a 1 ma0_1h_mark_scheme_-_june_2012
06a 1 ma0_1h_mark_scheme_-_june_201206a 1 ma0_1h_mark_scheme_-_june_2012
06a 1 ma0_1h_mark_scheme_-_june_2012
claire meadows-smith
 
08a 5 mb2h_unit_2_mark_scheme_–_march_2012
08a 5 mb2h_unit_2_mark_scheme_–_march_201208a 5 mb2h_unit_2_mark_scheme_–_march_2012
08a 5 mb2h_unit_2_mark_scheme_–_march_2012
claire meadows-smith
 
09a 5 mm1f_methods_unit_1f_-_november_2012
09a 5 mm1f_methods_unit_1f_-_november_201209a 5 mm1f_methods_unit_1f_-_november_2012
09a 5 mm1f_methods_unit_1f_-_november_2012
claire meadows-smith
 
12a 5 mb3h_unit_3_mark_scheme_-_june_2012
12a 5 mb3h_unit_3_mark_scheme_-_june_201212a 5 mb3h_unit_3_mark_scheme_-_june_2012
12a 5 mb3h_unit_3_mark_scheme_-_june_2012
claire meadows-smith
 
Unit 2 foundation mark scheme june 2011
Unit 2 foundation mark scheme june 2011Unit 2 foundation mark scheme june 2011
Unit 2 foundation mark scheme june 2011
claire meadows-smith
 

Viewers also liked (20)

1301 c1 january_2013_mark_scheme
1301 c1 january_2013_mark_scheme1301 c1 january_2013_mark_scheme
1301 c1 january_2013_mark_scheme
 
1301 d1 january_2013_mark_scheme
1301 d1 january_2013_mark_scheme1301 d1 january_2013_mark_scheme
1301 d1 january_2013_mark_scheme
 
Unit 2 higher_mark_scheme_march_2012
 Unit 2 higher_mark_scheme_march_2012 Unit 2 higher_mark_scheme_march_2012
Unit 2 higher_mark_scheme_march_2012
 
11 practice paper_3_h_-_set_a
11 practice paper_3_h_-_set_a11 practice paper_3_h_-_set_a
11 practice paper_3_h_-_set_a
 
05a 1 ma0_1h_-_june_2012
05a 1 ma0_1h_-_june_201205a 1 ma0_1h_-_june_2012
05a 1 ma0_1h_-_june_2012
 
03b 5 mb1h_unit_1_-_june_2012
03b 5 mb1h_unit_1_-_june_201203b 5 mb1h_unit_1_-_june_2012
03b 5 mb1h_unit_1_-_june_2012
 
Best guess-paper-2-foundation
Best guess-paper-2-foundationBest guess-paper-2-foundation
Best guess-paper-2-foundation
 
Unit 3 foundation spec paper_mark_scheme
Unit 3 foundation  spec paper_mark_schemeUnit 3 foundation  spec paper_mark_scheme
Unit 3 foundation spec paper_mark_scheme
 
04a 5 mb2f_unit_2_november_2011_mark_scheme
04a 5 mb2f_unit_2_november_2011_mark_scheme04a 5 mb2f_unit_2_november_2011_mark_scheme
04a 5 mb2f_unit_2_november_2011_mark_scheme
 
06a 1 ma0_1h_mark_scheme_-_june_2012
06a 1 ma0_1h_mark_scheme_-_june_201206a 1 ma0_1h_mark_scheme_-_june_2012
06a 1 ma0_1h_mark_scheme_-_june_2012
 
08a 5 mb2h_unit_2_mark_scheme_–_march_2012
08a 5 mb2h_unit_2_mark_scheme_–_march_201208a 5 mb2h_unit_2_mark_scheme_–_march_2012
08a 5 mb2h_unit_2_mark_scheme_–_march_2012
 
09a 5 mm1f_methods_unit_1f_-_november_2012
09a 5 mm1f_methods_unit_1f_-_november_201209a 5 mm1f_methods_unit_1f_-_november_2012
09a 5 mm1f_methods_unit_1f_-_november_2012
 
12a 5 mb3h_unit_3_mark_scheme_-_june_2012
12a 5 mb3h_unit_3_mark_scheme_-_june_201212a 5 mb3h_unit_3_mark_scheme_-_june_2012
12a 5 mb3h_unit_3_mark_scheme_-_june_2012
 
Unit 2 foundation mark scheme june 2011
Unit 2 foundation mark scheme june 2011Unit 2 foundation mark scheme june 2011
Unit 2 foundation mark scheme june 2011
 
Unit 2_higher -_march_2012
 Unit 2_higher -_march_2012 Unit 2_higher -_march_2012
Unit 2_higher -_march_2012
 
08a 1 ma0 2h november 2012 mark scheme
08a 1 ma0 2h   november 2012 mark scheme08a 1 ma0 2h   november 2012 mark scheme
08a 1 ma0 2h november 2012 mark scheme
 
Core 2 revision notes
Core 2 revision notesCore 2 revision notes
Core 2 revision notes
 
3 revision session for core 1 translations of graphs, simultaneous equation...
3 revision session for core 1   translations of graphs, simultaneous equation...3 revision session for core 1   translations of graphs, simultaneous equation...
3 revision session for core 1 translations of graphs, simultaneous equation...
 
01b 1 ma0_1f_-_march_2013
01b 1 ma0_1f_-_march_201301b 1 ma0_1f_-_march_2013
01b 1 ma0_1f_-_march_2013
 
05a 1 ma0_1h_-_june_2012
05a 1 ma0_1h_-_june_201205a 1 ma0_1h_-_june_2012
05a 1 ma0_1h_-_june_2012
 

Similar to Statistics 1 revision notes

C2 st lecture 13 revision for test b handout
C2 st lecture 13   revision for test b handoutC2 st lecture 13   revision for test b handout
C2 st lecture 13 revision for test b handout
fatima d
 
C2 st lecture 10 basic statistics and the z test handout
C2 st lecture 10   basic statistics and the z test handoutC2 st lecture 10   basic statistics and the z test handout
C2 st lecture 10 basic statistics and the z test handout
fatima d
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
simonithomas47935
 
Chapter 9 Two-Sample Inference 265 Chapter 9 Two-Sam.docx
Chapter 9 Two-Sample Inference 265 Chapter 9 Two-Sam.docxChapter 9 Two-Sample Inference 265 Chapter 9 Two-Sam.docx
Chapter 9 Two-Sample Inference 265 Chapter 9 Two-Sam.docx
tiffanyd4
 
Point Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis testsPoint Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis tests
University of Salerno
 

Similar to Statistics 1 revision notes (20)

C2 st lecture 13 revision for test b handout
C2 st lecture 13   revision for test b handoutC2 st lecture 13   revision for test b handout
C2 st lecture 13 revision for test b handout
 
Binomial probability distributions
Binomial probability distributions  Binomial probability distributions
Binomial probability distributions
 
Machine learning mathematicals.pdf
Machine learning mathematicals.pdfMachine learning mathematicals.pdf
Machine learning mathematicals.pdf
 
Inorganic CHEMISTRY
Inorganic CHEMISTRYInorganic CHEMISTRY
Inorganic CHEMISTRY
 
Normal as Approximation to Binomial
Normal as Approximation to Binomial  Normal as Approximation to Binomial
Normal as Approximation to Binomial
 
C2 st lecture 10 basic statistics and the z test handout
C2 st lecture 10   basic statistics and the z test handoutC2 st lecture 10   basic statistics and the z test handout
C2 st lecture 10 basic statistics and the z test handout
 
Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2
 
Sampling distribution.pptx
Sampling distribution.pptxSampling distribution.pptx
Sampling distribution.pptx
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
 
Numerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsNumerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec doms
 
Chapter 9 Two-Sample Inference 265 Chapter 9 Two-Sam.docx
Chapter 9 Two-Sample Inference 265 Chapter 9 Two-Sam.docxChapter 9 Two-Sample Inference 265 Chapter 9 Two-Sam.docx
Chapter 9 Two-Sample Inference 265 Chapter 9 Two-Sam.docx
 
Data Science Cheatsheet.pdf
Data Science Cheatsheet.pdfData Science Cheatsheet.pdf
Data Science Cheatsheet.pdf
 
2 UNIT-DSP.pptx
2 UNIT-DSP.pptx2 UNIT-DSP.pptx
2 UNIT-DSP.pptx
 
Hypothese concerning proportion by kapil jain MNIT
Hypothese concerning proportion by kapil jain MNITHypothese concerning proportion by kapil jain MNIT
Hypothese concerning proportion by kapil jain MNIT
 
U unit8 ksb
U unit8 ksbU unit8 ksb
U unit8 ksb
 
Memorization of Various Calculator shortcuts
Memorization of Various Calculator shortcutsMemorization of Various Calculator shortcuts
Memorization of Various Calculator shortcuts
 
Normal Distribution
Normal DistributionNormal Distribution
Normal Distribution
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Point Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis testsPoint Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis tests
 

More from claire meadows-smith

Part 7 final countdown - mark scheme and examiners report
Part 7   final countdown - mark scheme and examiners reportPart 7   final countdown - mark scheme and examiners report
Part 7 final countdown - mark scheme and examiners report
claire meadows-smith
 
Part 6 final countdown - mark scheme and examiners report
Part 6   final countdown - mark scheme and examiners reportPart 6   final countdown - mark scheme and examiners report
Part 6 final countdown - mark scheme and examiners report
claire meadows-smith
 

More from claire meadows-smith (20)

Summer quiz without solutions
Summer quiz without solutionsSummer quiz without solutions
Summer quiz without solutions
 
Higher paper 3 topics
Higher paper 3 topicsHigher paper 3 topics
Higher paper 3 topics
 
Foundation paper 3 topics
Foundation paper 3 topicsFoundation paper 3 topics
Foundation paper 3 topics
 
Persuasive techniques acronym
Persuasive techniques acronymPersuasive techniques acronym
Persuasive techniques acronym
 
Community school language structure
Community school language structureCommunity school language structure
Community school language structure
 
Community school revision
Community school revisionCommunity school revision
Community school revision
 
Revision unit b1 c1p1 ocr gateway
Revision unit b1 c1p1 ocr gatewayRevision unit b1 c1p1 ocr gateway
Revision unit b1 c1p1 ocr gateway
 
May june-grade-booster-postcard
May june-grade-booster-postcardMay june-grade-booster-postcard
May june-grade-booster-postcard
 
Part 7 final countdown - questions
Part 7   final countdown - questions Part 7   final countdown - questions
Part 7 final countdown - questions
 
Part 7 final countdown - mark scheme and examiners report
Part 7   final countdown - mark scheme and examiners reportPart 7   final countdown - mark scheme and examiners report
Part 7 final countdown - mark scheme and examiners report
 
Part 6 final countdown - questions
Part 6   final countdown - questions Part 6   final countdown - questions
Part 6 final countdown - questions
 
Part 6 final countdown - mark scheme and examiners report
Part 6   final countdown - mark scheme and examiners reportPart 6   final countdown - mark scheme and examiners report
Part 6 final countdown - mark scheme and examiners report
 
Part 5 final countdown - questions
Part 5   final countdown - questions Part 5   final countdown - questions
Part 5 final countdown - questions
 
Part 5 final countdown - mark scheme and examiners report
Part 5   final countdown - mark scheme and examiners reportPart 5   final countdown - mark scheme and examiners report
Part 5 final countdown - mark scheme and examiners report
 
Useful websites english
Useful websites englishUseful websites english
Useful websites english
 
Exam techniques english
Exam techniques englishExam techniques english
Exam techniques english
 
Revision tips english
Revision tips englishRevision tips english
Revision tips english
 
Holidays
HolidaysHolidays
Holidays
 
Part 4 final countdown - questions
Part 4   final countdown - questions Part 4   final countdown - questions
Part 4 final countdown - questions
 
Part 4 final countdown - mark scheme and examiners report
Part 4   final countdown - mark scheme and examiners reportPart 4   final countdown - mark scheme and examiners report
Part 4 final countdown - mark scheme and examiners report
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Statistics 1 revision notes

  • 1. AVERAGES AND MEASURES OF SPREAD · Mode : the most common or most popular data value the only average that can be used for qualitative data not suitable if the data values are very varied · Mean : important as it uses all the data values Disadvantage – affected by extreme values · Median : the middle value when the data are in order For n data values the median is the n + 1th value Not affected by extreme values · Range – biggest value – smallest value - greatly affected by extreme values · Interquartile Range – Upper quartile – Lower quartile - measures the spread of the middle 50% of the data and is not affected by extreme values IQR = 9 - 4 = 5 IQR = 7.5 – 3.5 = 4 · Standard Deviation Deviation from the mean is the difference from a value from the mean value The standard deviation is the average of all of these deviations Formulas to work out standard deviation are given in the q SCALING DATA Addition if you add a to each number in the list of data : New mean = old mean + a New median = old median + a New mode = old mode + a Standard Deviation is UNCHANGED Multiplication - If you multiply each number in the list of data by b : New mean = old mean × b New median = old median × b New mode = old mode × b New Standard Deviation = old standard deviation × b If the data is grouped – use the mid-point of each group as your x 2 For 10 values the median will be the 5½ th value – halfway between the 5th and the 6th values 3 4 5 7 8 9 9 LQ M UQ 2 3 4 4 5 7 8 9 LQ M UQ AQA STATISTICS 1 REVISION NOTES www.mathsbox.org.uk
  • 2. PROBABILITY Outcome : each thing that can happen in an experiment Sample Space : list of all the possible values q NOTATION A and B both happen either A or B or both happen C doesn’t happen § P(C’) = 1 – P(C) § Mutually Exclusive Events two or more events that cannot happen at the same time Independent Events the outcome of one event does not affect the outcome of another P( B ) = P(A) P(B) Conditional Probability : when the outcome of the first event affects the outcome of a second event, the probability of the second event depends on what has happened § P(B/A) means the conditional probability of B given A P(B/A) = (A B) so P( B )= P(A) P(B/A) q If the question states that the events are independent then a tree diagram might be a good idea – multiply along the branches then add the appropriate combinations together q Probability of ‘at least 1’ = 1 – Probability of ‘none’ q If you are asked to find probabilities using data in a table – work out the row/column totals before you start A Ç B A È B C' C P(A È B) = P(A) + P(B) – P(A Ç B) P(A È B) = P(A) + P(B) P Ç P(A) A Ç A Ç www.mathsbox.org.uk
  • 3. BINOMIAL DISTRIBUTION q A question is binomial if: · Probability of an event happening is given (p) · Number of people/trials/objects chosen given (n) q EQUALS or EXACTLY use the formula P(x=r) Make sure you write it out with the values substituted in q MORE/LESS THAN/AT LEAST use tables Remember tables give less than or equal to Make sure you list the numbers and identify which ones you need to include P(X>5)  0  1  2  3  4  5  6  7  8  9  = 1 – P(X≤5) P(X<5)  0  1  2  3  4  5  6  7  8 = P(X ≤ 4) q MEAN and VARIANCE Mean = np Variance = np(1-p) standard deviation = q ASSUMPTIONS Independent events with a fixed probability of success Randomly selected q COMPARISONS You may be asked to calculate the mean and standard deviation of a binomial distribution and compare them to the mean and standard deviation of a sample (table of results) - if both the means are approximately the same AND both standard deviations (or variances) are approximately equal then you can say that binomial model appears to fit the data and that it must be independent, random observations. n C r p r (1 – p) n – r from your calculator make sure you write down this value Check that your ‘powers’ add to make n (number of trials) np(1 – p) www.mathsbox.org.uk
  • 4. NORMAL DISTRIBUTION FINDING PROBABILITIES q State the mean and variance (standard deviation) q Standardise to find the z value q Sketch a graph and shade in the area to represent the probability q Use the table to find the probability – - take care with negative z-values – P(z > -0.8) = P(z < 0.8) P(z < -0.8) = 1 – P(z < 0.8) q ALWAYS CHECK YOUR ANSWER WITH YOUR GRAPH – if your shaded area is more that ½ and your answer is 0.4 (for example) – you know you have gone wrong somewhere!! WORKING BACKWARDS – to find the mean/standard deviation – or both (simultaneous equations) q State the probability you know and sketch a graph P(X < 34) = 0.95 q Use tables to find the appropriate z value q Write down the equation used to standardise with all of the known values substituted q Rearrange to find the mean 1.2 z = x – mean standard deviation P(z <1.2) P(z >-0.8) -0.8 -0.8 0.8 -0.8 0.8 0.05 34 Standard deviation = 8 1.6445 = 34 – 8 mean www.mathsbox.org.uk
  • 5. SAMPLES and PROBABILITIES If you are asked to calculate probabilities involving samples remember to divide the (population) standard deviation by the square root of the sample size when you standardise. ESTIMATION – estimating the population mean from a sample mean and finding a confidence interval q CASE 1 : Standard deviation or Variance of the population is stated in the question - from the data you only need to calculate the mean - use your tables to find the appropriate ‘z value’ - write out the above expressions for the confidence intervals with all your values substituted in - calculate the two values for your confidence intervals and state clearly (3 sf) CHECK your answer – add the two answers together and divide by 2 – this should be the sample mean!!! q CASE 2 : Standard deviation or Variance of the population is UNKNOWN - you will need to use the data to calculate the variance of the sample and then an unbiased estimate of the population variance - your calculator will give you value of the standard deviation for the sample you have entered sxn - square this to calculate the sample variance An unbiased estimate of the population variance is Use this as the value of in your confidence interval when substituting the values in q INTERPRETATION – a 95% confidence interval tells us that if we took the same size sample 100 times then 95 of the confidence intervals we would calculate should contain the TRUE population mean. z = x – mean s n or z = x – mean variance n If a random sample of size n is taken from a normal population and the sample mean x is found, then the 95% confidence interval of the population mean is given by where s is either the population variance (if given) or an unbiased estimate of the population variance (found from the sample see below) x – 1.96 ´ s 2 n , x + 1.96 ´ s 2 n 2 Z value n n – 1 ´ Sample variance s 2 www.mathsbox.org.uk
  • 6. CENTRAL LIMIT THEOREM – you only need to use the central limit theorem if you are not told that the sample is selected from a population which is Normally Distributed The theorem concerns the distribution of the sample means and as long as the sample size is large enough (greater than 30) then the sample means will be normally distributed – and so we can calculate confidence intervals REGRESSION – finding the equation of the line of best fit – least squares y = a + bx q If you are asked to interpret the values of a and b, make sure you discuss it in the context of the question NOT in terms of x and y (see examples above) q If you are give a table of values – use your calculator to find a and b In your workings state a = …. b =….. and show your values substituted into y = a + bx q If you are calculating a and b using the formulae – make sure you use the formula book – showing how you substituted the values in Always work out ‘b’ first Use ‘b’ and the means of x and y to work out a a = (mean of y) - b(mean of x) q TO PLOT THE REGRESSION LINE – choose 2 different values of x – use your equation y=a + bx to work out the predicted y-values - plot the two points and join with a straight line q RESIDUAL = OBSERVED(actual value) – PREDICTED(using equation y= a + bx) - the smaller the residuals the greater the accuracy of the line of best fit in predicting values - sometimes an ‘average’ residual can be used to make predictions using the line of best fit –e.g if an individual has an average residual of 5 – then to predict for this particular person using the line, 5 should be added to the value predicted using the equation. q RELIABILITY OF PREDICTIONS - Interpolation predicting using an x-value within the range of x-values used to calculate the a and b considered to be a reliable prediction - Extrapolation predicting using an x-value outside of the range of x-values used to calculate a and b UNRELIABLE estimate - because you are assuming that the linear trend continues indefinitely – use your common sense to explain why this may be incorrect - watch out for NEGATIVE (or unrealistic) y values which may result for the x values suggested – again use your common sense to explain why this is unrealistic Intercept - the value of ‘y’ when ‘x’ is zero. e.g. When the temperature is 0°C ice cream sales are £10 Gradient – the change in y for each unit change in x e.g for every 1 degree rise in temperature sales increase by £b www.mathsbox.org.uk
  • 7. q SCALING either the x or the y values will change the equation e.g y = 0.5 – 1.2x If the x values are doubled then the equation becomes y = 0.5 – 1.2(2x) y = 0.5 – 2.4x If 5 is added on to each of the y values then the equation becomes y + 5 = 0.5 – 1.2x y = - 4.5 – 1.2 x (always rearrange to get y = ‘a’ + ‘b’x) CORRELATION The Product Moment Correlation Coefficient : is a numerical measures of the strength and type of correlation – denoted by r and will lie in the range -1 £ r £ 1 q indicates how well the data, when plotted in a scatter graph, fits a straight line pattern q NOT APPROPRIATE if the data does not follow a linear pattern when plotted (straight line) – so scatter graph is needed to check this q If you are give a table of values – use your calculator to find r (If you have time it’s a good idea to check that you have entered your values correctly) q If you are calculating using summary values -make sure you use the formula book – showing how you substituted the values into the formula INTERPRETING r – make sure you do this in the context of the question (not just positive correlation) e.g. There appears to be a fairly strong relationship between temperature and ice cream sales, higher temperatures appear to correspond to higher values of ice-cream sales and vice versa. q Scaling data – a linear transformation or scaling of one or both of the variables will not affect the correlation coefficient – all of the points will stay in the same position RELATIVE to each other TAKE CARE q Not all correlation will be linear For this data, the correlation coefficient is close to 0 This does not mean that there is no correlation but simply mean that there is no linear correlation (pattern appears to be quadratic) q Spurious Correlation A strong correlation between 2 variables does not mean that one thing causes the other –high marks in a maths exam do not necessarily cause high marks in a Statistics exam, they are likely to both be dependent on a common third variable : the students mathematical ability q Outliers One or two outliers can have a dramatic effect on a correlation coefficient www.mathsbox.org.uk