Statistics

  • 1. 2013/05/22

STATISTICS
X-Kit Textbook: Chapter 9
Precalculus Textbook: Appendix B (Concepts in Statistics), Par B.2

CONTENT

THE GOAL
Look at ways of summarising a large amount of sample data in just one or two key numbers.
Two important aspects of a set of data:
• The LOCATION
• The SPREAD

MEASURES OF CENTRAL TENDENCY (LOCATION)
• Arithmetic mean (average)
• Mode (the value with the highest frequency)
• Median (the middle observation)

Number of fraudulent cheques received at a bank each week for 30 weeks
Week | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Cheques | 5 | 3 | 8 | 3 | 3 | 1 | 10 | 4 | 6 | 8
Week | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Cheques | 3 | 5 | 4 | 7 | 6 | 6 | 9 | 3 | 4 | 5
Week | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
Cheques | 7 | 9 | 4 | 5 | 8 | 6 | 4 | 4 | 10 | 4

ARITHMETIC MEAN
• $\bar{x} = \frac{164}{30} \approx 5.47$
• To calculate the MEAN, add all the data points in the sample and divide by the number of data points (the sample size).
• The MEAN can be a value that doesn't match any actual observation: no week had 5.47 fraudulent cheques.
• The MEAN gives us useful information about the location of our frequency distribution.
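The raw-data mean described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 30 weekly counts are stored in a plain list; the names `cheques` and `mean` are hypothetical, chosen only for this example.

```python
# Number of fraudulent cheques received each week for 30 weeks (from the slide).
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

def mean(data):
    """Arithmetic mean: add all the data points, divide by the sample size."""
    return sum(data) / len(data)

total = sum(cheques)   # sum of all observations
x_bar = mean(cheques)  # 164 / 30
print(total, round(x_bar, 2))  # 164 5.47
```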
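  • 2.

GRAPH
[Bar graph: frequency (0 to 8) of each number of fraudulent cheques per week (1 to 10)]

CALCULATE THE MEAN
Raw data:
• $\bar{x} = \frac{\sum x}{n}$, where $x$ represents the data points and $n$ is the number of observations.
Frequency table:
• $\bar{x} = \frac{\sum xf}{n}$, where $x$ represents the distinct data values, $f$ is the frequency of each value and $n = \sum f$ is the number of observations.
Frequency table (intervals):
• $\bar{x} = \frac{\sum xf}{n}$, where $x$ is the midpoint of each interval, $f$ is the frequency of each interval and $n = \sum f$ is the number of observations.

CALCULATE THE MEAN - FREQUENCY TABLE: NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct value | Tally marks | Frequency
1 | / | 1
2 |  | 0
3 | ///// | 5
4 | ///// // | 7
5 | //// | 4
6 | //// | 4
7 | // | 2
8 | /// | 3
9 | // | 2
10 | // | 2
$\sum f = 30$ and $\sum xf = 164$, so $\bar{x} = \frac{164}{30} \approx 5.47$, the same as from the raw data.

Truck data: weights (in tonnes) of 20 fully loaded trucks
Truck | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Weight | 4.54 | 3.81 | 4.29 | 5.16 | 2.51 | 4.63 | 4.75 | 3.98 | 5.04 | 2.80
Truck | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Weight | 2.52 | 5.88 | 2.95 | 3.59 | 3.87 | 4.17 | 3.30 | 5.48 | 4.26 | 3.53

CALCULATE THE MEAN - GROUPED FREQUENCY TABLE: Truck data, weights (in tonnes) of 20 fully loaded trucks
Class interval | Frequency | Midpoint
2.5 ≤ x ≤ 3.0 | 4 | (2.5 + 3.0) ÷ 2 = 2.75
3.0 < x ≤ 3.5 | 1 | 3.25
3.5 < x ≤ 4.0 | 5 | 3.75
4.0 < x ≤ 4.5 | 3 | 4.25
4.5 < x ≤ 5.0 | 3 | 4.75
5.0 < x ≤ 5.5 | 3 | 5.25
5.5 < x ≤ 6.0 | 1 | 5.75
$\sum f = 20$ and $\sum xf = 81.5$, so $\bar{x} \approx \frac{81.5}{20} = 4.075$ tonnes.

MODE
• The mode is the value (or, for grouped data, the class interval) with the HIGHEST FREQUENCY.
• There can be two or more modes in a set of data; the mode is then not a good measure of central tendency.
• MULTI-MODAL data have more than one mode.
• UNI-MODAL data have only one mode.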
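The frequency-table formula $\bar{x} = \frac{\sum xf}{n}$ works the same way for ungrouped values and for class midpoints. A minimal Python sketch, assuming each table is stored as (value, frequency) pairs; the names `cheque_freq`, `truck_classes` and `mean_from_frequencies` are hypothetical.

```python
# Ungrouped frequency table: distinct value -> frequency (fraudulent cheques).
cheque_freq = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}

# Grouped truck weights (tonnes): (lower bound, upper bound, frequency).
truck_classes = [(2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
                 (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1)]

def mean_from_frequencies(pairs):
    """Mean = sum(x * f) / sum(f) for (value, frequency) pairs."""
    pairs = list(pairs)
    n = sum(f for _, f in pairs)
    return sum(x * f for x, f in pairs) / n

ungrouped_mean = mean_from_frequencies(cheque_freq.items())

# For grouped data, each class is represented by its midpoint (L + U) / 2.
grouped_mean = mean_from_frequencies(
    ((lo + hi) / 2, f) for lo, hi, f in truck_classes)

print(round(ungrouped_mean, 2), round(grouped_mean, 3))  # 5.47 4.075
```

Averaging the 20 raw truck weights directly gives 81.06 / 20 = 4.053 tonnes, so the grouped value 4.075 is only an approximation: each weight is replaced by the midpoint of its class.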
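  • 3.

GRAPH: The MODE = 4
[Bar graph: frequency (0 to 8) of each number of fraudulent cheques per week (1 to 10)]

Call centre data: waiting times (in seconds) for 35 randomly selected customers
Customer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Waiting time | 75 | 37 | 13 | 90 | 45 | 23 | 104 | 135 | 30 | 73 | 34 | 12
Customer | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24
Waiting time | 38 | 40 | 22 | 47 | 26 | 57 | 65 | 33 | 9 | 85 | 87 | 16
Customer | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35
Waiting time | 102 | 115 | 68 | 29 | 142 | 5 | 15 | 10 | 25 | 41 | 49

FREQUENCY TABLE: The MODAL CLASS is the interval $25 < x \le 50$
Class interval | Tally marks | Frequency
0 ≤ x ≤ 25 | ///// ///// | 10
25 < x ≤ 50 | ///// ///// // | 12
50 < x ≤ 75 | ///// | 5
75 < x ≤ 100 | /// | 3
100 < x ≤ 125 | /// | 3
125 < x ≤ 150 | // | 2

HISTOGRAM: MODAL CLASS $25 < x \le 50$
[Histogram: frequency (0 to 12) of waiting times in the intervals [0;25], (25;50], (50;75], (75;100], (100;125], (125;150]]

THE MEDIAN - RAW DATA: Number of fraudulent cheques received at a bank each week for 30 weeks
Week | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Cheques | 5 | 3 | 8 | 3 | 3 | 1 | 10 | 4 | 6 | 8
Week | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Cheques | 3 | 5 | 4 | 7 | 6 | 6 | 9 | 3 | 4 | 5
Week | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
Cheques | 7 | 9 | 4 | 5 | 8 | 6 | 4 | 4 | 10 | 4

MEDIAN
• Median = 5
• Put all observations in order from smallest to largest; the middle observation is then the MEDIAN. With 30 observations, the median lies half-way between the 15th and 16th ordered values, which are both 5:
1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10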
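Finding a modal class amounts to counting observations per class and picking the largest count. A minimal Python sketch, assuming the class convention on the slide (first class closed, [0; 25], later classes half-open, (a; b]); the helper name `class_frequencies` is hypothetical. Counting the 35 listed waiting times this way gives 12 observations in $25 < x \le 50$ and 5 in $50 < x \le 75$.

```python
from collections import Counter

# Waiting times (seconds) for 35 call-centre customers (from the slide).
waits = [75, 37, 13, 90, 45, 23, 104, 135, 30, 73, 34, 12,
         38, 40, 22, 47, 26, 57, 65, 33, 9, 85, 87, 16,
         102, 115, 68, 29, 142, 5, 15, 10, 25, 41, 49]

# Class boundaries: [0, 25], (25, 50], (50, 75], ..., (125, 150]
edges = [0, 25, 50, 75, 100, 125, 150]

def class_frequencies(data, edges):
    """Count observations per class; the first class is closed [e0, e1],
    later classes are half-open (lo, hi]."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            lo, hi = edges[i], edges[i + 1]
            if (lo <= x if i == 0 else lo < x) and x <= hi:
                counts[i] += 1
                break
    return counts

counts = class_frequencies(waits, edges)
modal = max(range(len(counts)), key=counts.__getitem__)  # index of largest count
print(counts)                          # [10, 12, 5, 3, 3, 2]
print(edges[modal], edges[modal + 1])  # 25 50

# Mode of ungrouped data: the value with the highest frequency.
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8, 3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]
print(Counter(cheques).most_common(1))  # [(4, 7)]
```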
  • 4.

DON'T FALL INTO THE COMMON TRAP
• The median is NOT the middle of the range of observations. For example, in
1, 1, 1, 1, 1, 3, 9
the median is 1 (the middle, 4th, observation), whereas the middle of the range, $(1 + 9) \div 2$, is 5. Big difference!

MEDIAN
• Odd number of observations (for example $n = 7$): the median is at position $\frac{n+1}{2}$.
• Even number of observations (for example $n = 30$): the median is half-way between positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

FIND THE MEDIAN - FREQUENCY TABLE: NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct value | Frequency | Cumulative frequency
1 | 1 | 1
2 | 0 | 1
3 | 5 | 6
4 | 7 | 13
5 | 4 | 17
6 | 4 | 21
7 | 2 | 23
8 | 3 | 26
9 | 2 | 28
10 | 2 | 30
With $n = 30$, the median lies between the 15th and 16th observations. The cumulative frequency first reaches 15 (and 16) at the value 5, so the median is 5.

FIND THE MEDIAN - GROUPED FREQUENCY TABLE: Truck data, weights (in tonnes) of 20 fully loaded trucks
Class interval | Frequency | Midpoint
2.5 ≤ x ≤ 3.0 | 4 | (2.5 + 3.0) ÷ 2 = 2.75
3.0 < x ≤ 3.5 | 1 | 3.25
3.5 < x ≤ 4.0 | 5 | 3.75
4.0 < x ≤ 4.5 | 3 | 4.25
4.5 < x ≤ 5.0 | 3 | 4.75
5.0 < x ≤ 5.5 | 3 | 5.25
5.5 < x ≤ 6.0 | 1 | 5.75

FIND THE MEDIAN FROM A GROUPED FREQUENCY TABLE
• Which observation is the median (middle) one?
• Find the class interval in which that observation lies.

CALCULATIONS
• Raw data: mean, mode, median
• Frequency table (ungrouped data): mean, mode, median
• Frequency table (grouped data): mean, mode, median
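The median position rule and the cumulative-frequency lookup can be sketched in Python as follows. This is a minimal illustration; the helper names `median` and `value_at_position` are hypothetical, and positions are 1-based as on the slides.

```python
def median(data):
    """Median by the position rule: (n + 1) / 2 for odd n,
    half-way between positions n/2 and n/2 + 1 for even n."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]          # convert 1-based position to index
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

def value_at_position(freq, position):
    """Value of the observation at a 1-based position, read off the
    cumulative frequencies of a {value: frequency} table."""
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the number of observations")

cheque_freq = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}
n = sum(cheque_freq.values())                      # 30, an even number
from_table = (value_at_position(cheque_freq, n // 2)
              + value_at_position(cheque_freq, n // 2 + 1)) / 2

print(median([1, 1, 1, 1, 1, 3, 9]))  # 1 (not 5, the middle of the range)
print(from_table)                     # 5.0
```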
  • 5.

HOW TO CHOOSE THE BEST MEASURE OF LOCATION?
• When choosing the best measure of location, we need to look at the SHAPE of the distribution.
• For nearly symmetric data, the mean is the best choice.
• For very skewed (asymmetric) data, the median or mode is better.
• The mean is pulled further along the tail than the median, because it is more sensitive to values far from the centre.

SYMMETRIC histogram: Mean = Median = Mode
A POSITIVELY SKEWED (skewed to the right) histogram has a longer tail on the right side: Mode < Median < Mean
A NEGATIVELY SKEWED (skewed to the left) histogram has a longer tail on the left side: Mean < Median < Mode

PROBLEM
• Two very different data sets (one very spread out, another very concentrated) can have EQUAL measures of central tendency.
• To get a true idea of our sample, we also have to MEASURE THE SPREAD of the distribution, also called its dispersion.

MEASURES OF SPREAD (DISPERSION)
• Interquartile range
• Variance
• Standard deviation
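A small check of the claim that the mean is pulled along the tail, using Python's standard `statistics` module on the right-skewed data set from the "common trap" slide. In this tiny example the mode and median tie, so the ordering is Mode ≤ Median < Mean; the second data set (with 30 replacing 9) is a hypothetical variation added only to show the effect.

```python
from statistics import mean, median, mode

# Right-skewed data from the "common trap" slide.
data = [1, 1, 1, 1, 1, 3, 9]
m, med, mo = mean(data), median(data), mode(data)
print(round(m, 2), med, mo)  # 2.43 1 1

# Make the largest value more extreme: the median stays put, the mean moves.
data2 = [1, 1, 1, 1, 1, 3, 30]
print(round(mean(data2), 2), median(data2))  # 5.43 1
```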
  • 6.

MEASURING SPREAD
• Think of a distribution in terms of percentages: a horizontal axis equally divided into 100 percentiles.
• The 10th percentile marks the point below which 10% of the observations fall and above which 90% of the observations fall.
• The 50th percentile, below which 50% of the observations lie, is the median.

WORKING WITH A PERCENTILE
• $p\%$ of the observations fall below the $p$th percentile. Its position in the ordered data is
$\text{Position} = \frac{p}{100}(n + 1)$
• Working with the example on fraudulent cheques:
1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10
Position of $P_{50} = \frac{50}{100}(30 + 1) = 15.5$
• 15.5 tells us where to find our 50th percentile: 15 tells us which observation to go to, and 0.5 tells us how far to move along the space between that observation and the next highest one.

FORMULA
• $P_{50} = x_{15} + 0.5\,(x_{16} - x_{15}) = 5 + 0.5\,(5 - 5) = 5$
• In general: $P_p = x_k + d\,(x_{k+1} - x_k)$
• $P$ means percentile; $p$ tells us which percentile.
• $k$ is the whole-number part of the position.
• $d$ is the decimal fraction of the position.
• $x_k$ is the $k$th observation in ascending order.

WORKING WITH PERCENTILES FROM UNGROUPED FREQUENCY DATA: NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct value | Frequency | Cumulative frequency
1 | 1 | 1
2 | 0 | 1 + 0 = 1
3 | 5 | 1 + 5 = 6
4 | 7 | 6 + 7 = 13
5 | 4 | 13 + 4 = 17
6 | 4 | 17 + 4 = 21
7 | 2 | 21 + 2 = 23
8 | 3 | 23 + 3 = 26
9 | 2 | 26 + 2 = 28
10 | 2 | 28 + 2 = 30

WORKING WITH PERCENTILES (AND THE MEDIAN) FROM GROUPED DATA
• To identify the class interval $L < x \le U$ containing the $p$th percentile, find its position:
$\text{Position} = \frac{p}{100}(n + 1)$
• The decimal fraction for grouped data is:
$d = \frac{\text{Position} - \text{sum of class frequencies up to } L}{\text{frequency of class } L < x \le U}$
• Calculate the $p$th percentile:
$P_p \approx L + d\,(U - L)$

FIND THE MEDIAN - GROUPED FREQUENCY TABLE: Truck data, weights (in tonnes) of 20 fully loaded trucks
Class interval | Frequency | Cumulative frequency
2.5 ≤ x ≤ 3.0 | 4 | 4
3.0 < x ≤ 3.5 | 1 | 5
3.5 < x ≤ 4.0 | 5 | 10
4.0 < x ≤ 4.5 | 3 | 13
4.5 < x ≤ 5.0 | 3 | 16
5.0 < x ≤ 5.5 | 3 | 19
5.5 < x ≤ 6.0 | 1 | 20
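The raw-data percentile rule ($\text{Position} = \frac{p}{100}(n + 1)$, then $P_p = x_k + d\,(x_{k+1} - x_k)$) translates directly into Python. This is a sketch: the function name `percentile` is hypothetical, and clamping positions outside 1..n to the smallest or largest observation is an added assumption the slides do not cover. Python's `statistics.quantiles` with its default `method="exclusive"` uses the same (n + 1) convention for positions inside 1..n, so it serves as a cross-check.

```python
import math
import statistics

# Weekly counts of fraudulent cheques (30 weeks, from the slides).
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

def percentile(data, p):
    """p-th percentile: Position = p/100 * (n + 1), then
    P_p = x_k + d * (x_(k+1) - x_k) with 1-based k."""
    xs = sorted(data)
    n = len(xs)
    position = p / 100 * (n + 1)
    k = math.floor(position)  # whole-number part: which observation
    d = position - k          # decimal fraction: how far towards the next one
    # Assumption (not on the slides): positions outside 1..n are clamped.
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + d * (xs[k] - xs[k - 1])

print(percentile(cheques, 50))                           # 5.0
print(percentile(cheques, 25), percentile(cheques, 75))  # 4.0 7.25

# Cross-check against the standard library's (n + 1)-based quartiles.
print(statistics.quantiles(cheques, n=4))                # [4.0, 5.0, 7.25]
```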
  • 7.

FIND THE MEDIAN - GROUPED FREQUENCY TABLE: Truck data, weights (in tonnes) of 20 fully loaded trucks
• The position of the 50th percentile is $\frac{50}{100}(20 + 1) = 10.5$. The cumulative frequency is 10 up to 4.0 and 13 up to 4.5, so the class interval $4.0 < x \le 4.5$ contains the 50th percentile.
• The decimal fraction for grouped data is: $d = \frac{10.5 - 10}{3} = \frac{1}{6}$
• Calculate the 50th percentile: $P_{50} \approx 4.0 + d\,(4.5 - 4.0) = 4.08333$

MEASURING SPREAD
• If we measure the DIFFERENCE in value between one percentile and another, this gives us an idea of how widely our data are spread out.
• INTERQUARTILE RANGE (IQR) = 75th percentile - 25th percentile
• The bigger the IQR, the more spread out the data.
• The 75th percentile ≥ the 25th percentile, therefore the IQR ≥ 0.
• We tend to use the MEDIAN (as measure of central tendency) together with the IQR.

FIND THE IQR - GROUPED FREQUENCY TABLE: Truck data, weights (in tonnes) of 20 fully loaded trucks
Class interval | Frequency | Cumulative frequency
2.5 ≤ x ≤ 3.0 | 4 | 4
3.0 < x ≤ 3.5 | 1 | 5
3.5 < x ≤ 4.0 | 5 | 10
4.0 < x ≤ 4.5 | 3 | 13
4.5 < x ≤ 5.0 | 3 | 16
5.0 < x ≤ 5.5 | 3 | 19
5.5 < x ≤ 6.0 | 1 | 20

FIND THE 75TH PERCENTILE - GROUPED FREQUENCY TABLE: Truck data, weights (in tonnes) of 20 fully loaded trucks
• The position of the 75th percentile is $\frac{75}{100}(20 + 1) = 15.75$, which lies in the class interval $4.5 < x \le 5.0$.
• The decimal fraction for grouped data is: $d = \frac{15.75 - 13}{3} \approx 0.917$
• Calculate the 75th percentile: $P_{75} \approx 4.5 + d\,(5.0 - 4.5) \approx 4.958$

FIND THE 25TH PERCENTILE - GROUPED FREQUENCY TABLE: Truck data, weights (in tonnes) of 20 fully loaded trucks
• The position of the 25th percentile is $\frac{25}{100}(20 + 1) = 5.25$, which lies in the class interval $3.5 < x \le 4.0$.
• The decimal fraction for grouped data is: $d = \frac{5.25 - 5}{5} = 0.05$
• Calculate the 25th percentile: $P_{25} \approx 3.5 + d\,(4.0 - 3.5) = 3.525$
• IQR = 4.958 - 3.525 = 1.433

MEASURING SPREAD
• When we use the MEAN as our measure of central tendency, we usually choose A MEASURE OF HOW FAR THE DATA ARE SPREAD OUT AROUND THE MEAN.
• Two measures of spread that are based on the mean are the VARIANCE and the STANDARD DEVIATION.
• An advantage of the standard deviation is that it is measured in the same units as the original observations.
• The variance and standard deviation are closely related: the variance ($s^2$ or $\sigma^2$) is the square of the standard deviation ($s$ or $\sigma$).
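The grouped-data percentile steps (position, class lookup via cumulative frequencies, decimal fraction $d$, then $P_p \approx L + d\,(U - L)$) can be sketched in Python. The function name `grouped_percentile` is hypothetical, and returning the top class boundary for a position beyond $n$ is an added assumption, not something the slides specify.

```python
# Grouped truck weights (tonnes): (lower bound L, upper bound U, frequency).
truck_classes = [(2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
                 (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1)]

def grouped_percentile(classes, p):
    """Approximate p-th percentile from grouped data:
    Position = p/100 * (n + 1),
    d = (Position - cumulative frequency up to L) / (frequency of L < x <= U),
    P_p ~ L + d * (U - L)."""
    n = sum(f for _, _, f in classes)
    position = p / 100 * (n + 1)
    below = 0  # cumulative frequency of all classes before the current one
    for lower, upper, f in classes:
        if f > 0 and below + f >= position:
            d = (position - below) / f
            return lower + d * (upper - lower)
        below += f
    # Assumption (not on the slides): a position beyond n maps to the top boundary.
    return classes[-1][1]

q1 = grouped_percentile(truck_classes, 25)
p50 = grouped_percentile(truck_classes, 50)
q3 = grouped_percentile(truck_classes, 75)
print(round(q1, 3), round(p50, 5), round(q3, 3))  # 3.525 4.08333 4.958
print(round(q3 - q1, 3))                          # 1.433
```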
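  • 8.

VARIANCE & STANDARD DEVIATION
• The variance is (roughly) the average of all the squared distances from the mean:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $s^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)$
• The variance is never negative; it is zero only when all observations are equal.

Number of fraudulent cheques received at a bank each week for 30 weeks
Week | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Cheques | 5 | 3 | 8 | 3 | 3 | 1 | 10 | 4 | 6 | 8
Week | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Cheques | 3 | 5 | 4 | 7 | 6 | 6 | 9 | 3 | 4 | 5
Week | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
Cheques | 7 | 9 | 4 | 5 | 8 | 6 | 4 | 4 | 10 | 4

VARIANCE & STANDARD DEVIATION FROM RAW DATA ($\bar{x} = 5.47$)
Distinct value | $x - \bar{x}$ | $(x - \bar{x})^2$ | Frequency $f$ | $f(x - \bar{x})^2$
1 | 1 - 5.47 = -4.47 | $(-4.47)^2$ = 19.9809 | 1 | 19.9809
2 | -3.47 | 12.0409 | 0 | 0
3 | -2.47 | 6.1009 | 5 | 30.5045
4 | -1.47 | 2.1609 | 7 | 15.1263
5 | -0.47 | 0.2209 | 4 | 0.8836
6 | 0.53 | 0.2809 | 4 | 1.1236
7 | 1.53 | 2.3409 | 2 | 4.6818
8 | 2.53 | 6.4009 | 3 | 19.2027
9 | 3.53 | 12.4609 | 2 | 24.9218
10 | 4.53 | 20.5209 | 2 | 41.0418
• Summed over all 30 observations, $\sum (x - \bar{x}) = 0$ (apart from rounding in $\bar{x}$).
• Column totals: $\sum (x - \bar{x})^2 = 82.509$ over the distinct values only; $\sum f(x - \bar{x})^2 = 157.467$ over all 30 observations.
• Variance: $s^2 = \frac{157.467}{30 - 1} \approx 5.43$

CALCULATE THE VARIANCE & STANDARD DEVIATION - FREQUENCY TABLE: NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct value | Frequency | Squared observation
1 | 1 | 1
2 | 0 | 4
3 | 5 | 9
4 | 7 | 16
5 | 4 | 25
6 | 4 | 36
7 | 2 | 49
8 | 3 | 64
9 | 2 | 81
10 | 2 | 100

VARIANCE & STANDARD DEVIATION FROM UNGROUPED FREQUENCY DATA
$s^2 = \frac{1}{n - 1}\left(\sum f x^2 - \frac{\left(\sum f x\right)^2}{n}\right)$
• Variance: $s^2 = \frac{1}{30 - 1}\left(1054 - \frac{164^2}{30}\right) = 5.4299$
• Standard deviation: $s = \sqrt{5.4299} \approx 2.33$

FIND THE VARIANCE & STANDARD DEVIATION - GROUPED FREQUENCY TABLE: Truck data, weights (in tonnes) of 20 fully loaded trucks
Class interval | Frequency | Midpoint | Squared midpoint
2.5 ≤ x ≤ 3.0 | 4 | 2.75 | 7.5625
3.0 < x ≤ 3.5 | 1 | 3.25 | 10.5625
3.5 < x ≤ 4.0 | 5 | 3.75 | 14.0625
4.0 < x ≤ 4.5 | 3 | 4.25 | 18.0625
4.5 < x ≤ 5.0 | 3 | 4.75 | 22.5625
5.0 < x ≤ 5.5 | 3 | 5.25 | 27.5625
5.5 < x ≤ 6.0 | 1 | 5.75 | 33.0625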
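Both variance formulas can be checked numerically. A minimal Python sketch, assuming the tables are stored as (value, frequency) pairs, with class midpoints standing in for the values of grouped data; the function name `variance_from_frequencies` is hypothetical, while `statistics.variance` is the standard-library sample variance (divisor $n - 1$).

```python
import math
import statistics

def variance_from_frequencies(pairs):
    """Sample variance s^2 = (sum f x^2 - (sum f x)^2 / n) / (n - 1)
    for (value, frequency) pairs; for grouped data use class midpoints."""
    pairs = list(pairs)
    n = sum(f for _, f in pairs)
    sum_fx = sum(f * x for x, f in pairs)
    sum_fx2 = sum(f * x * x for x, f in pairs)
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

cheque_freq = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}
s2 = variance_from_frequencies(cheque_freq.items())
print(round(s2, 4), round(math.sqrt(s2), 2))  # 5.4299 2.33

# Cross-check with the definition sum (x - x_bar)^2 / (n - 1) on the raw data.
raw = [x for x, f in cheque_freq.items() for _ in range(f)]
x_bar = sum(raw) / len(raw)
s2_def = sum((x - x_bar) ** 2 for x in raw) / (len(raw) - 1)
print(math.isclose(s2, s2_def), math.isclose(s2, statistics.variance(raw)))  # True True

# Grouped truck weights: (L, U, frequency); midpoints replace the raw values.
truck_classes = [(2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
                 (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1)]
s2_grouped = variance_from_frequencies(
    ((lo + hi) / 2, f) for lo, hi, f in truck_classes)
print(round(s2_grouped, 4), round(math.sqrt(s2_grouped), 2))  # 0.8757 0.94
```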
  • 9.

VARIANCE & STANDARD DEVIATION FROM GROUPED DATA
$s^2 = \frac{1}{n - 1}\left(\sum f x^2 - \frac{\left(\sum f x\right)^2}{n}\right)$, where $x$ is the class midpoint
• Variance: $s^2 = \frac{1}{20 - 1}\left(348.75 - \frac{81.5^2}{20}\right) = 0.8757$
• Standard deviation: $s = \sqrt{0.8757} \approx 0.94$

CALCULATIONS
• Raw data: IQR, variance & standard deviation
• Frequency table (ungrouped data): IQR, variance & standard deviation
• Frequency table (grouped data): IQR, variance & standard deviation

BOX-AND-WHISKER DIAGRAM (5-POINT SUMMARY)
Minimum value | $Q_1 = P_{25}$ | Median | $Q_3 = P_{75}$ | Maximum value

EXAMPLE
Consider the following set of 23 scores:
0, 3, 4, 8, 9, 12, 14, 15, 16, 16, 16, 18, 19, 21, 22, 25, 32, 34, 39, 43, 54, 67, 77
1. Find the 5-point summary.
2. Draw a box-and-whisker plot to illustrate the values.

5-POINT SUMMARY
0, 3, 4, 8, 9, 12, 14, 15, 16, 16, 16, 18, 19, 21, 22, 25, 32, 34, 39, 43, 54, 67, 77

HOMEWORK
• Examples in the X-Kit textbook, pages 218-223.
• "Practise for your exams", page 224, numbers 1 and 2.
• Par B.2 (page B5): all odd-numbered questions.
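The 5-point summary for the 23 scores can be computed as a check on a hand calculation. This sketch assumes the quartiles follow the same $\frac{p}{100}(n + 1)$ position rule used for percentiles earlier in these slides, which `statistics.quantiles` implements with `method="exclusive"`; the function name `five_point_summary` is hypothetical. With $n = 23$ the quartile positions are 6, 12 and 18, so no interpolation is needed.

```python
import statistics

scores = [0, 3, 4, 8, 9, 12, 14, 15, 16, 16, 16, 18,
          19, 21, 22, 25, 32, 34, 39, 43, 54, 67, 77]

def five_point_summary(data):
    """Minimum, Q1 = P25, median, Q3 = P75, maximum; quartiles use the
    (n + 1) position rule via statistics.quantiles(method="exclusive")."""
    xs = sorted(data)
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return xs[0], q1, q2, q3, xs[-1]

summary = five_point_summary(scores)
print(summary)                 # (0, 12.0, 18.0, 34.0, 77)
print(summary[3] - summary[1]) # 22.0  (the IQR, i.e. the length of the box)
```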