Introduction to Statistics
Lecture Notes
Chapters 3-5
Please sign in (SIGNATURES) as you come in to class. This saves my voice, since I won’t have to take attendance aloud (the sign-in is only to settle the class roster).
What’s up with the PowerPoint?
- I don’t usually use slides, but I am going to try them to save my voice somewhat.
- Notes: I am still working on getting the class roster settled. There has been some movement on the waitlist, and I will keep you posted as things develop. Be sure you’ve signed in!
- The first homework is posted (on our course website), but it isn’t due until next Friday (the 4th). The additional problem is NOT optional; "additional" just means it is not a book problem.
Handouts for Today
- One handout, on graphs/descriptive statistics, is going around. Save it to use tomorrow in class.
- The second handout is the anonymous survey largely designed by the class on Monday. Please take a few minutes to fill it out (no names!) and get it back to me. We’ll take a look at these data next week in lab.
- If you missed class Monday, I also have extra copies of the course syllabus at the front.
The “W”s of a Data Set
- Who: the observations. The population is the set of all objects for which you want to know the value of some parameter. Since we usually can’t observe every object, we take a sample (a subset of the overall population) and observe that instead.
- Note: There is NO such thing as a “population sample” or a “sample population.”
- What: the variables
- Why: why the data were collected
- How: how the data were collected (related to design/sampling in Chapters 12-13)
- When/Where: more information that could be relevant
Chapters 3-5 Overview
- Covers basic graphs and descriptive statistics for both categorical and quantitative variables.
- This is what you would do as a “preliminary analysis” of a variable.
- Recall: a data set can contain multiple variables.
- These chapters focus mostly on univariate (single-variable) analyses. There is one comparative graph: a side-by-side boxplot in Chapter 5.
3 Rules of Data Analysis
- Rule 1: Make a picture.
- Rule 2: Make a picture (really, before you do anything else).
- Rule 3: Make a picture (really, we mean a well-chosen picture for your variables).
Categorical Variable Preliminary Analysis
- Frequency tables (one variable): summarize counts by category
- Contingency tables (two or more variables): summarize counts for each combination of categories across the variables
- Bar charts
- Pie charts
Frequency
- What is frequency?
- Frequency is the number of objects/cases in each category.
- You can also look at relative frequency.
- Relative frequency is the number of objects/cases in a category divided by the total number of objects.
- Hence it gives the proportion of the total that falls in each category.
- It is often converted to a percentage (multiply by 100).
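As a quick illustration of frequency versus relative frequency, here is a minimal Python sketch. It is not part of the course materials: the helper `frequency_table` and the `responses` list are hypothetical, made up only to show the arithmetic (relative frequency = count / total, times 100 for a percentage).

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, relative frequency, percent) rows,
    ordered from most to least common."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    for category, freq in counts.most_common():
        rel = freq / total            # proportion of all cases in this category
        rows.append((category, freq, rel, 100 * rel))
    return rows

# Hypothetical answers to "How do you get to campus?"
responses = ["bus", "car", "walk", "car", "bike",
             "car", "walk", "bus", "car", "walk"]

for category, freq, rel, pct in frequency_table(responses):
    print(f"{category:<5} {freq:>3} {rel:>6.2f} {pct:>6.1f}%")
```

With these 10 made-up responses, `car` appears 4 times, so its relative frequency is 4/10 = 0.40, or 40%. The relative frequencies always add up to 1 (100%).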
Bar Charts
- One bar per category; the height is determined by the frequency or relative frequency.
- The order of categories is arbitrary.
- Because the order is arbitrary, a bar chart does NOT let you talk about the shape of a distribution.
- “Area” principle: the area a bar takes up should be proportional to the value it represents. This is often violated when people try to make graphs “cool” by going 3-D, etc. (see the example passed around).
Pie Charts
- Take 100% of the cases and divide up the 360 degrees of the circle according to the relative frequencies.
- We will use bar charts rather than pie charts.
- Note that for bar charts you do not need to create bars for 100% of the cases. You could show just the top three risk factors for a disease, for example. However, we usually do show 100% of the cases.
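The pie chart rule is just relative frequency times 360 degrees. A minimal Python sketch with hypothetical counts (`pie_angles` is an illustrative helper, not a library function):

```python
def pie_angles(counts):
    """Convert category counts to pie-slice angles in degrees:
    angle = (count / total) * 360."""
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}

# Hypothetical counts for 10 cases
counts = {"car": 4, "walk": 3, "bus": 2, "bike": 1}
angles = pie_angles(counts)
for category, degrees in angles.items():
    print(f"{category:<5} {degrees:6.1f} degrees")
```

Here `car` gets 4/10 of the circle, or 144 degrees. The slices add up to 360 degrees only because the relative frequencies add up to 1, which is why a pie chart needs 100% of the cases while a bar chart does not.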
Contingency Tables - Example
- See the first page of the handout.
- The row and column totals give the marginal distribution of each variable.
- You can also look at conditional distributions: fix a row or column and work solely within that row or column.
- Concept of independence (will formalize later):
  - If the distribution of one variable is the same for every category of the other variable, then the two variables are independent.
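To make marginal and conditional distributions concrete, here is a minimal Python sketch on a small hypothetical two-way table. The counts and the helper names are invented for illustration; they are not the handout's data.

```python
# Hypothetical contingency table: rows = class section, columns = preferred graph.
table = {
    "Section 1": {"bar": 30, "pie": 10, "dot": 10},
    "Section 2": {"bar": 18, "pie": 6, "dot": 6},
}

def row_marginals(t):
    """Row totals: the marginal distribution of the row variable (as counts)."""
    return {row: sum(cells.values()) for row, cells in t.items()}

def column_marginals(t):
    """Column totals: the marginal distribution of the column variable (as counts)."""
    totals = {}
    for cells in t.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals

def conditional_given_row(t, row):
    """Fix one row and work only within it: proportion in each column category."""
    cells = t[row]
    row_total = sum(cells.values())
    return {col: n / row_total for col, n in cells.items()}

def same_conditionals(t, tol=1e-9):
    """True if every row has the same conditional distribution
    (the informal independence idea from the slide)."""
    dists = [conditional_given_row(t, row) for row in t]
    first = dists[0]
    return all(abs(d[col] - first[col]) <= tol for d in dists[1:] for col in first)

print(row_marginals(table))      # {'Section 1': 50, 'Section 2': 30}
print(column_marginals(table))   # {'bar': 48, 'pie': 16, 'dot': 16}
for row in table:
    print(row, conditional_given_row(table, row))
print("Same conditional distributions?", same_conditionals(table))
```

In this made-up table both sections split 60% / 20% / 20%, so by the slide's informal definition the two variables are independent. Real data almost never match exactly; deciding how close is close enough is what the later, formal treatment is for.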
On Your Own
- The text has some discussion of segmented bar charts and side-by-side bar charts (feel free to read or skip).
Simpson’s Paradox
- Something that can happen when you aggregate categorical data
- Looking at overall averages or percentages can be misleading.
- You can get different (even reversed) results when you look at the breakdown.
- Berkeley discrimination data example (see the bottom of page one of the handout)
  - Claims of sex discrimination in 1973 graduate school admissions
  - Overall, 44.28% of males who applied were admitted, while only 34.58% of females were admitted.
  - Look what happens when you break the data down by the 6 largest departments, though! (Try this on your own or with a partner.) Is there evidence of discrimination against females at the department level? What is going on?
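Here is a minimal Python sketch of how such a reversal can happen. The numbers are hypothetical, chosen only to show the mechanism; they are NOT the Berkeley data, so you should still do the handout exercise with the real department breakdown.

```python
# Hypothetical admissions data (NOT the Berkeley figures): (applied, admitted).
data = {
    "Dept X": {"men": (80, 60), "women": (20, 16)},   # easy to get into
    "Dept Y": {"men": (20, 4),  "women": (80, 20)},   # hard to get into
}

def dept_rate(dept, group):
    """Admission rate for one group within one department."""
    applied, admitted = data[dept][group]
    return admitted / applied

def overall_rate(group):
    """Aggregate over departments: total admitted / total applied."""
    applied = sum(groups[group][0] for groups in data.values())
    admitted = sum(groups[group][1] for groups in data.values())
    return admitted / applied

for dept in data:
    print(f"{dept}: men {dept_rate(dept, 'men'):.0%}, "
          f"women {dept_rate(dept, 'women'):.0%}")
print(f"Overall: men {overall_rate('men'):.0%}, women {overall_rate('women'):.0%}")
```

Women have the higher admission rate in each department (80% vs 75%, and 25% vs 20%) but the lower rate overall (36% vs 64%), because most of the women in this made-up example applied to the department that admits few people. Aggregating hides that third variable.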
Quantitative Variables Preliminary Analysis
- Graphs
  - Dot plot: won’t use much; read about it on your own
  - Stem-and-leaf: won’t use much; read about it on your own
  - Histogram
  - Boxplot (Chapter 5)
  - Q-Q plot (Friday or next week)
  - Time plot (Friday or next week)
- Descriptive statistics
  - Measures of center: mean, median
  - Measures of spread: standard deviation, IQR, range
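The descriptive statistics listed above can be computed with Python's standard-library `statistics` module. This is a minimal sketch reusing the pre-test scores from the stem-and-leaf exercise in these notes. One caution: different software uses different rules for quartiles, so the IQR may differ slightly depending on the tool; this sketch assumes the module's default rule (`method="exclusive"`).

```python
import statistics

# Pre-test scores from the stem-and-leaf exercise in these notes
scores = [5, 11, 13, 21, 34, 36, 45, 47, 48, 48, 49]

# Measures of center
mean = statistics.mean(scores)
median = statistics.median(scores)

# Measures of spread
sd = statistics.stdev(scores)                    # sample SD (divides by n - 1)
q1, _, q3 = statistics.quantiles(scores, n=4)    # quartiles, default "exclusive" rule
iqr = q3 - q1
data_range = max(scores) - min(scores)

print(f"mean = {mean:.2f}, median = {median}")
print(f"SD = {sd:.2f}, IQR = {iqr}, range = {data_range}")
```

For these scores the median is 36, Q1 is 13 and Q3 is 48 (so the IQR is 35), and the range is 49 - 5 = 44. The mean (about 32.45) sits below the median, which is a hint (not a rule) of a longer left tail.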
Describing the Distribution of a Quantitative Variable
- Focus on three things when describing the distribution of a quantitative variable:
  - Shape: unimodal (one peak), bimodal (two peaks), multimodal (many peaks), bell-shaped, skewed left (tail to the left), skewed right (tail to the right), symmetric, uniform (no peaks, basically flat)
  - Center: estimate the center (or use a descriptive statistic)
    - If there are multiple peaks, report the peak locations.
  - Spread: estimate the spread (or use a descriptive statistic)
Dot Plot – On Your Own
- The most basic quantitative graph
- Use for a small number of observations (fewer than 50).
- Draw a number line and place a dot above it for each value you have observed, stacking dots when a value repeats.
- Example from Wikipedia (figure not included).
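A dot plot is easy to sketch in plain text. The Python below is a minimal illustration for small sets of integers; `text_dot_plot` and the quiz scores are hypothetical. Every integer between the smallest and largest value gets a column, so the axis behaves like a real number line and empty spots stay visible.

```python
from collections import Counter

def text_dot_plot(values):
    """Build a text dot plot for integer data: one column per integer on the
    number line from min to max, with one 'o' stacked per observation."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    width = max(len(str(x)) for x in axis)
    rows = []
    for level in range(max(counts.values()), 0, -1):   # top row first
        cells = ["o" if counts[x] >= level else " " for x in axis]
        rows.append(" ".join(c.rjust(width) for c in cells).rstrip())
    rows.append(" ".join(str(x).rjust(width) for x in axis))  # the number line
    return rows

quiz_scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 9]  # hypothetical data
print("\n".join(text_dot_plot(quiz_scores)))
```

The tallest stack sits over 5 (three students), and the empty spot over 8 shows up as a gap on the number line.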
Stem and Leaf – On Your Own
- Your book discusses many options for these, including split leaves (which R/Rcmdr will do).
- Basics: choose a stem for your values, for example the tens digit. The leaves are then the ones digits. For each stem, list the leaves that belong to it in numeric order.
- Usually works reasonably well for fewer than 100 observations.
- Try it. Suppose you have the following pre-test scores for an at-risk youth group:
  - 5, 11, 13, 21, 34, 36, 45, 47, 48, 48, 49
Histogram
- Take the quantitative variable and break its range into “piles” or “bins” (usually of equal width).
- Count the number of observations in each bin.
- Plot the frequency for each bin.
- Usually there are no spaces between the bars; a space means a gap in the data (NOT like a bar chart, where the spaces carry no meaning).
- You DO need to know the boundaries. Bins (5,10], (10,15] are different from bins [5,10), [10,15): a value of exactly 10 lands in the first bin in the first case and in the second bin in the second case. (If anyone needs me to explain open/closed brackets, please ask.)
- Technology lets us vary the bin width (which effectively sets the number of bins).
- You can also use unequal bin widths, but then you need to plot density rather than frequency.
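The boundary and density points can be checked with a minimal Python sketch. `bin_counts` and `densities` are hypothetical helpers and the data are made up; plotting software has its own documented convention for which side of a bin is closed, so check the tool you use.

```python
def bin_counts(values, edges, closed="right"):
    """Count values falling in consecutive bins defined by `edges`.
    closed="right" uses bins (a, b]; closed="left" uses bins [a, b).
    Values outside every bin are not counted."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            a, b = edges[i], edges[i + 1]
            inside = (a < v <= b) if closed == "right" else (a <= v < b)
            if inside:
                counts[i] += 1
                break
    return counts

def densities(counts, edges):
    """Density = count / (n * bin width), where n is the number of binned
    values, so the bar AREAS add up to 1."""
    n = sum(counts)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]

# Same data, same edges, different boundary convention (hypothetical data)
values = [6, 8, 10, 10, 12, 15]
edges = [5, 10, 15]
print(bin_counts(values, edges, closed="right"))  # bins (5,10], (10,15]
print(bin_counts(values, edges, closed="left"))   # bins [5,10), [10,15)

# Unequal bin widths: compare frequency with density (hypothetical data)
wide_values = [2, 4, 6, 8, 15, 25, 35]
wide_edges = [0, 10, 40]
wide_counts = bin_counts(wide_values, wide_edges)
print(wide_counts, densities(wide_counts, wide_edges))
```

With right-closed bins the counts are [4, 2]; with left-closed bins they are [2, 3], because the two 10s move to the second bin and 15 no longer falls in any bin. In the unequal-width example, the wide bin (10, 40] holds 3 values against 4 in (0, 10], so a frequency plot makes the two bars look similar, but its density is only a quarter as large (1/70 vs 4/70 per unit), which is the fair comparison.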
Examples
- See page 2 of the handout.
- Try to describe the shape of each histogram.
- Then see page 3 of the handout.
- We’re going to create a histogram by hand if there is time.
- If there isn’t time, you can do this on your own.
Cookie Lab
- Time permitting (otherwise, Friday)
- The last page (to turn in) is not due until the end of class tomorrow, so don’t worry if we don’t get to it today. You can look at it tonight or tomorrow in class (I’ll give you the last five minutes of class to work on it).
