Michael Fletcher
Fall 2014
Econ 453
Section 2
Final Draft
Table of Contents
• Part 1
• Motivation
• Previous Findings
• Research Questions
• Data
• Modeling Dataset
• Part 2
• Key Variables
• Independent Variables
• Descriptive Statistics
• Part 3A
• Conditional Statistics
• Part 3B
• Conditional Statistics
• Part 4
• Part 5A
• Part 5B
Previous Findings
• http://www.bls.gov/web/empsit/cpseea18.pdf
• http://www.bls.gov/cps/lfcharacteristics.htm
Research Questions
• Primary research question
– What characteristics of an individual make them more/less likely to be willing to endure a
long commute to work?
• Secondary research question
– What are the characteristics of workers that make them more/less likely to be full or part time workers?
• Notes:
– Here we only considered individuals who are in the labor force, so 16-64 year olds.
– To be considered full time, a person had to work 40 or more hours per week.
Data
Source:
- We gathered data from IPUMS USA
- We took data from the American Community Survey in 2013, conducted by the U.S. Census Bureau.
Size:
- We began with a dataset that contained XXXXXXXXXXXXXX variables and XXXXXXXXX observations
Type:
- This is a cross-sectional dataset
- This data is from the American Community Survey, a survey conducted by the U.S. Census Bureau
and given monthly to approximately 250,000 households in the United States. The goal of the
survey is to gather much of the information collected on the decennial long-form census on a more
regular basis than once every ten years.
- The Census Bureau fully implemented this program in 2005.
Description of Modeling Dataset
Modeling the Dataset:
- We downloaded our data from the IPUMS database by choosing variables we were interested in off of the ACS
2013 survey.
- After downloading the data, we chose the independent variables and dependent variables that we would work
with.
- We decided to drop observations with missing values in our dependent variables, because they carry no
information about the outcomes we are trying to study.
- We did not have any independent variables with significant missing values.
The Modeled Dataset:
- We created a new dataset from the raw dataset that uses 13 variables and has over 1 million observations
- For our “age” variable we grouped ages into brackets so that we would not have a separate category
for every single year of age.
- For the “language” variable we grouped all the languages spoken at home by less than 1% of the population
into the category “Other”.
- It would be very beneficial to have data on the type of worker that the person’s parents were, to
analyze whether lineage plays a role in income and the type of work week one has, but this is a difficult variable
to measure and to get a good response rate on.
Part 2
Descriptive Statistics
Key Variables
Dependent:
- Full Time: This variable is a measure of whether a person works 40 or more hours in a typical week. If this
variable is a “1” then the person has a full-time schedule or greater, while if the value of this variable is “0”
then the person does not work 40 hours per week.
- Transit time: This variable measures whether a person commutes 20 or more minutes to work one way. If the
variable is a “1” then that person travels at least 20 minutes one way to work; if the value is a
“0” then that person travels at most 19 minutes to work.
Independent:
- Race
- Sex
- Agegroup
- Marital Status
- Veteran Status
- School Attendance
- Education Level
- Primary language
- Citizenship status
- Class of worker
- Number of Children
Independent Variables
• Agegroup
• The original raw data set provided us with the exact age of each individual.
• In order to clean up the data we grouped certain age ranges to lower the possible values from over 100 to 8.
• Citizen
• This variable defines the citizenship status of the individuals that were surveyed
• Educ
• The Educ variable gives the maximum level of education that an individual reached.
• This ranges from no schooling to five or more years of college education
• Classwkr
• Provides data on whether the individual is self employed or works for a company.
• Language
• Defined as the language that the individual primarily speaks at home
• Nchild
• Gives insight as to how many children are dependent on the respondent
Independent Variables Cont.
• Marst
• This is the current marital status of the respondent
• School
• This variable defines whether the individual is currently attending school or not
• Vetstat
• The vetstat variable indicates whether or not the respondent is a veteran
• Race
• This is a general race variable that puts the respondents in broad race categories
• Sex
• This defines whether the respondent is a male or female
Descriptive Statistics: Age Group
• The Agegroup variable is the age range
into which each respondent falls
• We were originally given the actual age
for every respondent but decided the
information would best be represented in
six age brackets
• The values in this variable measure a
quantity as a number
• These quantities can only be whole
numbers and do not include fractions or
decimals.
• All age groups that are in the labor force
are represented nearly equally
• The first age group 16-23 years old is the
least represented with only 1 in 10 people
being in this group.
• The group with the greatest number of
people was the ages 56-64 category,
which accounted for 1 out of every five.
• All the rest of the groups were
responsible for about 1 in 6 people each.
Descriptive Statistics: Citizen
• The citizen variable is the current citizenship status of an individual
• The values in this variable measure a quantity as a number(numerical)
• These quantities can only be whole numbers and do not include fractions or decimals.
• The Citizen variable can be broken into four values.
• A little over 8 in 10 respondents fall in the “N/A” category, which in the IPUMS coding covers people born in the United States
• Aside from that, 1 in 13 are naturalized citizens and about the same share are non-citizens.
• Only 1 in 100 people in this sample were born abroad to American parents.
Descriptive Statistics: Educ
• The Educ variable measures the highest level of education an individual reached.
• The values in this variable measure a quantity as a number
• These quantities can only be whole numbers and do not include fractions or decimals.
• The Educ variable can be broken into six values.
• Out of the respondents, 1 in 3 graduated from high school.
• 1 in 5 completed four years of college.
Descriptive Statistics: Classwkr
• The Classwkr variable shows if an individual works for a company
or works for themselves
• The values in this variable measure a quantity as a
number(numerical)
• Each different number relates to a different class of worker.
• Nearly 12 out of 13 respondents work for a company
• The remaining 1 in 13 work for themselves
Descriptive Statistics: Home Language
• This is a categorical, character
variable that details what the
primary language spoken at
home is.
• The original language variable
had many of the options for
language end up with less than 1
in 100 people speaking it, so we
took the top 4 languages and
then grouped the rest into a
category named “Other.”
• Over 8 out of 10 people speak
English as their primary
language.
• We can also see that 1 in 10
people speak Spanish as their
primary language.
• The next most popular
language, Chinese, trails far
behind, with only 1 in 100
people speaking it as their
primary language.
Descriptive Statistics: Number of Children in Household
- This is a discrete variable that can take on 10 different values.
- This variable details how many of your own children live in the household.
- Over half of the respondents had no children living with them.
- The share of respondents falls as the number of children rises,
and only 1 in 50 have 4 children or more.
Descriptive Statistics: Marital Status
• This is a discrete variable that details the marital status of a
person.
• About half of the people in the survey are married.
• About 3 in 10 people have never been married in this survey.
• About 1 in 10 have been divorced in this sample.
• About 1 in 50 people are married, but separated from their
spouse.
Descriptive Statistics: School Status
• This is a categorical, character variable that asks if a person is
currently enrolled in school. It has two options: In school, and
not in school.
• Most people (about 9 out of 10) are not in school.
Descriptive Statistics: Veteran Status
• This is a categorical, character variable with three choices:
Veteran, Not a veteran, or “Not Applicable” for children and
those unable to serve.
• Because we removed people under 16 in this sample, most of
the people who responded “N/A” would be those unable to
serve.
• We found that an overwhelming portion of the respondents
were not veterans.
• About 1 in 18 people are veterans according to
this sample.
Descriptive Statistics: Sex
• This is a categorical, character variable with two
simple options: Male or Female.
• Slightly more males than females responded to this
survey.
• Generally, the mixture between males and females is
about 50/50, and this sample shows the same.
Descriptive Statistics: Race
• The race variable defines the major racial category that the
respondent believes they fall under
• The values in this variable measure a quantity as a number
• The Race variable can have up to 9 different values
• A large majority of the respondents fell under the white category,
about 3 in 4
• Black/Negro was the next largest response with about 1 in 10 being in
that category.
• No other category had a response rate greater than 4 in 100
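The one-way shares quoted throughout Part 2 can be read off frequency tables. The following is a minimal sketch, not necessarily the code behind these slides. It assumes the modeling data set is the `four` data set built in the Appendix, that the Appendix formats have already been defined, and that the predictors carry the IPUMS names used in this deck.

```sas
* One-way frequency tables for the categorical predictors;
proc freq data=four;
   tables agegroup citizen educ classwkr homelanguage
          nchild marst school vetstat sex race;
   * Attach the Appendix labels so the tables show text instead of codes;
   format agegroup agegroup_f. citizen CITIZEN_f. educ EDUC_f.
          classwkr CLASSWKR_f. homelanguage homelanguage_f.
          nchild NCHILD_f. marst MARST_f. school SCHOOL_f.
          vetstat VETSTAT_f. sex SEX_f. race RACE_f.;
run;
```

The "Percent" column of each table gives the share of respondents in each category, which is what the "1 in N" statements summarize.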
Part 3A
Conditional Statistics
Conditional Statistics: Agegroup
• Those who were less likely to be full time workers are those who fell into
the 16-23 age range with 1 in 3 being full time.
• Full time employment sees an increasing trend until middle age and then
decreases as it approaches the retirement age.
• The highest rate of full time employment was set by those who are in the
40-47 age range.
• This trend, along with the low and high rates, can be attributed to having to
care for children and other family members and then not having to once
they become independent.
Conditional Statistics: Citizen
• This variable saw most of its respondent types fall close to the
overall average.
• Those who are naturalized citizens have the highest probability of
being full time with just over 3 out of 4 falling into that category.
• The majority of the population sits slightly above a rate of 7 out of
every 10 laborers being full time.
Conditional Statistics: Educ
• Nationwide around 7 out of every 10 people is considered to be a full time worker.
• Individuals who only reached 10th and 11th grade face the lowest rate with about half of those in that category
being able to find full time employment.
• Completing the 12th grade yields significantly better results for the individual, with 7 out of every 10
individuals working 40 hours or more per week.
• Once college begins we see a steady increase with every completed year of college. This leads up to the highest
rate, with 4 out of 5 of those who complete 5 or more years being employed full time.
Conditional Statistics: Classwkr
• For this class of worker variable, those who work for wages are
full time at about the same rate as the overall average, which sits at
about 7 in 10 working people.
• Those who are self-employed have a 2 in 3 chance of working 40
or more hours per week.
• This contradicts the belief that self-employed people work
more to keep their business alive.
Conditional Statistics: Home Language
• Those whose primary language at home is either Chinese or Hindi post
the highest ratios when it comes to finding full time employment. Both
sit above a 3 in every 4 person average.
• All other languages were at or above the national average, besides those
who primarily spoke Spanish.
Conditional Statistics: Marital Status
• The first thing we noticed was that those who have never been
married or are single have a lower probability of being employed full
time. Those individuals are below a 6 in 10 rate, which is below the
national average.
• Those with the highest full time employment rates are those who are
currently married; 3 out of 4 are full time employees.
• This can most likely be attributed to being responsible for a
significant other rather than just oneself.
Conditional Statistics: School Attendance
• People who are not in school are more likely to have full time
employment.
• About 3 in 4 people not in school are working full time jobs.
• This rate drops to about 4 in 10 for people who are still in school
while being in the labor force.
Conditional Statistics: Veteran Status
• Even though both veterans and nonveterans have a high
percentage of full time employment, veterans are more
likely to fall into this category.
• 7 out of 10 non veterans hold full time employment.
• Veterans have a slightly higher rate, with 8 out of 10 holding
full time employment.
Conditional Statistics: Sex
• People who are males are more likely than females to have full time
employment.
• 6 in every 10 females have full time positions.
• 8 in every 10 males hold full time employment.
• This difference can most likely be attributed to cultural norms in the
United States.
Conditional Statistics: Race
• In this variable those who consider themselves to be of Asian descent (Chinese, Japanese, Other Asian)
are more likely to work 40 or more hours per week.
• About 3 out of 4 people of Asian descent worked full time.
• People who were of two or more races had a lower chance of working full time compared to the overall average.
• Overall about 7 out of 10 individuals reported having worked 40 or more hours per week.
Conditional Statistics: Number of Children
• There was a slight decreasing trend in full time employment as the number of children increased.
• Those who have fewer children are more likely to be considered full time.
• Over 3 in 4 of these individuals held positions in which they worked a minimum of 40 hours per week.
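Because Fulltime is coded 0/1, its mean within a group is the share of that group working full time, which is the kind of conditional statistic reported in Part 3A. A minimal sketch for one predictor, assuming the `four` data set and the `SEX_f` format from the Appendix (the choice of `sex` here is only for illustration):

```sas
* Share of full-time workers within each sex;
proc means data=four n mean maxdec=2;
   class sex;
   var Fulltime;
   format sex SEX_f.;
run;
```

Swapping another predictor into the CLASS statement, with its matching format, would give the corresponding table for that variable.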
Part 3B
Conditional Statistics
Conditional Statistics: Age Group
- For most age groups, about 6 out of 10 people have a long transit to work.
- People 16-23 have the least amount of long transit, with only about 4 out of 10
people having a long transit.
- The 16 to 23 age group also seems to be dragging the average down because the
other groups are above average.
Conditional Statistics: Citizen
- Naturalized citizens are most likely to have long transit times with 2/3 of them having a
transit more than 20 minutes long.
- Respondents coded “N/A” for citizenship (in the IPUMS coding, people born in the United States) make up most of
this sample. This group has the lowest share of long transit times.
- 6 out of 10 people born abroad to American parents have a long transit time.
Conditional Statistics: Number of Own Children in Household
- In this sample, most if not all of the respondents hover around the average value.
- Surprisingly, those with no children and those with the most (8 or 9+ children) had the lowest occurrence of
long transit times, with only about half of those with 9 or more having to travel more than 20 minutes.
- For everyone else, about 6 in 10 people have a hefty daily commute of more than 20 minutes.
Conditional Statistics: Educational Attainment
- About half of the people who have only attained some high school education, specifically grades
10 and 11, travel more than 20 minutes to work.
- People with 4 or more years of college will be slightly more likely to travel for work,
with 6 out of 10 people having to do so in this sample.
- Overall the levels of long transit times do not seem to vary very much with educational attainment in this sample.
Conditional Statistics: Class of Worker
- Self employed people are less likely to have a long commute to work than
people who work for somebody else.
- For self-employed people, about half have to travel far on their daily commute.
- About 6 out of 10 people that work for wages have to travel more than 20 minutes to
their place of employment.
Conditional Statistics: Race
- American Indians and Alaskan Natives are the least likely to have to travel far for work. Less than half of
respondents in this category have a long transit.
- Almost 7 in 10 Chinese people have a long transit to work that lasts more than 20 minutes, according to
this study.
- Other than American Indians/Alaskan Natives, Whites are the group least likely to be faced with a long
commute to work, with only a bit over half of White respondents saying they travel more than 20 minutes daily.
Conditional Statistics: Marital Status
- Married people who live with their spouses are the most likely to have to drive far for their daily commute
according to this survey, with 6 of 10 needing more than 20 minutes to get to their work daily.
- Single people are the least likely to travel far for their work or school, with only about half saying they take
more than 20 minutes on their daily commute.
- The rest of the groups in this sample hover close to the average of 57%.
Conditional Statistics: School Attendance
- People who are in school are less likely to have a long transit time to their work than people
who are not attending school.
- About 6 in 10 people who are not in school have a long transit to work.
- Less than half of people in school have to travel far daily for their work.
Conditional Statistics: Veteran Status
- About 6 in 10 veterans have a long transit time to work.
- Chances that you have to travel far for work go down if you are not a veteran.
- Surprisingly, those who skipped this question and did not answer it have a very low occurrence of long transit times, so
there may be something going on with the people who skip that question specifically.
Conditional Statistics: Home Language
- Those who speak English as their primary language have a lower occurrence of long transit times than those who
do not speak it as their primary language.
- Spanish speakers and other minorities have 6 out of 10 people in this category with long transits to work.
- People who speak Chinese as their primary language have higher rates of long transits to their places of work.
Conditional Statistics: Sex
- More males than females have to travel more than 20
minutes to work.
- About 6 in 10 males have transits of 20 minutes or
more.
- Women have a slightly lower chance of long transit
times, but overall between 5 and 6 in 10 women in this
sample have to travel far for their work.
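The long-transit shares in Part 3B can also be read from a two-way table with row percentages. A minimal sketch, assuming the `four` data set, the `LongTransit` indicator, and the `SEX_f` format from the Appendix (again using `sex` only as an example predictor):

```sas
* Cross-tabulate sex against the long-transit indicator;
* NOCOL and NOPERCENT leave only counts and row percentages;
proc freq data=four;
   tables sex*LongTransit / nocol nopercent;
   format sex SEX_f.;
run;
```

The row percentage in the LongTransit = 1 column is the share of each group with a commute of 20 minutes or more.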
Part 4
Missing Value Imputation and Modeling Variables
Missing Value Imputation and Modeling Variables
Missing Values:
- Because we are looking at statistics on labor and commuting, which assumes that a person may
be using a car, children under 16 were deleted from the data set before running our analyses.
- We also decided to delete observations for people who had zero income. Since we are looking at the impact of
certain characteristics on the amount of time worked per week, it makes sense to include only those who have
income, so that we model people who do have a job.
- One of the dependent variables we are studying is transit time. We felt it was appropriate to delete people with
missing values for transit time because this would allow us to capture the effects on people who do actually travel
daily.
Imputation:
- No imputation of missing values was done for this data set. If a value was missing, it was either left in and coded
as missing or deleted from the data set.
Dummy Variables
- Many of the variables used in these analyses are categorical and discrete,
and so they require dummy variable coding
in order to run a logistic regression analysis on them.
Dummy variables include:
1. School Age: This dummy variable is based off whether a person is of age to be in school. We picked
the range of ages 16-30 as a reasonable range of ages where a person is more likely to be in school.
2. Married: This dummy variable is based off the Marital Status variable and details whether a person
is married or not in a binary way.
3. No Schooling: This dummy variable is based off of the “Educational Attainment” variable and
details whether or not a person has had no schooling in their life.
4. Some School: This dummy variable is based off of the “Educational Attainment” variable and
details whether a person has only had some schooling in their life.
5. Some College: This dummy variable is based off of the “Educational Attainment” variable and
details whether a person has attained a High School Diploma and went to college but has not finished.
6. Degree: This dummy variable is based off of the “Educational Attainment” variable and details
whether a person has achieved a college degree as their highest educational attainment.
Dummy Variables (Cont.)
7. Self Employed: This dummy variable is based off the Class Worker variable and details whether
a person is self-employed or not.
8. Spanish: This dummy variable is based off the Home Language variable and details whether a
person speaks Spanish as their primary language at home or not.
9. In School: This dummy variable is based off the School variable and details whether a person is
in school or not.
10. Veteran: This dummy variable is based off the Veteran Status variable and details whether a
person said they were a veteran or not.
11. Male: This dummy variable is based off the Sex variable and details whether a person is a male
or not.
12. Minority: This dummy variable is based off the Race variable and details whether a person
belongs to a minority race (non-white) or not.
13. Children: This dummy variable is based off the Number of Children in Household variable and
details whether a person has any of their children living with them.
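The thirteen dummy variables described in Part 4 could be built in a single DATA step. This is a sketch under stated assumptions rather than the code actually used: the output data set name `model_data` and all thirteen dummy names are hypothetical, and the code cutoffs marked as assumptions are guesses based on the format labels in the Appendix.

```sas
data model_data;   * hypothetical output data set name;
   set four;
   * A comparison in parentheses evaluates to 1 when true and 0 when false;
   SchoolAge    = (16 <= age <= 30);
   Married      = (marst in (1, 2));   * assumption: spouse present or absent;
   NoSchooling  = (educ = 0);
   SomeSchool   = (1 <= educ <= 5);    * assumption: any schooling below grade 12;
   SomeCollege  = (7 <= educ <= 9);    * assumption: 1 to 3 years of college;
   Degree       = (educ >= 10);        * assumption: 4 or more years of college;
   SelfEmployed = (classwkr = 1);
   Spanish      = (homelanguage = 2);
   InSchool     = (school = 2);
   Veteran      = (vetstat = 2);
   Male         = (sex = 1);
   Minority     = (race ne 1);
   Children     = (nchild > 0);
run;
```

With this coding, a respondent whose dummies are all 0 has grade 12 as the highest attainment (educ = 6), which matches the high school diploma in the baseline case.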
Baseline Case
The Baseline Case in our sample is a person who has the following characteristics:
- They are not married.
- They are over 30 years old.
- They are a US Citizen
- They have completed a High School diploma as their highest educational attainment.
- They work for wages rather than being self-employed.
- They do not speak Spanish as their main language
- They are not in school.
- They are not a veteran of the Armed Forces.
- They are Female
- They have no children.
- They are White.
Part 5A
Logistic Regression
Regression: Fulltime
• We performed a multivariate logistic regression with Full time
employment as the dependent variable and the 13 dummy variables we
created as the independent variables.
• Here we are modeling the probability that y = 1 (the person works full time), which lets us see how
each independent variable is related to the dependent variable.
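A logistic regression of this kind could be specified as below. This is a minimal sketch, not the original program: the data set name `model_data` and the thirteen predictor names are hypothetical stand-ins for the dummy variables listed in Part 4.

```sas
* Logistic regression of full-time status on the dummy variables;
* EVENT='1' asks PROC LOGISTIC to model the probability that Fulltime = 1;
proc logistic data=model_data;
   model Fulltime(event='1') = SchoolAge Married NoSchooling SomeSchool
                               SomeCollege Degree SelfEmployed Spanish
                               InSchool Veteran Male Minority Children;
run;
```

Without EVENT='1' (or the DESCENDING option), PROC LOGISTIC models the lower-ordered response value, here Fulltime = 0, which would flip the sign of every coefficient.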
Part 5B
Logistic Regression
Regression: Long Transit Times
Part 6
Conclusion
Appendix
Descriptive Variable Formats
data four;
set IPUMS.usa_00001;
* Keep working-age respondents (16 to 64) only;
if age < 16 or age >= 65 then delete;
* Flag earned income at or below 11,670 dollars;
if incearn <= 11670 then Poverty = 1;
else Poverty = 0;
* Drop respondents with zero earned income or no reported travel time;
if incearn = 0 then delete;
if trantime = 0 then delete;
* Long commute: 20 or more minutes one way;
if trantime >= 20 then LongTransit = 1;
else LongTransit = 0;
* Age brackets, coded 2 to 7 to match agegroup_f (codes 1 and 8 fall outside the sample);
if 16 <= age <= 23 then agegroup = 2;
else if 24 <= age <= 31 then agegroup = 3;
else if 32 <= age <= 39 then agegroup = 4;
else if 40 <= age <= 47 then agegroup = 5;
else if 48 <= age <= 55 then agegroup = 6;
else if 56 <= age <= 64 then agegroup = 7;
* Home language: top four languages, everything else grouped as Other;
if language = 1 then homelanguage = 1;
else if language = 12 then homelanguage = 2;
else if language = 43 then homelanguage = 3;
else if language = 31 then homelanguage = 4;
else homelanguage = 5;
* Full time: 40 or more usual hours worked per week;
if uhrswork >= 40 then Fulltime = 1;
else Fulltime = 0;
run;
• Note: In this code, aside from defining our descriptive variable formats, we also deleted
the missing values, as stated in Part 4.
Independent Variable Formats
*Agegroup format;
proc format;
value agegroup_f
1 = "Under 16"
2 = "16 to 23 "
3 = "24 to 31"
4 = "32 to 39"
5 = "40 to 47"
6 = "48 to 55"
7 = "56 to 64"
8 = "65+";
*Citizen format;
proc format;
value CITIZEN_f
0 = "N/A"
1 = "Born abroad of American parents"
2 = "Naturalized citizen"
3 = "Not a citizen"
4 = "Not a citizen, but has received first papers"
5 = "Foreign born, citizenship status not reported";
*Education format;
proc format;
value EDUC_f
00 = "N/A or no schooling"
01 = "Nursery school to grade 4"
02 = "Grade 5, 6, 7, or 8"
03 = "Grade 9"
04 = "Grade 10"
05 = "Grade 11"
06 = "Grade 12"
07 = "1 year of college"
08 = "2 years of college"
09 = "3 years of college"
10 = "4 years of college"
11 = "5+ years of college"
;
*Class worker format;
proc format;
value CLASSWKR_f
0 = "N/A"
1 = "Self-employed"
2 = "Works for wages";
Independent Variable Formats
* Home Language format;
proc format;
value homelanguage_f
1 = "English"
2 = "Spanish"
3 = "Chinese"
4 = "Hindi and related"
5 = "Other";
* Marital Status format;
proc format;
value MARST_f
1 = "Married, spouse present"
2 = "Married, spouse absent"
3 = "Separated"
4 = "Divorced"
5 = "Widowed"
6 = "Never married/single";
*School format;
proc format;
value SCHOOL_f
0 = "N/A"
1 = "No, not in school"
2 = "Yes, in school"
9 = "Missing";
*Veteran Status format;
proc format;
value VETSTAT_f
0 = "N/A"
1 = "Not a veteran"
2 = "Veteran"
9 = "Unknown";
*Sex Format;
proc format;
value SEX_f
1 = "Male"
2 = "Female";
*Race Format;
proc format;
value RACE_f
1 = "White"
2 = "Black/Negro"
3 = "American Indian or Alaska Native"
4 = "Chinese"
5 = "Japanese"
6 = "Other Asian or Pacific Islander"
7 = "Other race, nec"
8 = "Two major races"
9 = "Three or more major races";
Independent Variable Formats
*Number of Children format;
proc format;
value NCHILD_f
0 = "0"
1 = "1"
2 = "2"
3 = "3"
4 = "4"
5 = "5"
6 = "6"
7 = "7"
8 = "8"
9 = "9+"
;
*Self care format;
proc format;
value DIFFCARE_f
0 = "N/A"
1 = "No"
2 = "Yes"
;

More Related Content

Viewers also liked

Introduction to Statistics - Part 2
Introduction to Statistics - Part 2Introduction to Statistics - Part 2
Introduction to Statistics - Part 2
Damian T. Gordon
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Burak Mızrak
 
Introduction to Statistics - Part 1
Introduction to Statistics - Part 1Introduction to Statistics - Part 1
Introduction to Statistics - Part 1
Damian T. Gordon
 
Chapter 1 introduction to statistics for engineers 1 (1)
Chapter 1 introduction to statistics for engineers 1 (1)Chapter 1 introduction to statistics for engineers 1 (1)
Chapter 1 introduction to statistics for engineers 1 (1)
abfisho
 

Viewers also liked (14)

Basic Descriptive statistics
Basic Descriptive statisticsBasic Descriptive statistics
Basic Descriptive statistics
 
Introduction to Statistics - Part 2
Introduction to Statistics - Part 2Introduction to Statistics - Part 2
Introduction to Statistics - Part 2
 
Statistics in research
Statistics in research Statistics in research
Statistics in research
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Introduction to Statistics - Part 1
Introduction to Statistics - Part 1Introduction to Statistics - Part 1
Introduction to Statistics - Part 1
 
Statistics for the Health Scientist: Basic Statistics I
Statistics for the Health Scientist: Basic Statistics IStatistics for the Health Scientist: Basic Statistics I
Statistics for the Health Scientist: Basic Statistics I
 
What is Statistics
What is StatisticsWhat is Statistics
What is Statistics
 
Lesson 002
Lesson 002Lesson 002
Lesson 002
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
Statistics in research
Statistics in researchStatistics in research
Statistics in research
 
001 Lesson 1 Statistical Techniques for Business & Economics
001 Lesson 1 Statistical Techniques for Business & Economics001 Lesson 1 Statistical Techniques for Business & Economics
001 Lesson 1 Statistical Techniques for Business & Economics
 
Data Analysis: Descriptive Statistics
Data Analysis: Descriptive StatisticsData Analysis: Descriptive Statistics
Data Analysis: Descriptive Statistics
 
Commonly Used Statistics in Medical Research Part I
Commonly Used Statistics in Medical Research Part ICommonly Used Statistics in Medical Research Part I
Commonly Used Statistics in Medical Research Part I
 
Chapter 1 introduction to statistics for engineers 1 (1)
Chapter 1 introduction to statistics for engineers 1 (1)Chapter 1 introduction to statistics for engineers 1 (1)
Chapter 1 introduction to statistics for engineers 1 (1)
 

Similar to SAS log regression

Chapter 2 methods and statistics
Chapter 2  methods and statisticsChapter 2  methods and statistics
Chapter 2 methods and statistics
Psych Soon
 
Research Proposal: Impact of Parental Absence or Presence on Left-behind Chil...
Research Proposal: Impact of Parental Absence or Presence on Left-behind Chil...Research Proposal: Impact of Parental Absence or Presence on Left-behind Chil...
Research Proposal: Impact of Parental Absence or Presence on Left-behind Chil...
Shea K. Zhao
 
Collecting and reccording data si
Collecting and reccording data siCollecting and reccording data si
Collecting and reccording data si
shehlaijaz
 

Similar to SAS log regression (20)

2 lecture 1 course introduction

SAS log regression

  • 1. Michael Fletcher Fall 2014 Econ 453 Section 2 Final Draft
  • 2. Table of Contents • Part 1 • Motivation (slide 3) • Previous Findings (slide 4) • Research Questions (slide 5) • Data (slide 6) • Modeling Dataset (slide 7) • Part 2 • Key Variables (slide 9) • Independent Variables (slides 10-11) • Descriptive Statistics (slides 12-21) • Part 3A • Conditional Statistics (slides 23-33) • Part 3B • Conditional Statistics • Part 4 • … • Part 5A • … • Part 5B
  • 3.
  • 4. Previous Findings • http://www.bls.gov/web/empsit/cpseea18.pdf • http://www.bls.gov/cps/lfcharacteristics.htm
  • 5. Research Questions • Primary research question – What characteristics of an individual make them more or less likely to be willing to endure a long commute to work? • Secondary research question – What characteristics of workers make them more or less likely to be full-time or part-time workers? • Notes: – Here we only considered individuals who are in the labor force, which we took to be 16-64 year olds. – To be considered full time, a person had to work 40 or more hours per week.
  • 6. Data Source: - We gathered data from IPUMS USA. - We took data from the 2013 American Community Survey, conducted by the U.S. Census Bureau. Size: - We began with a dataset that contained XXXXXXXXXXXXXX variables and XXXXXXXXX observations Type: - This is a cross-sectional dataset. - The American Community Survey is conducted by the U.S. Census Bureau and given to approximately 250,000 households in the United States each month. The goal of the survey is to gather much of the information collected by the decennial long-form census on a more regular basis. - The Census Bureau fully implemented this program in 2005.
  • 7. Description of Modeling Dataset Modeling the Dataset: - We downloaded our data from the IPUMS database by choosing the variables we were interested in from the ACS 2013 survey. - After downloading the data, we chose the independent and dependent variables that we would work with. - We decided to drop observations with missing values in our dependent variables because they carry no information about what we are trying to study. - We did not have any independent variables with a significant number of missing values. The Modeled Dataset: - We created a new dataset from the raw dataset that uses 13 variables and has over 1 million observations. - For our “age” variable we created age groups so that we would not have a separate category for every single age. - For the “language” variable we grouped all the languages spoken at home by less than 1% of the population into the category “Other”. - It would be very beneficial to have data on the type of worker that the person’s parents were, to analyze whether lineage plays a role in income and the type of work week one has, but this is a difficult variable to measure and to obtain a good response rate for.
  • 9. Key Variables Dependent: - Full Time: This variable measures whether a person works 40 or more hours in a typical week. If this variable is “1” then the person has a full-time schedule or greater; if it is “0” then the person works fewer than 40 hours per week. - Transit time: This variable measures whether a person commutes 20 or more minutes to work one way. If the variable is “1” then that person travels at least 20 minutes one way to work; if the value is “0” then that person travels at most 19 minutes to work. Independent: - Race - Sex - Agegroup - Marital Status - Veteran Status - School Attendance - Education Level - Primary language - Citizenship status - Class of worker - Number of Children
  • 10. Independent Variables • Agegroup • The original raw data set provided us with the exact age of each individual. • In order to clean up the data we grouped ages into ranges, lowering the number of possible values from over 100 to six for the 16-64 population. • Citizen • This variable defines the citizenship status of the individuals who were surveyed. • Educ • The Educ variable gives the highest level of education that an individual reached. • This ranges from no schooling to five or more years of college education. • Classwkr • Provides data on whether the individual is self-employed or works for a company. • Language • Defined as the primary language the individual speaks at home. • Nchild • Gives insight into how many of the respondent's own children live in the household.
  • 11. Independent Variables Cont. • Marst • This is the current marital status of the respondent. • School • This variable defines whether the individual is currently attending school. • Vetstat • The vetstat variable indicates whether or not the respondent is a veteran. • Race • This is a general race variable that puts the respondents in broad race categories. • Sex • This defines whether the respondent is male or female.
  • 12. Descriptive Statistics: Age Group • The Agegroup variable is the age range into which each respondent falls. • We were originally given the actual age of every respondent but decided the information would best be represented in six age brackets. • The values in this variable are group codes stored as numbers. • These codes can only be whole numbers and do not include fractions or decimals. • All age groups in the labor force are nearly equally represented. • The first age group, 16-23 years old, is the least represented, with only 1 in 10 people in this group. • The group with the greatest number of people was the 56-64 category, which accounted for 1 out of every 5. • Each of the remaining groups accounted for about 1 in 6 people.
  • 13. Descriptive Statistics: Citizen • The citizen variable is the current citizenship status of an individual. • The values in this variable are category codes stored as numbers (numerical). • These codes can only be whole numbers and do not include fractions or decimals. • The Citizen variable can be broken into four values. • A little over 8 in 10 respondents are coded “N/A”, the IPUMS code for respondents to whom the question does not apply (generally those born in the United States) rather than a refusal to answer. • Aside from that, 1 in 13 are naturalized citizens, and the same ratio applies to non-citizens. • Only 1 in 100 people in this sample were born abroad to American parents.
  • 14. Descriptive Statistics: Educ • The Educ variable measures the highest level of education an individual reached. • The values in this variable are category codes stored as numbers. • These codes can only be whole numbers and do not include fractions or decimals. • The Educ variable can be broken into six values. • Out of the respondents, 1 in 3 graduated from high school. • 1 in 5 completed four years of college.
  • 15. Descriptive Statistics: Classwkr • The Classwkr variable shows whether an individual works for a company or works for themselves. • The values in this variable are category codes stored as numbers (numerical). • Each number corresponds to a different class of worker. • Nearly 12 out of 13 respondents work for a company. • The remaining 1 in 13 work for themselves.
  • 16. Descriptive Statistics: Home Language • This is a categorical, character variable that details the primary language spoken at home. • In the original language variable, many of the languages were spoken by fewer than 1 in 100 people, so we took the top 4 languages and grouped the rest into a category named “Other.” • Over 8 out of 10 people speak English as their primary language. • We can also see that 1 in 10 people speak Spanish as their primary language. • The next most popular language, Chinese, trails far behind, with only 1 in 100 people speaking it as their primary language.
  • 17. Descriptive Statistics: Number of Children in Household - This is a discrete variable that can take on 10 different values. - This variable details how many of the respondent's own children live in the household. - Over half of the respondents had no children living with them. - The share of respondents falls as the number of children rises, and only 1 in 50 have 4 or more children.
  • 18. Descriptive Statistics: Marital Status • This is a discrete variable that details the marital status of a person. • About half of the people in the survey are married. • About 3 in 10 people in this survey have never been married. • About 1 in 10 people in this sample are divorced. • About 1 in 50 people are married but separated from their spouse.
  • 19. Descriptive Statistics: School Status • This is a categorical, character variable that asks if a person is currently enrolled in school. It has two options: In school, and not in school. • Most people (about 9 out of 10) are not in school.
  • 20. Descriptive Statistics: Veteran Status • This is a categorical, character variable with three choices: Veteran, Not a veteran, or “Not Applicable” for children and those unable to serve. • Because we removed people under 16 in this sample, most of the people who responded “N/A” would be those unable to serve. • We found that an overwhelming portion of the respondents were not veterans. • About 1 in 18 people are veterans according to this sample.
  • 21. Descriptive Statistics: Sex • This is a categorical, character variable with two simple options: Male or Female. • There were more males who responded in this survey than females. • Generally, the mixture between males and females is about 50/50, which is shown here as well.
  • 22. Descriptive Statistics: Race • The race variable defines the major racial category that the respondent believes they fall under • The values in this variable measure a quantity as a number • The Race variable can have up to 9 different values • A large majority of the respondents fell under the white category, about 3 in 4 • Black/Negro was the next largest response with about 1 in 10 being in that category. • No other category had a response rate greater than 4 in 100
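The one-way shares quoted on the descriptive statistics slides (for example "about 3 in 4" or "1 in 10") can be read off frequency tables. The slides do not say which procedure was used, so the following is only a minimal sketch, assuming the cleaned dataset is the `four` dataset created in the appendix data step and that the IPUMS extract carries the variable names listed below.

```sas
/* One-way frequency tables for the categorical independent variables.
   Assumption: WORK.FOUR is the cleaned dataset from the appendix data step.
   The formats defined in the appendix could be attached with a FORMAT
   statement to print labels instead of raw codes. */
proc freq data=four;
  tables agegroup citizen educ classwkr homelanguage
         nchild marst school vetstat sex race / nocum;
run;
```

The Percent column of each table gives the share of respondents in each category, which is what the "1 in N" statements summarize.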
  • 24. Conditional Statistics: Agegroup • Those least likely to be full-time workers are in the 16-23 age range, with 1 in 3 being full time. • Full-time employment increases until middle age and then decreases as workers approach retirement age. • The highest rate of full-time employment belongs to those in the 40-47 age range. • This trend, along with the low and high rates, can be attributed to having to care for children and other family members and then no longer having to once they become independent.
  • 25. Conditional Statistics: Citizen • This variable saw most of its respondent types fall close to the overall average. • Those who are naturalized citizens have the highest probability of being full time with just over 3 out of 4 falling into that category. • The majority of the population sits slightly above a rate of 7 out of every 10 laborers being full time.
  • 26. Conditional Statistics: Educ • Nationwide, around 7 out of every 10 people are considered to be full-time workers. • Individuals who only reached 10th or 11th grade face the lowest rate, with about half of those in that category working full time. • Completing 12th grade yields significantly better results, with 7 out of 10 individuals working 40 hours or more per week. • Once college begins we see a steady increase with every completed year of college. This leads up to the highest rate, with 4 out of 5 of those who complete 5 or more years being employed full time.
  • 27. Conditional Statistics: Classwkr • For the class of worker variable, the full-time rate among those who work for wages, like the overall average, sits at about 7 in 10 working people. • Those who are self-employed have a 2 in 3 chance of working 40 or more hours per week. • This contradicts the belief that self-employed people work more to keep their business alive.
  • 28. Conditional Statistics: Home Language • Those whose primary language at home is either Chinese or Hindi post the highest full-time employment rates. Both sit above 3 in every 4 people. • All other languages were at or above the national average, except for those who primarily spoke Spanish.
  • 29. Conditional Statistics: Marital Status • The first thing we noticed was that those who have never been married or are single have a lower probability of being employed full time. Fewer than 6 in 10 of those individuals work full time, which is below the national average. • Those with the highest full-time employment rates are those who are currently married: 3 out of 4 are full-time employees. • This can most likely be attributed to being responsible for a significant other rather than just oneself.
  • 30. Conditional Statistics: School Attendance • People who are not in school are more likely to have full time employment. • About 3 in 4 people not in school are working full time jobs. • This rate drops to about every 4 in 10 people who are still in school while being in the labor force.
  • 31. Conditional Statistics: Veteran Status • Even though both veterans and non-veterans have a high percentage of full-time employment, veterans are more likely to fall into this category. • 7 out of 10 non-veterans hold full-time employment. • Veterans have a slightly higher rate, with 8 out of 10 holding full-time employment.
  • 32. Conditional Statistics: Sex • People who are males are more likely than females to have full time employment. • 6 in every 10 females have full time positions. • 8 in every 10 males hold full time employment. • This difference can most likely be attributed to cultural norms in the United States.
  • 33. Conditional Statistics: Race • In this variable, those who consider themselves to be of Asian descent (Chinese, Japanese, Other Asian) are more likely to work 40 or more hours per week. • About 3 out of 4 people of Asian origin worked full time. • People who were of two or more races had a lower chance of working full time compared to the overall average. • Overall, about 7 out of 10 individuals reported working 40 or more hours per week.
  • 34. Conditional Statistics: Number of Children • There was a slight decreasing trend in full-time employment as the number of children increased. • Those who have fewer children are more likely to be considered full time. • Over 3 in 4 of these individuals held positions in which they worked a minimum of 40 hours per week.
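The conditional full-time rates above are shares of full-time workers within each level of an independent variable. Because `Fulltime` is coded 0/1, its mean within a group is that share. A minimal sketch of one way to compute it, assuming the `four` dataset from the appendix and using `agegroup` purely as an example grouping variable (the slides do not state which procedure produced the reported figures):

```sas
/* Full-time rate within each age group.
   The mean of a 0/1 variable equals the proportion of 1s. */
proc means data=four mean n maxdec=3;
  class agegroup;   /* swap in citizen, educ, marst, ... as needed */
  var Fulltime;
run;
```

A mean of about 0.33 for the first age group would correspond to the "1 in 3" statement on the Agegroup slide.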
  • 36. Conditional Statistics: Age Group - For most age groups, about 6 out of 10 people have a long transit to work. - People aged 16-23 have the lowest rate of long transit, with only about 4 out of 10 people having a long transit. - The 16-23 age group also seems to be dragging the average down, because the other groups are above average.
  • 37. Conditional Statistics: Citizen - Naturalized citizens are most likely to have long transit times, with 2/3 of them having a transit of 20 minutes or more. - In this sample, many people are coded “N/A” for citizenship. This group has the lowest rate of long transit times. - 6 out of 10 people born abroad to American parents have a long transit time.
  • 38. Conditional Statistics: Number of Own Children in Household - In this sample, most if not all of the respondent groups hover around the average value. - Respondents who have 9 or more children have the lowest occurrence of long transits, with only around half traveling 20 minutes or more. - Surprisingly, both those with no children and those with the most (8 or 9+ children) had the lowest occurrences of long transit times. - For everyone else, about 6 in 10 people have a hefty daily commute of 20 minutes or more.
  • 39. Conditional Statistics: Educational Attainment - Among people who have attained only some high school education, specifically grades 10 and 11, about half travel 20 minutes or more to work. - People with 4 or more years of college are slightly more likely to travel far for work, with 6 out of 10 people doing so in this sample. - Overall, the rates of long transit times do not seem to vary much with educational attainment in this sample.
  • 40. Conditional Statistics: Class of Worker - Self employed people are less likely to have a long commute to work than people who work for somebody else. - For self-employed people, about half have to travel far on their daily commute. - About 6 out of 10 people that work for wages have to travel more than 20 minutes to their place of employment.
  • 41. Conditional Statistics: Race - American Indians and Alaskan Natives are the least likely to have to travel far for work. Less than half of respondents in this category have a long transit. - Almost 7 in 10 Chinese people have a long transit to work that lasts 20 minutes or more, according to this study. - Other than American Indians/Alaskan Natives, Whites are the group least likely to be faced with a long commute to work, with only a bit over half of White respondents saying they travel 20 minutes or more daily.
  • 42. Conditional Statistics: Marital Status - Married people who live with their spouses are the most likely to have to drive far for their daily commute according to this survey, with 6 of 10 needing more than 20 minutes to get to their work daily. - Single people are the least likely to travel far for their work or school, with only about half saying they take more than 20 minutes on their daily commute. - The rest of the groups in this sample hover closer around the average of 57%
  • 43. Conditional Statistics: School Attendance - People who are in school are less likely to have a long transit time to their work than people who are not attending school. - About 6 in 10 people who are not in school have a long transit to work. - Less than half of people in school have to travel far daily for their work.
  • 44. Conditional Statistics: Veteran Status - About 6 in 10 veterans have a long transit time to work. - The chance of having to travel far for work goes down for non-veterans. - Surprisingly, those coded “N/A” for this question have a very low occurrence of long transit times, so there may be something specific going on with that group.
  • 45. Conditional Statistics: Home Language - Those who speak English as their primary language have a lower occurrence of long transit times than those who do not speak it as their primary language. - Spanish speakers and other minorities have 6 out of 10 people in this category with long transits to work. - People who speak Chinese as their primary language have higher rates of long transits to their places of work.
  • 46. Conditional Statistics: Sex - More males than females have to travel 20 minutes or more to work. - About 6 in 10 males have transits of 20 minutes or more. - Women have a slightly lower chance of long transit times, but overall between 5 and 6 in 10 women in this sample have to travel far for their work.
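The long-transit shares in Part 3B can also be laid out as two-way tables with row percentages. The sketch below is illustrative only: it assumes the `four` dataset and the `LongTransit` flag from the appendix data step, and the three row variables are just examples.

```sas
/* Cross-tabulate each row variable against the long-commute flag.
   NOCOL and NOPERCENT suppress column and overall percentages,
   leaving the row percentage: the share with LongTransit = 1
   within each category. */
proc freq data=four;
  tables (sex homelanguage vetstat)*LongTransit / nocol nopercent;
run;
```

Reading the row percentage in the `LongTransit = 1` column for each category gives the figures that statements such as "about 6 in 10 males" summarize.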
  • 47. Part 4 Missing Value Imputation and Modeling Variables
  • 48. Missing Value Imputation and Modeling Variables Missing Values: - Because we are looking at statistics on labor and commuting, which assumes that a person may be using a car, children under 16 were deleted from the data set before running our analyses. - We also deleted observations for people with zero income. Since we are looking at the impact of certain characteristics on the amount of time worked per week, it makes sense to include only those who have income, so that we model people who do have a job. - One of the dependent variables we are studying is transit time. We felt it was appropriate to delete people with missing values for transit time because this allows us to capture the effects on people who actually travel to work daily. Imputation: - No imputation of missing values was done for this data set. If a value was missing, it was either left in and coded as missing or deleted from the data set.
  • 49. Dummy Variables - Many of the variables used in these analyses are categorical and discrete, so they require dummy variable coding in order to run a logistic regression analysis on them. Dummy variables include: 1. School Age: This dummy variable is based on whether a person is of an age to be in school. We picked ages 16-30 as a reasonable range in which a person is more likely to be in school. 2. Married: This dummy variable is based on the Marital Status variable and indicates whether or not a person is married. 3. No Schooling: This dummy variable is based on the “Educational Attainment” variable and indicates whether a person has had no schooling in their life. 4. Some School: This dummy variable is based on the “Educational Attainment” variable and indicates whether a person has had only some schooling in their life. 5. Some College: This dummy variable is based on the “Educational Attainment” variable and indicates whether a person has attained a high school diploma and gone to college but has not finished. 6. Degree: This dummy variable is based on the “Educational Attainment” variable and indicates whether a person has achieved a college degree as their highest educational attainment.
  • 50. Dummy Variables (Cont.) 7. Self Employed: This dummy variable is based on the Class Worker variable and indicates whether a person is self-employed or not. 8. Spanish: This dummy variable is based on the Home Language variable and indicates whether a person speaks Spanish as their primary language at home or not. 9. In School: This dummy variable is based on the School variable and indicates whether a person is in school or not. 10. Veteran: This dummy variable is based on the Veteran Status variable and indicates whether a person said they were a veteran or not. 11. Male: This dummy variable is based on the Sex variable and indicates whether a person is male or not. 12. Minority: This dummy variable is based on the Race variable and indicates whether a person belongs to a minority race (non-white) or not. 13. Children: This dummy variable is based on the Number of Children in Household variable and indicates whether a person has any of their own children living with them.
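The dummy variables listed above could be coded in a SAS data step along the following lines. This is a sketch under stated assumptions, not the authors' code: the output dataset name `model` and the dummy variable names are hypothetical, the numeric codes follow the format definitions in the appendix, and the education cut-offs and the treatment of "married" as codes 1 and 2 are guesses at what the slides intend.

```sas
/* Hypothetical dummy-variable coding. In SAS a comparison evaluates
   to 1 when true and 0 when false, so each assignment below yields
   a 0/1 indicator. Cut-offs marked "assumed" are not from the slides. */
data model;
  set four;
  SchoolAge    = (16 <= age <= 30);
  Married      = (marst in (1, 2));   /* assumed: spouse present or absent */
  NoSchooling  = (educ = 0);
  SomeSchool   = (1 <= educ <= 5);    /* assumed: below grade 12 */
  SomeCollege  = (7 <= educ <= 9);    /* assumed: 1 to 3 years of college */
  Degree       = (educ >= 10);        /* assumed: 4 or more years of college */
  SelfEmployed = (classwkr = 1);
  Spanish      = (homelanguage = 2);
  InSchool     = (school = 2);
  Veteran      = (vetstat = 2);
  Male         = (sex = 1);
  Minority     = (race ne 1);
  Children     = (nchild > 0);
run;
```

Under this coding, grade 12 (`educ = 6`) is the omitted education category, so a respondent with all education dummies equal to 0 has a high school diploma as the highest attainment.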
  • 51. Baseline Case The Baseline Case in our sample is a person who has the following characteristics: - They are not married. - They are over 30 years old. - They are a US citizen. - They have completed a high school diploma as their highest educational attainment. - They work for wages rather than being self-employed. - They do not speak Spanish as their main language. - They are not in school. - They are not a veteran of the Armed Forces. - They are female. - They have no children. - They are White.
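In a logistic regression, the baseline case is the person for whom every dummy variable equals zero, so the intercept alone determines their predicted probability. The notation below ($\beta_0$ for the intercept, $\beta_j$ for the coefficient on dummy $D_j$, $k$ for the number of dummies) is introduced here for illustration and does not appear on the slides.

```latex
\[
\log\frac{P(y=1)}{1-P(y=1)} = \beta_0 + \sum_{j=1}^{k} \beta_j D_j ,
\qquad
P\bigl(y=1 \mid D_1=\dots=D_k=0\bigr) = \frac{1}{1+e^{-\beta_0}} .
\]
```

Each coefficient $\beta_j$ is then the change in the log odds, relative to this baseline person, from switching dummy $D_j$ from 0 to 1 while holding the others fixed.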
  • 53. Regression: Fulltime • We performed a multivariate logistic regression of the Full time employment dependent variable on the 14 dummy variables we created. • Here we are modeling y = 1, that is, the probability that a person works full time. This lets us see how the independent variables are related to the dependent variable.
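A regression of this kind could be run with `PROC LOGISTIC`. The sketch below is hedged: it assumes a dataset, here called `model` (a hypothetical name), that holds `Fulltime` together with dummy variables under the illustrative names shown, and it includes only the 13 dummies listed on the Dummy Variables slides, since the slides do not name the fourteenth.

```sas
/* Logistic regression for full-time status.
   event='1' tells SAS to model the probability that Fulltime = 1,
   matching the "modeling y = 1" note on the slide.
   Dataset and regressor names are illustrative assumptions. */
proc logistic data=model;
  model Fulltime(event='1') = SchoolAge Married NoSchooling SomeSchool
        SomeCollege Degree SelfEmployed Spanish InSchool Veteran
        Male Minority Children;
run;
```

The odds ratio estimates in the output compare each group with the baseline case, for example married versus unmarried respondents with all other dummies held fixed.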
  • 60. Descriptive Variable Formats
      data four;
        set IPUMS.usa_00001;

        * Keep respondents aged 16 to 64 only;
        if age < 16 then delete;
        if age >= 65 then delete;

        * Poverty flag based on earned income;
        if incearn <= 11670 then Poverty = 1;
        else Poverty = 0;

        * Drop zero earners and records with no travel time;
        if incearn = 0 then delete;
        if trantime = 0 then delete;

        * Long commute means 20 minutes or more one way;
        if trantime >= 20 then LongTransit = 1;
        else LongTransit = 0;

        * Age brackets coded 1 to 6;
        if 16 <= age <= 23 then agegroup = 1;
        else if 24 <= age <= 31 then agegroup = 2;
        else if 32 <= age <= 39 then agegroup = 3;
        else if 40 <= age <= 47 then agegroup = 4;
        else if 48 <= age <= 55 then agegroup = 5;
        else if 56 <= age <= 64 then agegroup = 6;

        * Top four home languages, everything else grouped as Other;
        if language = 1 then homelanguage = 1;
        else if language = 12 then homelanguage = 2;
        else if language = 43 then homelanguage = 3;
        else if language = 31 then homelanguage = 4;
        else homelanguage = 5;

        * Full time means 40 or more usual hours per week;
        if uhrswork >= 40 then Fulltime = 1;
        else Fulltime = 0;
      run;
    • Note: In this code, aside from defining our descriptive variables, we also deleted the observations with missing values, as stated in Part 4.
  • 61. Independent Variable Formats
      proc format;
        * Agegroup format, codes match the agegroup values assigned in the data step;
        value agegroup_f
          1 = "16 to 23"
          2 = "24 to 31"
          3 = "32 to 39"
          4 = "40 to 47"
          5 = "48 to 55"
          6 = "56 to 64";

        * Citizen format;
        value CITIZEN_f
          0 = "N/A"
          1 = "Born abroad of American parents"
          2 = "Naturalized citizen"
          3 = "Not a citizen"
          4 = "Not a citizen, but has received first papers"
          5 = "Foreign born, citizenship status not reported";

        * Education format;
        value EDUC_f
          0  = "N/A or no schooling"
          1  = "Nursery school to grade 4"
          2  = "Grade 5, 6, 7, or 8"
          3  = "Grade 9"
          4  = "Grade 10"
          5  = "Grade 11"
          6  = "Grade 12"
          7  = "1 year of college"
          8  = "2 years of college"
          9  = "3 years of college"
          10 = "4 years of college"
          11 = "5+ years of college";

        * Class worker format;
        value CLASSWKR_f
          0 = "N/A"
          1 = "Self-employed"
          2 = "Works for wages";
      run;
  • 62. Independent Variable Formats
      proc format;
        * Home Language format;
        value homelanguage_f
          1 = "English"
          2 = "Spanish"
          3 = "Chinese"
          4 = "Hindi and related"
          5 = "Other";

        * Marital Status format;
        value MARST_f
          1 = "Married, spouse present"
          2 = "Married, spouse absent"
          3 = "Separated"
          4 = "Divorced"
          5 = "Widowed"
          6 = "Never married/single";

        * School format;
        value SCHOOL_f
          0 = "N/A"
          1 = "No, not in school"
          2 = "Yes, in school"
          9 = "Missing";

        * Veteran Status format;
        value VETSTAT_f
          0 = "N/A"
          1 = "Not a veteran"
          2 = "Veteran"
          9 = "Unknown";

        * Sex format;
        value SEX_f
          1 = "Male"
          2 = "Female";

        * Race format;
        value RACE_f
          1 = "White"
          2 = "Black/Negro"
          3 = "American Indian or Alaska Native"
          4 = "Chinese"
          5 = "Japanese"
          6 = "Other Asian or Pacific Islander"
          7 = "Other race, nec"
          8 = "Two major races"
          9 = "Three or more major races";
      run;
  • 63. Independent Variable Formats
      proc format;
        * Number of Children format;
        value NCHILD_f
          0 = "0"
          1 = "1"
          2 = "2"
          3 = "3"
          4 = "4"
          5 = "5"
          6 = "6"
          7 = "7"
          8 = "8"
          9 = "9+";

        * Self care format;
        value DIFFCARE_f
          0 = "N/A"
          1 = "No"
          2 = "Yes";
      run;