5. Research Questions
• Primary research question
– What characteristics make an individual more or less likely to be willing to endure a
long commute to work?
• Secondary research question
– What characteristics make workers more or less likely to work full time rather than part time?
• Notes:
– We considered only individuals of labor-force age, that is, 16 to 64 years old.
– A person is considered full time if they work 40 or more hours per week.
6. Data
Source:
- We gathered data from IPUMS USA
- We took data from the American Community Survey in 2013, conducted by the U.S. Census Bureau.
Size:
- We began with a dataset that contained XXXXXXXXXXXXXX variables and XXXXXXXXX observations
Type:
- This is a cross-sectional dataset
- The data come from the American Community Survey, a survey conducted by the U.S. Census Bureau
and sent each month to approximately 250,000 households in the United States. The goal of the
survey is to gather much of the information on the long-form decennial census on a more regular
basis than once every ten years.
- The Census Bureau fully implemented this program in 2005.
7. Description of Modeling Dataset
Modeling the Dataset:
- We downloaded our data from the IPUMS database by selecting the variables we were interested in from the
2013 ACS.
- After downloading the data, we chose the independent variables and dependent variables that we would work
with.
- We dropped observations with missing values for our dependent variables, because they carry no
information about the outcomes we are trying to study.
- We did not have any independent variables with significant missing values.
The Modeled Dataset:
- We created a new dataset from the raw dataset that uses 13 variables and has over 1 million observations.
- For our “age” variable we grouped ages into brackets so that we would not have a separate category
for every single year of age.
- For the “language” variable we grouped all the languages spoken at home by less than 1% of the population
into the category “Other”.
- It would be very beneficial to have data on the type of worker that the person’s parents were, in order to
analyze whether lineage plays a role in income and the type of work week one has, but this is a difficult variable
to measure and to get a good response rate on.
9. Key Variables
Dependent:
- Full Time: This variable indicates whether a person works 40 or more hours in a typical week. A value of
“1” means the person has a full-time schedule or greater, while a value of “0” means the person works
fewer than 40 hours per week.
- Transit time: This variable indicates whether a person commutes 20 or more minutes to work one way. A value
of “1” means the person travels at least 20 minutes one way to work, while a value of “0” means the person
travels less than 20 minutes to work.
Independent:
- Race
- Sex
- Agegroup
- Marital Status
- Veteran Status
- School Attendance
- Education Level
- Primary language
- Citizenship status
- Class of worker
- Number of Children
10. Independent Variables
• Agegroup
– The original raw data set provided us with the exact age of each individual.
– To clean up the data we grouped ages into ranges, lowering the number of possible values from over 100 to 8.
• Citizen
– This variable defines the citizenship status of the individuals who were surveyed.
• Educ
– The Educ variable gives the highest level of education that an individual reached.
– This ranges from no schooling to five or more years of college education.
• Classwkr
– Provides data on whether the individual is self-employed or works for a company.
• Language
– Defined as the primary language that the individual speaks at home.
• Nchild
– Gives the number of the respondent's own children living in the household.
11. Independent Variables Cont.
• Marst
– This is the current marital status of the respondent.
• School
– This variable defines whether the individual is currently attending school or not.
• Vetstat
– The Vetstat variable indicates whether or not the respondent is a veteran.
• Race
– This is a general race variable that puts the respondents in broad race categories.
• Sex
– This defines whether the respondent is male or female.
12. Descriptive Statistics: Age Group
• The Agegroup variable is the age range into which each respondent falls.
• We were originally given the actual age for every respondent but decided the information would
best be represented in these seven age brackets.
• The values in this variable are stored as numbers.
• These values can only be whole numbers and do not include fractions or decimals.
• All age groups in the labor force are represented nearly equally.
• The first age group, 16-23 years old, is the least represented, with only 1 in 10 people
being in this group.
• The group with the greatest number of people was the 56-64 category, which accounted for
1 in 5.
• The rest of the groups accounted for about 1 in 6 people each.
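The shares quoted on the descriptive-statistics slides are one-way frequency distributions. A minimal sketch of how such a table could be produced in SAS is shown below; it assumes the cleaned dataset is named four and that the agegroup_f format from the appendix slides has already been defined. It is an illustration, not necessarily the exact code we ran.

```sas
* Sketch only: assumes dataset FOUR and format AGEGROUP_F already exist;
proc freq data=four;
  tables agegroup;              * counts and percentages for each age bracket;
  format agegroup agegroup_f.;  * show labels instead of numeric codes;
run;
```

The same pattern applies to the other independent variables by swapping the variable name and its format.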
13. Descriptive Statistics: Citizen
• The Citizen variable records an individual's current citizenship status.
• The values in this variable are stored as numbers (numerical).
• These values can only be whole numbers and do not include fractions or decimals.
• The Citizen variable can be broken into four values.
• A little over 8 in 10 respondents fall under “N/A”. In the IPUMS coding this category covers people born in the United States, for whom the question does not apply, rather than people who skipped the question.
• Aside from that, about 1 in 13 are naturalized citizens, and about 1 in 13 are not citizens.
• Only 1 in 100 people in this sample were born abroad to American parents.
14. Descriptive Statistics: Educ
• The Educ variable measures the highest level of education an individual reached.
• The values in this variable measure a quantity as a number
• These quantities can only be whole numbers and do not include fractions or decimals.
• The Educ variable can be broken into six values.
• Out of the respondents, 1 in 3 graduated from high school.
• 1 in 5 completed four years of college.
15. Descriptive Statistics: Classwkr
• The Classwkr variable shows whether an individual works for a company
or works for themselves.
• The values in this variable are stored as numbers (numerical).
• Each number corresponds to a different class of worker.
• Nearly 12 out of 13 respondents work for a company.
• The remaining 1 in 13 work for themselves.
16. Descriptive Statistics: Home Language
• This is a categorical, character
variable that details what the
primary language spoken at
home is.
• In the original language variable, many of the languages were spoken by fewer than 1 in 100
people, so we took the top 4 languages and grouped the rest into a category named “Other.”
• Over 8 out of 10 people speak English as their primary language.
• We can also see that 1 in 10 people speak Spanish as their primary language.
• The next most popular language, Chinese, trails far behind, with only 1 in 100
people speaking the language primarily.
17. Descriptive Statistics: Number of Children in Household
- This is a discrete variable that can take on 10 different values.
- This variable details how many of the respondent's own children live in the household.
- Over half of the respondents had no children living with them.
- The share of respondents falls as the number of children rises;
only 1 in 50 have 4 children or more.
18. Descriptive Statistics: Marital Status
• This is a discrete variable that details the marital status of a person.
• About half of the people in the survey are married.
• About 3 in 10 people in this survey have never been married.
• About 1 in 10 people in this sample are divorced.
• About 1 in 50 people are married but separated from their spouse.
19. Descriptive Statistics: School Status
• This is a categorical, character variable that asks if a person is
currently enrolled in school. It has two options: In school, and
not in school.
• Most people (about 9 out of 10) are not in school.
20. Descriptive Statistics: Veteran Status
• This is a categorical, character variable with three choices:
Veteran, Not a veteran, or “Not Applicable” for children and
those unable to serve.
• Because we removed people under 16 in this sample, most of
the people who responded “N/A” would be those unable to
serve.
• We found that an overwhelming portion of the respondents
were not veterans.
• About 1 in 18 people are veterans according to
this sample.
21. Descriptive Statistics: Sex
• This is a categorical, character variable with two
simple options: Male or Female.
• More males than females responded to this survey.
• Generally, the split between males and females is
about 50/50, which holds here as well.
22. Descriptive Statistics: Race
• The race variable defines the major racial category that the
respondent believes they fall under
• The values in this variable measure a quantity as a number
• The Race variable can have up to 9 different values
• A large majority of the respondents fell under the white category,
about 3 in 4
• Black/Negro was the next largest response with about 1 in 10 being in
that category.
• No other category had a response rate greater than 4 in 100
24. Conditional Statistics: Agegroup
• Those least likely to be full time workers are in the 16-23 age range,
with 1 in 3 being full time.
• Full time employment increases until middle age and then
decreases as workers approach retirement age.
• The highest rate of full time employment is among those in the
40-47 age range.
• This trend, along with the low and high rates, may be attributed to having to
care for children and other family members and then no longer having to once
they become independent.
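Because Fulltime is coded 0/1, its mean within a group equals the share of that group working full time. The sketch below shows one way the conditional rates on these slides could be computed; it assumes the cleaned dataset is named four and contains the agegroup and Fulltime variables, and it is an illustration rather than the exact procedure we used.

```sas
* Sketch only: assumes dataset FOUR with AGEGROUP and a 0/1 FULLTIME variable;
proc means data=four n mean maxdec=3;
  class agegroup;   * one row of output per age bracket;
  var Fulltime;     * the mean of a 0/1 indicator is the full time share;
run;
```

Replacing agegroup with any other independent variable gives the corresponding conditional rate.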
25. Conditional Statistics: Citizen
• This variable saw most of its respondent types fall close to the
overall average.
• Those who are naturalized citizens have the highest probability of
being full time with just over 3 out of 4 falling into that category.
• The majority of the population sits slightly above a rate of 7 out of
every 10 laborers being full time.
26. Conditional Statistics: Educ
• Nationwide, around 7 out of every 10 people are considered full time workers.
• Individuals who only reached 10th or 11th grade face the lowest rate, with about half of those in that category
holding full time employment.
• Completing 12th grade yields significantly better results, with 7 out of 10
individuals working 40 hours or more per week.
• Once college begins we see a steady increase with every completed year of college. This leads up to the highest
rate, with 4 out of 5 of those who complete 5 or more years being employed full time.
27. Conditional Statistics: Classwkr
• Among those who work for wages, about 7 in 10 are full time,
in line with the overall average.
• Those who are self-employed have a 2 in 3 chance of working 40
or more hours per week.
• This contradicts the belief that self-employed people work
more to keep their business alive.
28. Conditional Statistics: Home Language
• Those whose primary language at home is either Chinese or Hindi have
the highest rates of full time employment. Both
groups sit above 3 in 4.
• All other language groups were at or above the national average, except those
who primarily spoke Spanish.
29. Conditional Statistics: Marital Status
• The first thing we noticed was that those who have never been
married or are single have a lower probability of being employed full
time. Their rate is below 6 in 10, which is below the
national average.
• Those with the highest full time employment rates are those who are
currently married; 3 out of 4 are full time employees.
• This can most likely be attributed to being responsible for a
significant other rather than just oneself.
30. Conditional Statistics: School Attendance
• People who are not in school are more likely to have full time
employment.
• About 3 in 4 people not in school are working full time jobs.
• This rate drops to about 4 in 10 for people who are still in school
while being in the labor force.
31. Conditional Statistics: Veteran Status
• Even though both veterans and non-veterans have a high
percentage of full time employment, veterans are more
likely to fall into this category.
• 7 out of 10 non-veterans hold full time employment.
• Veterans have a slightly higher rate, with 8 out of 10 holding
full time employment.
32. Conditional Statistics: Sex
• Males are more likely than females to have full time
employment.
• 6 in every 10 females have full time positions.
• 8 in every 10 males hold full time employment.
• This difference can most likely be attributed to cultural norms in the
United States.
33. Conditional Statistics: Race
• For this variable, those who identify as being of Asian descent (Chinese, Japanese, Other Asian)
are more likely to work 40 or more hours per week.
• About 3 out of 4 people of Asian descent worked full time.
• People who were of two or more races had a lower chance of working full time compared to the overall average.
• Overall, about 7 out of 10 individuals reported working 40 or more hours per week.
34. Conditional Statistics: Number of Children
• There was a slight decreasing trend in full time employment as the number of children increased.
• Those who have fewer children are more likely to be considered full time.
• Over 3 in 4 of these individuals held positions in which they worked a minimum of 40 hours per week.
36. Conditional Statistics: Age Group
- For most age groups, about 6 out of 10 people have a long transit to work.
- People aged 16-23 have the lowest share of long transits, with only about 4 out of 10
people having a long transit.
- The 16-23 age group also seems to be dragging the average down, because the
other groups are above average.
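The long-transit rates by group can also be read from a two-way table. The sketch below assumes the cleaned dataset is named four and contains agegroup and the 0/1 LongTransit variable; it is one possible way to obtain these numbers, not a record of the exact code we ran.

```sas
* Sketch only: assumes dataset FOUR with AGEGROUP and a 0/1 LONGTRANSIT variable;
proc freq data=four;
  * NOCOL and NOPERCENT suppress column and overall percentages,
    so the remaining row percentage is the share within each age bracket;
  tables agegroup*LongTransit / nocol nopercent;
run;
```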
37. Conditional Statistics: Citizen
- Naturalized citizens are the most likely to have long transit times, with 2/3 of them having a
transit of 20 minutes or more.
- In this sample, most people are coded “N/A” for citizenship (in the IPUMS coding, people born in the
United States). This group has the lowest share of long transit times.
- 6 out of 10 people born abroad to American parents have a long transit time.
38. Conditional Statistics: Number of Own Children in Household
- In this sample, most if not all of the groups hover around the average value.
- Surprisingly, those with no children and those with the most (8, or 9 or more, children) had the lowest
occurrence of long transit times, with only about half of those with 9 or more children traveling 20 minutes or more.
- For everyone else, about 6 in 10 people have a hefty daily commute of 20 minutes or more.
39. Conditional Statistics: Educational Attainment
- Among people who have only attained some high school education, specifically grades
10 and 11, about half travel 20 minutes or more to work.
- People with 4 or more years of college are slightly more likely to travel far for work,
with 6 out of 10 people doing so in this sample.
- Overall, the share of long transit times does not vary much with educational attainment in this sample.
40. Conditional Statistics: Class of Worker
- Self employed people are less likely to have a long commute to work than
people who work for somebody else.
- For self-employed people, about half have to travel far on their daily commute.
- About 6 out of 10 people that work for wages have to travel more than 20 minutes to
their place of employment.
41. Conditional Statistics: Race
- American Indians and Alaskan Natives are the least likely to have to travel far for work. Less than half of
respondents in this category have a long transit.
- Almost 7 in 10 Chinese people have a long transit to work of 20 minutes or more, according to
this study.
- Other than American Indians/Alaskan Natives, Whites are the group least likely to be faced with a long
commute to work, with only a bit over half of White respondents saying they travel more than 20 minutes daily.
42. Conditional Statistics: Marital Status
- Married people who live with their spouses are the most likely to have to drive far for their daily commute
according to this survey, with 6 of 10 needing more than 20 minutes to get to their work daily.
- Single people are the least likely to travel far for their work or school, with only about half saying they take
more than 20 minutes on their daily commute.
- The rest of the groups in this sample hover close to the average of 57%.
43. Conditional Statistics: School Attendance
- People who are in school are less likely to have a long transit time to their work than people
who are not attending school.
- About 6 in 10 people who are not in school have a long transit to work.
- Less than half of people in school have to travel far daily for their work.
44. Conditional Statistics: Veteran Status
- About 6 in 10 veterans have a long transit time to work.
- Chances that you have to travel far for work go down if you are not a veteran.
- Surprisingly, those coded “N/A” for this question have a very low occurrence of long transit times, so
this group may differ systematically from the rest of the sample.
45. Conditional Statistics: Home Language
- Those who speak English as their primary language have a lower occurrence of long transit times than those who
do not speak it as their primary language.
- Among Spanish speakers and speakers of other minority languages, about 6 out of 10 have long transits to work.
- People who speak Chinese as their primary language have higher rates of long transits to their places of work.
46. Conditional Statistics: Sex
- More males than females travel 20 minutes or more to work.
- About 6 in 10 males have transits of 20 minutes or more.
- Women have a slightly lower chance of long transit times, but overall
between 5 and 6 in 10 women in this sample travel far for their work.
48. Missing Variable Imputation and Modeling Variables
Missing Values:
- Because we are looking at statistics on labor and commuting, which often involves driving a car,
children under 16 were deleted from the data set before running our analyses.
- We also deleted observations for people who had zero income. Since we are looking at the impact of
certain characteristics on the amount of time worked per week, it makes sense to include only those who have
income, so that we model people who do have a job.
- One of the dependent variables we are studying is transit time. We felt it was appropriate to delete people with
missing values for transit time, because this allows us to capture the effects on people who actually travel
daily.
Imputation:
- No imputation of missing values was done for this data set. If a value was missing, it was either left in and coded
as missing or deleted from the data set.
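To see how many observations each deletion rule removes, the counts can be checked before the data are filtered. The sketch below is an illustration under two assumptions: the raw extract is IPUMS.usa_00001, as in the appendix code, and a value of 0 marks the cases to be dropped (the rule the appendix data step applies to TRANTIME and INCEARN). The output column names are hypothetical.

```sas
* Sketch only: counts the rows that each deletion rule would remove;
proc sql;
  select count(*)          as n_total,          /* all rows in the raw extract   */
         sum(age < 16)     as n_under_16,       /* dropped as children           */
         sum(incearn = 0)  as n_zero_income,    /* dropped for zero earnings     */
         sum(trantime = 0) as n_no_transit      /* dropped for no transit time   */
  from IPUMS.usa_00001;
quit;
```

In SAS a comparison evaluates to 1 or 0, so summing it counts the rows where the condition holds.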
49. Dummy Variables
- Many of the variables used in these analyses are categorical and discrete, and so they require
dummy variable coding in order to run a logistic regression analysis on them.
Dummy variables include:
1. School Age: This dummy variable is based off whether a person is of age to be in school. We picked
the range of ages 16-30 as a reasonable range of ages where a person is more likely to be in school.
2. Married: This dummy variable is based off the Marital Status variable and details whether a person
is married or not in a binary way.
3. No Schooling: This dummy variable is based off of the “Educational Attainment” variable and
details whether or not a person has had no schooling in their life.
4. Some School: This dummy variable is based off of the “Educational Attainment” variable and
details whether a person has had only some schooling in their life.
5. Some College: This dummy variable is based off of the “Educational Attainment” variable and
details whether a person has attained a High School Diploma and went to college but has not finished.
6. Degree: This dummy variable is based off of the “Educational Attainment” variable and details
whether a person has achieved a college degree as their highest educational attainment.
50. Dummy Variables (Cont.)
7. Self Employed: This dummy variable is based off the Class Worker variable and details whether
a person is self-employed or not.
8. Spanish: This dummy variable is based off the Home Language variable and details whether a
person speaks Spanish as their primary language at home or not.
9. In School: This dummy variable is based off the School variable and details whether a person is
in school or not.
10. Veteran: This dummy variable is based off the Veteran Status variable and details whether a
person said they were a veteran or not.
11. Male: This dummy variable is based off the Sex variable and details whether a person is a male
or not.
12. Minority: This dummy variable is based off the Race variable and details whether a person
belongs to a minority race (non-white) or not.
13. Children: This dummy variable is based off the Number of Children in Household variable and
details whether a person has any of their children living with them.
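A minimal sketch of how the dummy variables listed above could be coded in a SAS data step follows. The dataset name five, the dummy variable names, and the exact code cutoffs (taken from the format definitions on the appendix slides) are assumptions made for illustration; the original coding may differ, in particular for which Marital Status and Educational Attainment codes fall into each dummy.

```sas
* Sketch only: dataset FIVE, the dummy names, and the cutoffs are assumptions;
data five;
  set four;
  * A comparison in SAS evaluates to 1 when true and 0 when false;
  schoolage   = (16 <= age <= 30);   * of an age to be in school;
  married     = (marst in (1, 2));   * married, spouse present or absent;
  noschool    = (educ = 0);          * N/A or no schooling;
  someschool  = (1 <= educ <= 5);    * some schooling, below grade 12;
  somecollege = (7 <= educ <= 9);    * 1 to 3 years of college;
  degree      = (educ >= 10);        * 4 or more years of college;
  selfemp     = (classwkr = 1);      * self-employed;
  spanish     = (homelanguage = 2);  * Spanish spoken at home;
  inschool    = (school = 2);        * currently in school;
  veteran     = (vetstat = 2);       * veteran;
  male        = (sex = 1);           * male;
  minority    = (race ne 1);         * any race other than White;
  children    = (nchild > 0);        * at least one own child in household;
run;
```

With this coding, grade 12 (educ = 6) is the omitted education category, which matches the high school diploma in the baseline case.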
51. Baseline Case
The Baseline Case in our sample is a person who has the following characteristics:
- They are not married.
- They are over 30 years old.
- They are a US Citizen
- They have completed a High School diploma as their highest educational attainment.
- They work for wages rather than being self-employed.
- They do not speak Spanish as their main language
- They are not in school.
- They are not a veteran of the Armed Forces.
- They are Female
- They have no children.
- They are White.
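The role of the baseline case can be made explicit with the standard form of a logistic regression. The notation below (β₀, β_k, D_k, K) is introduced here for illustration and is not taken from our output: D_k are the dummy variables and K is their number.

```latex
\[
\log\frac{P(y=1)}{1-P(y=1)} \;=\; \beta_0 + \sum_{k=1}^{K} \beta_k D_k ,
\qquad
P\bigl(y=1 \mid D_1=\dots=D_K=0\bigr) \;=\; \frac{1}{1+e^{-\beta_0}} .
\]
```

For the baseline person every dummy equals zero, so the intercept alone determines their predicted probability, and each coefficient β_k measures the change in log-odds relative to that baseline.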
53. Regression: Fulltime
• We performed a multivariate logistic regression with Full time
employment as the dependent variable and the 14 dummy variables we
created as the independent variables.
• Here we are modeling the probability that y = 1 (full time). This lets us see how
the independent variables are related to the dependent variable.
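A regression of this kind could be specified in SAS roughly as follows. This is a hedged sketch, not our exact model statement: the dataset name five and the dummy variable names are hypothetical placeholders for the dummy variables described on the previous slides.

```sas
* Sketch only: dataset FIVE and the predictor names are hypothetical;
proc logistic data=five;
  * EVENT='1' tells SAS to model the probability that Fulltime equals 1;
  model Fulltime(event='1') = schoolage married noschool someschool
                              somecollege degree selfemp spanish inschool
                              veteran male minority children;
run;
```

Without the EVENT= option, PROC LOGISTIC models the lower ordered value (0) by default, which would flip the sign of every coefficient.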
60. Descriptive Variable Formats
data four;
  set IPUMS.usa_00001;
  * Keep respondents aged 16 to 64;
  if age < 16 or age >= 65 then delete;
  * Poverty indicator based on an earnings threshold of 11,670;
  if incearn <= 11670 then Poverty = 1;
  else Poverty = 0;
  * Drop people with zero earnings or no transit time;
  if incearn = 0 then delete;
  if trantime = 0 then delete;
  * Long transit: 20 minutes or more one way;
  if trantime >= 20 then LongTransit = 1;
  else LongTransit = 0;
  * Age brackets, coded 2 to 7 so that they match the agegroup_f labels;
  if 16 <= age <= 23 then agegroup = 2;
  else if 24 <= age <= 31 then agegroup = 3;
  else if 32 <= age <= 39 then agegroup = 4;
  else if 40 <= age <= 47 then agegroup = 5;
  else if 48 <= age <= 55 then agegroup = 6;
  else if 56 <= age <= 64 then agegroup = 7;
  * Home language: top four languages, everything else grouped as Other;
  if language = 1 then homelanguage = 1;
  else if language = 12 then homelanguage = 2;
  else if language = 43 then homelanguage = 3;
  else if language = 31 then homelanguage = 4;
  else homelanguage = 5;
  * Full time: 40 or more usual hours per week;
  if uhrswork >= 40 then Fulltime = 1;
  else Fulltime = 0;
run;
• Note: In this code, aside from defining our descriptive variables, we also deleted
the missing values, as stated in Part 4.
61. Independent Variable Formats
*Agegroup format;
proc format;
value agegroup_f
1 = "Under 16"
2 = "16 to 23"
3 = "24 to 31"
4 = "32 to 39"
5 = "40 to 47"
6 = "48 to 55"
7 = "56 to 64"
8 = "65+";
*Citizen format;
proc format;
value CITIZEN_f
0 = "N/A"
1 = "Born abroad of American parents"
2 = "Naturalized citizen"
3 = "Not a citizen"
4 = "Not a citizen, but has received first papers"
5 = "Foreign born, citizenship status not reported";
*Education format;
proc format;
value EDUC_f
00 = "N/A or no schooling"
01 = "Nursery school to grade 4"
02 = "Grade 5, 6, 7, or 8"
03 = "Grade 9"
04 = "Grade 10"
05 = "Grade 11"
06 = "Grade 12"
07 = "1 year of college"
08 = "2 years of college"
09 = "3 years of college"
10 = "4 years of college"
11 = "5+ years of college"
;
*Class worker format;
proc format;
value CLASSWKR_f
0 = "N/A"
1 = "Self-employed"
2 = "Works for wages";
run;
62. Independent Variable Formats Cont.
* Home Language format;
proc format;
value homelanguage_f
1 = "English"
2 = "Spanish"
3 = "Chinese"
4 = "Hindi and related"
5 = "Other";
* Marital Status format;
proc format;
value MARST_f
1 = "Married, spouse present"
2 = "Married, spouse absent"
3 = "Separated"
4 = "Divorced"
5 = "Widowed"
6 = "Never married/single";
*School format;
proc format;
value SCHOOL_f
0 = "N/A"
1 = "No, not in school"
2 = "Yes, in school"
9 = "Missing";
*Veteran Status format;
proc format;
value VETSTAT_f
0 = "N/A"
1 = "Not a veteran"
2 = "Veteran"
9 = "Unknown";
*Sex Format;
proc format;
value SEX_f
1 = "Male"
2 = "Female";
*Race Format;
proc format;
value RACE_f
1 = "White"
2 = "Black/Negro"
3 = "American Indian or Alaska Native"
4 = "Chinese"
5 = "Japanese"
6 = "Other Asian or Pacific Islander"
7 = "Other race, nec"
8 = "Two major races"
9 = "Three or more major races";
run;