The document analyzes student debt data from 2004 and 2014. It summarizes key findings from statistical tests run on the data, which aimed to compare average debt levels, percentages of students in debt, and other metrics between the two years. The tests found significant increases in average debt variance and percentages of students in debt from 2004 to 2014. Additionally, a chi-square test revealed a relationship between data robustness and the percentage of students represented. The analysis concludes by discussing plans to fit linear and logistic regression models to the data.
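Comparing the percentage of students in debt in 2004 with the percentage in 2014 is the kind of question a chi-square test of independence answers. The sketch below is a minimal, standard-library Python version for a 2x2 table. It assumes Pearson's statistic without Yates' continuity correction. The function name `chi_square_2x2` and the counts in the usage line are hypothetical illustrations, not figures from the analysis.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table: [[a, b], [c, d]], e.g. rows = survey year, columns = in debt / not in debt.
    No continuity correction is applied. Returns (statistic, p_value) with df = 1.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if year and debt status were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the statistic is the square of a standard normal,
    # so P(X >= statistic) = P(|Z| >= sqrt(statistic)) = erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: [in debt, not in debt] for two survey years.
    stat, p = chi_square_2x2([[10, 20], [20, 10]])
    print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value suggests that debt status and survey year are not independent. For larger tables, the statistic generalizes directly, but the p-value then needs the chi-square distribution with (rows - 1)(columns - 1) degrees of freedom.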
Measurement Memo Re: Measuring the Impact of Student Diversity Program (andrejohnson034)
This is a Measurement Memo that I developed for the graduate course PAD 745 (Program Development and Evaluation). Addressed to the NYC Department of Education, it details the baselines and benchmarks against which to measure my imaginary non-profit, Advocates for Student Diversity in Specialized High Schools (ASDSHS).
The organization was seeking funding from the NYC DOE to carry out its mission of expanding public and legislative support for a holistic admissions approach in the city's specialized high school admissions process.
ANALYSIS OF RISING TUITION RATES IN THE UNITED STATES BASED ON CLUSTERING ANA... (cscpconf)
Since higher education is one of the major driving forces of national development and social prosperity, and tuition plays a significant role in determining whether a person can afford higher education, rising tuition is a major concern today. It is therefore necessary to understand which factors affect tuition and how they raise or lower it. Many existing studies of rising tuition either lack large amounts of real data and proper quantitative models to support their conclusions, or focus on only a few factors that might affect tuition and so fail to provide a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate, using large amounts of authentic data and several quantitative methods, such as clustering analysis and regression models.
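The abstract names clustering analysis and regression models but does not describe them in detail. As a rough illustration only, the Python sketch below runs Lloyd's k-means on one-dimensional values and fits a simple least-squares line. The paper's actual features, number of clusters, and model specification are not given here. The function names, initial centers, and the "growth rates" in the usage block are all hypothetical.

```python
def kmeans_1d(values, initial_centers, max_iter=100):
    """Lloyd's k-means on 1-D data. Returns (centers, clusters)."""
    centers = list(initial_centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
        if new_centers == centers:  # converged: the centers no longer move
            break
        centers = new_centers
    return centers, clusters


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


if __name__ == "__main__":
    # Hypothetical annual tuition growth rates (percent) for six institutions.
    rates = [1.5, 2.0, 2.2, 6.8, 7.1, 7.5]
    centers, clusters = kmeans_1d(rates, [0.0, 5.0])
    print("cluster centers:", centers)
    print("line through (year, rate):", fit_line([1, 2, 3, 4, 5, 6], rates))
```

In a real analysis, k-means would usually run on several standardized variables at once, and the regression would include multiple predictors. The one-dimensional version only shows the assign-then-update loop and the least-squares formulas.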
Question 1 The Uniform Commercial Code incorporates some of the s.docx (makdul)
Question 1
The Uniform Commercial Code incorporates some of the same elements as the Statute of Frauds. Under the Statute of Frauds, certain contracts must be in writing to be enforceable. Research the types of contracts that must be in writing under the Statute of Frauds.
Do you agree with the list of contracts that must be in writing? Explain why or why not. Imagine that you were asked to be part of a team to draft revisions to the Statute of Frauds. What changes or proposals would you make? Why?
Respond to this… The Statute of Frauds requires that certain types of contracts be in writing to be enforceable. These include contracts for the sale of goods priced at $500 or more, contracts involving an interest in land, promises to pay another person's debt, and contracts that cannot be performed within one year. To be enforceable, each must be signed by the party against whom enforcement is sought (the defendant). I do think that all of these contracts should be in writing, because a written contract is a safeguard that ensures each party is held responsible for whatever the contract covers. For example, if we did not have to sign for a car loan, the borrower could walk away, and without a signed agreement to the terms of the loan it would be hard for the lender to fight for its money.
If I had to revise something in the Statute of Frauds, I would change the provision on contracts that cannot be performed within one year. I think one year is a long time to let a contract slide; six months sounds more reasonable. If I were a business and did not get commitment to a contract for a whole year, I feel this would greatly affect my business. I also think it might be harder to recover whatever the other party is responsible for once a year has passed. As a business, I might even want to pursue a breach of contract claim within three or four months. That is a long time to not pay up.
Question 2
Let’s assume that you are interested in doing a statistical survey and you use confidence intervals for your conclusion. Describe a possible scenario and indicate what the population is, and what measure of the population you would try to estimate (proportion or mean) by using a sample.
· What is your estimate of the population size?
· What sample size will you use?
· How will you gather information for your sample?
· What confidence percentage will you use?
Let’s assume that you have completed the survey and now state your results using a confidence interval statement. You can make up the numbers based on a reasonable result.
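Question 2 allows made-up numbers, so here is a minimal sketch of the final step: a normal-approximation (Wald) confidence interval for a population proportion. The sample size of 400, the 248 "yes" responses, and the 95% level are hypothetical, and `proportion_ci` is an illustrative name. The Wald interval assumes a simple random sample with enough successes and failures (roughly 10 of each) for the normal approximation to hold.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Wald interval for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 248 of 400 sampled respondents answered "yes".
low, high = proportion_ci(248, 400)
print(f"We are 95% confident that the population proportion is between {low:.3f} and {high:.3f}.")
```

The printed sentence is the confidence interval statement the question asks for. Raising the sample size narrows the interval, roughly in proportion to 1 / sqrt(n).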
Respond to this… had found a study in Australia and New Zealand that examined whether people with acute coronary syndrome were receiving efficient care, which required an understanding of the sources of variation in their care. Basically, they wanted to see whether people who did not speak English well were receiving the same amount of care a ...
4.2 Identify the parameter, Part II. For each of the following sit.docx (gilbertkpeters11344)
4.2 Identify the parameter, Part II. For each of the following situations, state whether the
parameter of interest is a mean or a proportion.
(a) A poll shows that 64% of Americans personally worry a great deal about federal spending and the budget deficit.
(b) A survey reports that local TV news has shown a 17% increase in revenue between 2009 and 2011 while newspaper revenues decreased by 6.4% during this time period.
(c) In a survey, high school and college students are asked whether or not they use geolocation
services on their smart phones.
(d) In a survey, internet users are asked whether or not they purchased any Groupon coupons.
(e) In a survey, internet users are asked how many Groupon coupons they purchased over the last year.
4.4 Heights of adults. Researchers studying anthropometry collected body girth measurements
and skeletal diameter measurements, as well as age, weight, height and gender, for 507 physically
active individuals. The histogram below shows the sample distribution of heights in centimeters.
(a) What is the point estimate for the average height of active individuals? What about the
median?
(b) What is the point estimate for the standard deviation of the heights of active individuals?
What about the IQR?
(c) Is a person who is 1m 80cm (180 cm) tall considered unusually tall? And is a person who is
1m 55cm (155 cm) tall considered unusually short? Explain your reasoning.
(d) The researchers take another random sample of physically active individuals. Would you
expect the mean and the standard deviation of this new sample to be the same as the ones given above?
Explain your reasoning.
(e) The sample means obtained are point estimates for the mean height of all active individuals,
if the sample of individuals is equivalent to a simple random sample. What measure do we use
to quantify the variability of such an estimate? Compute this quantity using the data from
the original sample under the condition that the data are a simple random sample.
4.6 Chocolate chip cookies. Students are asked to count the number of chocolate chips in 22
cookies for a class activity. They found that the cookies on average had 14.77 chocolate chips with
a standard deviation of 4.37 chocolate chips.
(a) Based on this information, about how much variability should they expect to see in the mean
number of chocolate chips in random samples of 22 chocolate chip cookies?
(b) The packaging for these cookies claims that there are at least 20 chocolate chips per cookie.
One student thinks this number is unreasonably high since the average they found is much
lower. Another student claims the difference might be due to chance. What do you think?
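Exercise 4.6(a) asks about the variability of a sample mean. That variability is measured by the standard error, SE = s / sqrt(n). This is the same quantity Exercise 4.4(e) asks for, with n = 507 there. The sketch below plugs in the cookie figures given above, assuming the 22 cookies behave like a simple random sample. For part (b), it measures how many standard errors the claimed 20 chips lie above the sample mean, treating 20 as a claimed average. With n = 22, a t-distribution would be a more careful reference than the normal.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: sample standard deviation / sqrt(n)."""
    return sd / math.sqrt(n)


se = standard_error(4.37, 22)
# Distance of the packaging claim (20 chips) from the observed mean, in standard errors.
distance = (20 - 14.77) / se
print(f"SE = {se:.2f}; the claim of 20 chips is {distance:.1f} SEs above the sample mean")
```

This gives an SE of roughly 0.93 chips. The claimed 20 lies more than five standard errors above 14.77, a gap that is hard to attribute to sampling chance alone.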
4.8 Mental health. Another question on the General Social Survey introduced in Exercise 4.7
is "For how many days during the past 30 days was your mental health, which includes stress,
depression, and problems with emotions, not good?" Based on responses from 1,151 US residents,
the survey reporte.
Post the stakeholder role you are assuming. Then, post an explanat.docx (shpopkinkz)
Post the stakeholder role you are assuming. Then, post an explanation of how you, in the particular role you are assuming, might respond to the new information in the articles you found and in Document Set 2 for your case study. In your explanation, be sure to:
Evaluate whether the new information is based on reliable sources and whether the information is relevant to the issue.
Explain your position on the case study issue from the perspective of the role you are assuming and how this new information informs this position.
Explain the steps you might take to follow-up on this information based on your role and your position on the issue.
Examples of stakeholder roles that you could assume:
-Educator
-Parent
-State Department of Education
-Student Attending College
Throughout the Discussion, add support for your position or add to the knowledge base on the issue by finding and sharing additional resources related to the issue you are discussing. These should include scholarly resources but may include other resources such as news articles, blogs, RSS feeds, etc. Share links to the resources you identify.
Once you have decided which stakeholder role you will be assuming, respond to the discussion questions below:
Discussion #1
The stakeholder role I am assuming is the business leader. I support increasing curricular focus, funding, and new hiring for professional and technical fields. Document Set 2 for this case study states that there are 3.3 million job openings in the U.S., many going unfilled for months on end, as roughly half of employers now say they’re having a hard time finding qualified workers to hire, especially in technical fields. This information was retrieved from the White House Jobs Council, which is a reliable source. This information is extremely relevant to the issue because the solution to producing qualified workers is to equip students with the necessary skills and abilities. Those skills and abilities should align with the expectations established by industry leaders.
According to the White House Jobs Council in 2012, America is losing its position of global educational leadership in ways that could put our future living standards and business competitiveness at risk.
This new information informs my position because, without a shift in focus toward technical education, the number of unfilled jobs will continue to grow.
The PBS NewsHour video gives examples of students who graduated with liberal arts degrees.
All of them had a difficult time finding employment directly related to their field of study after graduation. In one instance, a graduate who majored in anthropology now washes trash cans part-time. In another, a graduate who majored in history with a minor in political science works as a substitute teacher one day a week. Both graduates agree that they do not regret going to college, although they wish they had pursued som.
The aim of this study is to determine the factors that could affect credit ratings, and to reveal the relationship between the economic policies implemented in Turkey and the credit ratings given by credit rating agencies, using econometric methods along with comparisons among countries. When a country's own resources are not enough to finance economic growth, it needs foreign investment. Countries seek this investment either as foreign direct investment or as financial investment. Both kinds of investors need to trust an economy before investing in it, so an indicator of how safe a country is to invest in is needed. The most important indicator developed for this purpose is the credit rating. Thus, Turkey's GDP, current account balance, foreign borrowing and inflation figures for 2000-2015 are analyzed using parametric and semiparametric logit models. When the assumptions of the parametric model are violated, semiparametric models, which combine the best features of the parametric and nonparametric approaches, are fitted using the best-fitting smoothing methods. We used data from the IMF World Economic Outlook Database, IMF Article IV country reports, and the main reports published on the websites of Moody’s, Standard & Poor’s and Fitch.
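The abstract does not spell out its models. As a minimal sketch of the parametric part only, the Python below fits a one-predictor logit model, P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))), by gradient ascent on the log-likelihood. The data, learning rate, iteration count, and function names are hypothetical. The study's semiparametric (smoothed) logit models are not reproduced here.

```python
import math


def sigmoid(t):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit(xs, ys, learning_rate=0.05, iterations=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood: sum of (y - p) and sum of (y - p) * x.
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1


if __name__ == "__main__":
    # Hypothetical data: x = an economic indicator, y = 1 if a rating upgrade occurred.
    xs = [0, 1, 2, 3, 4, 5]
    ys = [0, 0, 1, 0, 1, 1]
    b0, b1 = fit_logit(xs, ys)
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, P(y=1 | x=5) = {sigmoid(b0 + 5 * b1):.3f}")
```

A positive b1 means higher values of the indicator raise the estimated probability of the outcome. Newton-Raphson (iteratively reweighted least squares) is the more common fitting method in practice, but gradient ascent keeps the likelihood logic visible.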
On May 9, Civic Enterprises and the Everyone Graduates Center at Johns Hopkins University, as part of the GradNation Campaign, released the 2016 Building a Grad Nation report. Released annually, the report shows detailed progress toward the GradNation goal of a national on-time graduation rate of 90 percent by 2020.
That afternoon, expert speakers and co-authors of the report – John Bridgeland, CEO and president, Civic Enterprises; Jennifer DePaoli, senior education advisor, Civic Enterprises; and Robert Balfanz, director of the Everyone Graduates Center at Johns Hopkins University School of Education – discussed where the nation and states stand on the path to 90 percent.
The webinar was moderated by Tanya Tucker, vice president of alliance engagement, America's Promise Alliance.
In addition to audience questions, topics included:
• Where the nation and states stand on reaching the 90 percent by 2020 goal
• Threats to achieving the goal
• Setting the record straight on graduation rates
• Recommendations for moving forward
Find the report at: www.gradnation.org/2016report
1. Aside from indicating whether or not we shout for joy if we get.docx (monicafrancis71118)
1. Aside from indicating whether or not we shout for joy if we get a p value above or below .05, explain what we mean when we state that the relationship between the dependent and the independent variable is statistically significant at the .05 level.
2. After running several different statistical tests, a researcher may get several estimations regarding the value of the relationship between a dependent and independent variable. Explain what is meant by strength of association and how that concept helps a researcher convey the usefulness of an association between variables?
3. Researchers look for narrow confidence intervals. Explain why a narrow confidence interval is useful to researchers, both in evaluating the quality of their research and in conveying to policy makers the accuracy of their results.
4. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4).pdf, provide a clear explanation for what the R-Square value of .435 means in this case. Remember a full explanation defines r-square, interprets the value, and contextualizes the meaning of this value for future readers of your results.
5. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4)-1.pdf, provide a clear explanation of the meaning of the Global F test for significance, whose significance value in this case is 0.000. Remember, a full explanation defines statistical significance in this case, interprets the value, and contextualizes the meaning of this value for future readers of your results.
6. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4)-2.pdf, provide a clear explanation of the meaning of the unstandardized b coefficient for "Receives job training" (-0.010) and its significance value of 0.010. Remember, a full explanation defines the concept of the unstandardized b coefficient, interprets statistical significance, interprets the value of the unstandardized b coefficient, and contextualizes the meaning of this value for future readers of your results.
7. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4)-3.pdf, provide a clear explanation of the meaning of the unstandardized b coefficient for "Number of Dependents" (0.000) and its significance value of 0.802. Remember, a full explanation defines the concept of the unstandardized b coefficient, interprets statistical significance, interprets the value of the unstandardized b coefficient, and contextualizes the meaning of this value for future readers of your results.
8. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4)-4.pdf, provide a clear explanation of the meaning of the unstandardized b for "Medical Condition" (0.013) and its significance value of 0..
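Questions 4 through 8 rest on two quantities from regression output. The first is the unstandardized coefficient b: the expected change in the dependent variable for a one-unit increase in a predictor, holding the other predictors constant. The second is R²: the share of variance in the dependent variable explained by the model. The exercise output comes from a multiple regression. The sketch below illustrates both ideas for a single predictor only, using hypothetical data and an illustrative function name, so that R² = 1 - SS_res / SS_tot can be seen directly. On that reading, an R² of .435 means the model accounts for about 43.5% of the variance in the dependent variable.

```python
def simple_regression(xs, ys):
    """Fit y = a + b * x by least squares and return (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # unstandardized slope: change in y per one-unit change in x
    a = mean_y - b * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical data with four observations.
a, b, r2 = simple_regression([1, 2, 3, 4], [1, 3, 2, 4])
print(f"a = {a:.2f}, b = {b:.2f}, R^2 = {r2:.2f}")
```

Here each one-unit increase in x is associated with a 0.8-unit increase in y, and the line explains 64% of the variance in y. Whether such a coefficient is statistically significant is a separate question, answered by its standard error and p-value.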
ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELS (IJDKP)
Tuition plays a significant role in determining whether a student can afford higher education, which is one of the major driving forces of national development and social prosperity. It is therefore necessary to fully understand which factors might affect tuition and how they affect it. However, many existing studies of the tuition growth rate either lack sufficient real data and proper quantitative models to support their conclusions, or focus on only a few factors that might affect the tuition growth rate, failing to provide a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate, using large amounts of authentic data and different quantitative methods, such as clustering and regression models.
2013 YRBS Data User’s Guide Youth Risk Behavior Surv.docx (vickeryr87)
2013 YRBS Data User’s Guide
Youth Risk Behavior Surveillance System (YRBSS)
June 2014
Where can I get more information? Visit www.cdc.gov/yrbss or call 800−CDC−INFO (800−232−4636).
Introduction to the YRBSS
The YRBSS was developed in 1990 to monitor priority health risk behaviors
that contribute markedly to the leading causes of death, disability, and social
problems among youth and adults in the United States. These behaviors, often
established during childhood and early adolescence, include:
Behaviors that contribute to unintentional injuries and violence.
Sexual behaviors that contribute to unintended pregnancy and sexually
transmitted infections, including HIV infection.
Alcohol and other drug use.
Tobacco use.
Unhealthy dietary behaviors.
Inadequate physical activity.
In addition, the YRBSS monitors the prevalence of obesity and asthma.
From 1991 through 2013, the YRBSS has collected data from more than 2.6
million high school students in more than 1,100 separate surveys.
Uses of YRBSS
The YRBSS was designed to:
Determine the prevalence of health risk behaviors.
Assess whether health risk behaviors increase, decrease, or stay the
same over time.
Examine the co-occurrence of health risk behaviors.
Provide comparable national, state, territorial, tribal, and local data.
Provide comparable data among subpopulations of youth.
Monitor progress toward achieving the Healthy People objectives and
other program indicators.
Components of the YRBSS
The YRBSS includes national, state, territorial, tribal government, and local
school-based surveys of representative samples of 9th through 12th grade
students. These surveys are conducted every two years, usually during the
spring semester. The national survey, conducted by CDC, provides data
representative of 9th through 12th grade students in public and private
schools in the United States. The state, territorial, tribal government, and local
surveys, conducted by departments of health and education, provide data
representative of mostly public high school students in each jurisdiction.
The YRBSS also includes additional surveys conducted by CDC:
A middle school survey conducted by interested states, territories,
tribal governments, and large urban school districts.
June 2014 http://www.cdc.gov/yrbss Page 1
A 2010 study to measure physical activity and nutrition-related
behaviors and determinants of these behaviors among a nationally
representative sample of high school students.
A series of methods studies conducted in 1992, 2000, 2002, 2004, and
2008 to improve the quality and interpretation of the YRBSS data.
The National Alternative High School Youth .
Assessing the costs of public higher education in the commonwealth of virgini... (Robert M. Davis, MPA)
Part 4 in a series of whitepapers examining the costs of public higher education in the Commonwealth of Virginia. Loan borrowing has become the means by which students cope with cost increases. Although borrowing may be one of the primary options available to finance the costs of higher education, it carries risks, and recent research indicates that those risks may be growing.
Ways and Means Committee review of previous budgets and metrics evaluation. Some comparisons are made with neighboring towns. Submitted to the administration and the school board.
On May 9, Civic Enterprises and the Everyone Graduates Center at Johns Hopkins University, as part of the GradNation Campaign, released the 2016 Building a Grad Nation report. Released annually, the report shows detailed progress toward the GradNation goal of a national on-time graduation rate of 90 percent by 2020.
That afternoon, expert speakers and co-authors of the report – John Bridgeland, CEO and president, Civic Enterprises,Jennifer DePaoli, senior education advisor, Civic Enterprises, and Robert Balfanz, director of the Everyone Graduates Center at Johns Hopkins University School of Education – discussed where the nation and states stand on the path to 90 percent.
The webinar was moderated by Tanya Tucker, vice president of alliance engagement, America's Promise Alliance.
In addition to audience questions, topics included:
• Where the nation and states stand on reaching the 90 percent by 2020 goal
• Threats to achieving the goal
• Setting the record straight on graduation rates
• Recommendations for moving forward
Find the report at: www.gradnation.org/2016report
1. Aside from indicating whether or not we shout for joy if we get.docxmonicafrancis71118
1. Aside from indicating whether or not we shout for joy if we get a p value above or below .05, explain what we mean when we state, the relationship between the Dependent and the Independent Variable is statistical significant at the .05 level.
2. After running several different statistical tests, a researcher may get several estimations regarding the value of the relationship between a dependent and independent variable. Explain what is meant by strength of association and how that concept helps a researcher convey the usefulness of an association between variables?
3. Researchers look for narrow confidence intervals. Explain why a narrow point estimate is useful to the researcher in their ability to evaluate the quality of their research as well as convey to policy makers the accuracy of their results.
4. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4).pdf, provide a clear explanation for what the R-Square value of .435 means in this case. Remember a full explanation defines r-square, interprets the value, and contextualizes the meaning of this value for future readers of your results.
5. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4)-1.pdf provide a clear explanation for the meaning of the Global F test for significance which in this case is 0.000. Remember a full explanation defines statistical significance in this case, interprets the value, and contextualizes the meaning of this value for future readers of your results.
6. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4)-2.pdf provide a clear explanation for the meaning of the unstandardized b coefficient for 'Receives job training" (-0.010) and its significance value of 0.010. Remember a full explanation defines the concept of unstandardized b coefficient, interprets, statistical significance, interprets the value of the unstandardized b coefficient, and contextualizes the meaning of this value for future readers of your results.
7. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4)-3.pdf Provide a clear explanation for the meaning of the unstandardized b coefficient for 'Number of Dependents" (0.000) and its significance value of 0.802. Remember a full explanation defines the concept of unstandardized b coefficient, interprets, statistical significance, interprets the value of the unstandardized b coefficient, and contextualizes the meaning of this value for future readers of your results.
8. Using your Berman/Wang Workbook p. 90 Critical Thinking Exercise Two Output: Exam Two Berman Wang Critical Thinking Exercise (4)-4.pdf provide a clear explanation for the meaning of the unstandardized b for"Medical Condition" (0.013) and its significance value of 0..
ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELSIJDKP
Tuition plays a significant role in determining whether a student could afford higher education, which is
one of the major driving forces for country development and social prosperity. So it is necessary to fully
understand what factors might affect the tuition and how they affect it. However, many existing studies on
the tuition growth rate either lack sufficient real data and proper quantitative models to support their
conclusions, or are limited to focus on only a few factors that might affect the tuition growth rate, failing to
make a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the
tuition growth rate by use of large amounts of authentic data and different quantitative methods such as
clustering and regression models.
2013 YRBS Data User’s GuideYouth Risk Behavior Surv.docxvickeryr87
2013 YRBS Data
User’s Guide
Youth Risk Behavior Surveillance System (YRBSS)
June 2014
Where can I get more information? Visit www.cdc.gov/yrbss or call 800−CDC−INFO (800−232−4636).
www.cdc.gov/yrbss
2 0 1 3 Y R B S D a t a U s e r ’ s G u i d e
Introduction to the YRBSS
Introduction The YRBSS was developed in 1990 to monitor priority health risk behaviors
that contribute markedly to the leading causes of death, disability, and social
problems among youth and adults in the United States. These behaviors, often
established during childhood and early adolescence, include
Behaviors that contribute to unintentional injuries and violence.
Sexual behaviors that contribute to unintended pregnancy and sexually
transmitted infections, including HIV infection.
Alcohol and other drug use.
Tobacco use.
Unhealthy dietary behaviors.
Inadequate physical activity.
In addition, the YRBSS monitors the prevalence of obesity and asthma.
From 1991 through 2013, the YRBSS has collected data from more than 2.6
million high school students in more than 1,100 separate surveys.
Uses of YRBSS The YRBSS was designed to
Results
Determine the prevalence of health risk behaviors.
Assess whether health risk behaviors increase, decrease, or stay the
same over time.
Examine the co-occurrence of health risk behaviors.
Provide comparable national, state, territorial, tribal, and local data.
Provide comparable data among subpopulations of youth.
Monitor progress toward achieving the Healthy People objectives and
other program indicators.
Components of
the YRBSS
The YRBSS includes national, state, territorial, tribal government, and local
school-based surveys of representative samples of 9th through 12th grade
students. These surveys are conducted every two years, usually during the
spring semester. The national survey, conducted by CDC, provides data
representative of 9th through 12th grade students in public and private
schools in the United States. The state, territorial, tribal government, and local
surveys, conducted by departments of health and education, provide data
representative of mostly public high school students in each jurisdiction.
The YRBSS also includes additional surveys conducted by CDC:
A middle school survey conducted by interested states, territories,
tribal governments, and large urban school districts.
June 2014 http://www.cdc.gov/yrbss Page 1
2 0 1 3 Y R B S D a t a U s e r ’ s G u i d e
A 2010 study to measure physical activity and nutrition-related
behaviors and determinants of these behaviors among a nationally
representative sample of high school students.
A series of methods studies conducted in 1992, 2000, 2002, 2004, and
2008 to improve the quality and interpretation of the YRBSS data.
The National Alternative High School Youth .
Assessing the costs of public higher education in the commonwealth of virgini...Robert M. Davis, MPA
Part 4 in a series of whitepaper research examining the costs of public higher education in the Commonwealth of Virginia. Loan borrowing has become the means in which to cope which costs increases. Loan borrowing may be one of the primary options available to finance the costs of higher education, there are risks associated with this option; recent research identifies that those risks may be growing.
Ways and Means Committee review of previous budgets and metrics evaluation. Some comparisons are made with neighboring towns. Submitted to the administration and the school board.
THE ANALYSIS OF STUDENT DEBT
A deeper look into the differences in student debt between 2004 and 2014.
Nina Satasiya and Priya Chandrashekar
Spring 2016 | We pledge on our honor that we have neither given nor received aid on this assignment.
Introduction
Brief Overview:
Attending college and obtaining a degree is an important investment. For many, the burden of this investment lies on the parents who support their children's education; for others, the burden of obtaining higher education lies with the student. With this project, we would like to analyze our data to draw conclusions and make predictions about future student debt in the United States. Furthermore, we hope not only to analyze the history of this issue, but also to begin brainstorming ways to help future students obtain higher education without this large financial strain.
The data and calculations in our dataset come from The Institute for College Access & Success (TICAS).¹ They are based on data collected from the U.S. Department of Education, National Center for Education Statistics, and Integrated Postsecondary Education Data System (IPEDS), and from Peterson's Undergraduate Financial Aid and Undergraduate Databases (copyright 2015 Peterson's, a Nelnet company, all rights reserved).
To thoroughly understand the data, one must visualize it. The full dataset contains data for both 2004 and 2014; to make it easier to carry out tests and identify data points, we separated it into two smaller data files, one for 2004 and another for 2014. The first 10 observations in all three data files can be found in Appendix A. Additionally, to give a clear picture of each variable's normality and overall distribution, we have included histograms and normal (Q-Q) plots in Appendix B.
Terminology:
Our dataset contains the following variables:
• State: The U.S. state that the data reflect.
• Percent Change: The percentage change in average debt from 2004 to 2014.
• Robustness: The robustness of the 2004-to-2014 comparison indicates how reliable the data are for each state. This variable is categorical with three levels: Strong, Medium, and Weak.
o The robustness variable was determined by examining what share of each graduating class came from colleges that reported student debt data in both years. For states where this share was at least two-thirds in both years, the robustness of the change over time was categorized as Strong; where this share was at least half in both years but less than two-thirds in at least one of the two years, it was categorized as Medium; and the remaining states were categorized as Weak.
• Average Debt 2004/Average Debt 2014: The average debt of graduates with loans in the class of 2004/2014, respectively.
• Percent Debt 2004/2014: The percentage of graduates with debt in 2004/2014, respectively.
• Percent Represented 2004/2014: The percentage of graduates represented in the collected data in 2004/2014, respectively.
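The Strong/Medium/Weak rule and the Percent Change variable can each be expressed as a small function. This is a minimal sketch, assuming each state's reporting share is given as a fraction between 0 and 1 and that Percent Change is stored as a proportion (0.5 meaning a 50% increase); the function names and input values are hypothetical and do not come from the TICAS files.

```python
def classify_robustness(share_2004: float, share_2014: float) -> str:
    """Categorize a state's change-over-time robustness using the rule described above.

    share_2004 / share_2014: fraction of the graduating class that came from
    colleges reporting student debt data in both years (0.0 to 1.0).
    """
    lowest = min(share_2004, share_2014)
    if lowest >= 2 / 3:
        # At least two-thirds in both years.
        return "Strong"
    if lowest >= 0.5:
        # At least half in both years, but under two-thirds in at least one.
        return "Medium"
    return "Weak"


def percent_change(avg_debt_2004: float, avg_debt_2014: float) -> float:
    """Relative change in average debt, expressed as a proportion (assumed convention)."""
    return (avg_debt_2014 - avg_debt_2004) / avg_debt_2004


# Hypothetical inputs for illustration only.
print(classify_robustness(0.80, 0.72))  # Strong
print(classify_robustness(0.80, 0.55))  # Medium
print(classify_robustness(0.45, 0.90))  # Weak
print(percent_change(20000, 30000))     # 0.5
```

Taking the minimum of the two shares captures the "in both years" condition with a single comparison per threshold.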
Results & Analysis - Hypothesis Testing
Note:
Prior to running any tests, we wanted to confirm the normality of our data. We did so by examining the histograms and Q-Q plots for the distributions of Average Student Debt, Percentage of Students in Debt, and Percentage of Students Represented, for both 2004 and 2014 (Appendix B). Having confirmed that these distributions are approximately normal, we proceeded to use various hypothesis tests to analyze the data.
Note that the following tests deal with two different years of data (2004 and 2014) and hence with two separate datasets. Some tests use only one dataset, while others use both, so be mindful of which dataset is being used in each test. Furthermore, these two years differ markedly from one another: the Great Recession, a major economic downturn, began in late 2007 and lasted until mid-2009. Hence, comparing data from before and after this period may yield interesting and meaningful results.
1-Sample Test
1-Sample T-Test
Here, we test the percentage of students represented in 2004 and the percentage of students represented in 2014. We chose a 1-sample t-test because the population standard deviation is unknown. Furthermore, the t-test's assumptions are fulfilled, as the percent represented data are approximately normal and symmetric for both 2004 and 2014 (Appendix B). Testing "percent represented" tells us whether the data analysis we perform can be generalized to the population. According to BLANK², the average percentage of the US population represented in student debt studies is 83%. Therefore, 0.83 is used in our null hypothesis. After the Great Recession, many students and families may not have been able to afford postsecondary education, and many students may have declined to submit their financial information for personal reasons, such as embarrassment.
Thus, we chose to perform a one-tailed t-test, as the proportion of students represented in 2014 could be less than the listed average. The hypotheses are as follows:
H0: µ2014 = .83 versus Ha: µ2014 < .83
The test statistic follows a t-distribution with 47 degrees of freedom. The t-statistic is -1.0977, with a corresponding p-value of 0.139. Based on this, there is insufficient statistical evidence at the 0.05 significance level to reject the null hypothesis. We fail to reject the claim that the true proportion of students represented in 2014 is equal to 0.83.
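For reference, the one-sample t-statistic is t = (x̄ - µ0)/(s/√n) with n - 1 degrees of freedom. The sketch below computes it with Python's standard library and takes the lower-tail p-value from the Student t CDF, written through the regularized incomplete beta function (a standard identity, evaluated with the well-known Numerical Recipes continued fraction). Function names are hypothetical, and because the raw percent-represented values are not reproduced here, the last line only plugs in the reported statistic.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coeff / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """P(T <= t) for Student's t distribution with df degrees of freedom."""
    tail = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(T <= -|t|)
    return tail if t < 0 else 1.0 - tail


def one_sample_t_less(sample, mu0):
    """One-tailed test of H0: mu = mu0 versus Ha: mu < mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.fmean(sample) - mu0) / se
    return t, n - 1, t_cdf(t, n - 1)


# Plugging in the reported statistic: t = -1.0977 with 47 degrees of freedom.
print(round(t_cdf(-1.0977, 47), 3))  # reported p-value: 0.139
```

With real data, one_sample_t_less(values_2014, 0.83) would return the statistic, degrees of freedom, and lower-tail p-value together.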
2-Sample Dependent Test
Non-applicable
Our dataset conveys a large amount of information pertaining to student debt (e.g., percent in debt, average debt, percent represented) for the years 2004 and 2014. Although the same type of data is reflected in both years, the data were not collected from the same subjects in both years.
Dependent statistical tests typically require that the same subjects be measured more than once. "Related groups" means that the same subjects are present in both groups, because each subject has been measured on two occasions for the same dependent variable; in essence, data are collected from the same individuals over different time periods or conditions. This is not the case for our dataset.
Data for our dataset were collected once in 2004 and again in 2014, from recent college graduates in each respective graduating year. Hence, it is very unlikely that the same subjects are present in both groups. The data are therefore not dependent, so we are unable to perform any two-sample dependent tests.
2-Sample Independent Tests
2-Sample F-Test
For this test, we compare the variances of average debt in 2004 and in 2014. Again, these variables are independent and approximately normally distributed, so the F-test may be used (Appendix B). Analyzing the variances of these variables helps further evaluate the spread of the data; in turn, this reveals whether average debt across the United States varies more in one year than in the other. We believe that after the Great Recession, U.S. families would have a greater range of financial conditions. Thus, our alternative hypothesis tests whether average debt in 2014 had a greater variance than average debt in 2004. The two hypotheses are as follows:
$H_0: \sigma^2_{2004} = \sigma^2_{2014}$ versus $H_a: \sigma^2_{2014} > \sigma^2_{2004}$
The test statistic follows an F-distribution with 47 and 47 degrees of freedom. The F-statistic is 1.8609, with a corresponding p-value of 0.01782. Based on this, there is sufficient statistical evidence at the 0.05 significance level to reject the null hypothesis. We reject the claim that the population variances of average debt in 2004 and 2014 are equal, in favor of the alternative that the 2014 variance is greater.
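A sketch of the variance-ratio test in the direction stated in the prose, with the 2014 sample in the numerator so that a large ratio supports a greater 2014 variance. The statistic is the ratio of sample variances with (n1 - 1, n2 - 1) degrees of freedom, and the upper-tail p-value again comes from the regularized incomplete beta function. Function names and the small test samples are hypothetical; only the final line uses the reported F = 1.8609 on 47 and 47 degrees of freedom.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coeff / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """P(F > f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def variance_ratio_test(numerator_sample, denominator_sample):
    """Ha: the numerator population has the larger variance (upper-tail test)."""
    f = statistics.variance(numerator_sample) / statistics.variance(denominator_sample)
    d1, d2 = len(numerator_sample) - 1, len(denominator_sample) - 1
    return f, d1, d2, f_sf(f, d1, d2)


print(f_sf(1.8609, 47, 47))  # compare with the reported p-value of 0.01782
```

Swapping the two samples inverts the ratio, so the order of the arguments must match the direction of the alternative hypothesis.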
2-Sample T-Test
In this test, we analyze the average percentage of students in debt in 2004 versus the average percentage of students in debt in 2014. Our population standard deviations are unknown, so we chose to perform a 2-sample t-test. Our distributions for percent debt are approximately normal (Appendix B), so the assumptions of this test are fulfilled. The 2-sample t-test compares the mean percent debt across states in 2004 with the mean percent debt across states in 2014.
This will show whether the average percent debt grew from 2004 to 2014. If it did, that would support our intuition that student debt has increased, as a larger proportion of students would be in debt over time. Thus, our alternative hypothesis tests whether the average percent debt is greater in 2014 than in 2004. Note that µ2004 denotes the average percent debt in 2004 and µ2014 the average percent debt in 2014. The hypotheses are as follows:
H0: µ2004=µ2014 versus Ha: µ2004<µ2014
The test statistic follows a t-distribution. Statistical software reports 86.797 degrees of freedom, a non-integer value produced by the Welch approximation for unequal variances; if equal variances are assumed, the pooled test instead uses n1 + n2 - 2 = 94 degrees of freedom. The t-statistic is -2.3037, with a corresponding p-value of 0.01181. Based on this, there is sufficient statistical evidence at the 0.05 significance level to reject the null hypothesis. We reject the claim that the average percent debt of students in 2004 is equal to that in 2014, in favor of the alternative that it was higher in 2014.
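The Welch degrees of freedom come from the Welch-Satterthwaite formula, which weights each group's squared standard error. A minimal standard-library sketch follows; the function names and the small samples are hypothetical, and the dataset's percent-debt values are not reproduced here.

```python
import math
import statistics


def welch_t(sample_1, sample_2):
    """Welch's two-sample t-statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = statistics.variance(sample_1) / n1  # squared standard error, group 1
    v2 = statistics.variance(sample_2) / n2  # squared standard error, group 2
    t = (statistics.fmean(sample_1) - statistics.fmean(sample_2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_df(n1, n2):
    """Degrees of freedom for the equal-variance (pooled) two-sample t-test."""
    return n1 + n2 - 2


# Hypothetical percent-debt samples with unequal spreads.
t, df = welch_t([0.55, 0.60, 0.62, 0.58, 0.65], [0.60, 0.70, 0.75, 0.66, 0.80])
print(round(t, 3), round(df, 2), pooled_df(5, 5))
```

Welch's degrees of freedom never exceed n1 + n2 - 2 and reach that value when, for example, the sample sizes and sample variances are equal; this is consistent with 86.797 falling below 94.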
Categorical Tests
Chi Square
Here we compare robustness with percent represented in our dataset using the chi-square test. Our variable robustness is categorical (i.e., we have counts), and we have transformed our percent represented variable into a factor so that it is categorical as well. Thus, the chi-square test may be used. In our 2x3 contingency table, the columns are the levels of robustness (Strong, Medium, Weak), while the rows are the two ranges of percent represented (0-70% and 70-100%) (Appendix A). This test analyzes whether the counts for each level of robustness vary between the ranges of percentage of students represented.
Furthermore, this test can be performed because the observations are independent and both variables are categorical (Appendix B). Unlike the t- and F-tests, the chi-square test does not require normality, although it does require reasonably large expected cell counts.
The analysis will show whether there is an association between the robustness of the data and the percentage of students represented in each state. We hope to see an association between these two variables, as it would add to the reliability and credibility of the entire dataset. Based on this, our alternative hypothesis is that there is an association between robustness and percent represented. The hypotheses are as follows:
H0: Robustness and Percent Represented are independent of one another
versus
Ha: Robustness and Percent Represented are associated
The test statistic follows a chi-square distribution with (2 - 1)(3 - 1) = 2 degrees of freedom. The test statistic, X-squared, is 30.532, with a corresponding p-value of 2.344e-07. Based on this, there is sufficient statistical evidence at the 0.05 significance level to reject the null hypothesis. We reject the claim that Robustness and Percent Represented are independent of one another.
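The Pearson statistic sums (observed - expected)²/expected over all cells, where each expected count is (row total × column total)/grand total. A minimal sketch follows; it also reports the smallest expected count, since a common rule of thumb asks for expected counts of at least 5. The p-value helper uses the closed form for even degrees of freedom only (which covers the 2 d.f. of a 2x3 table). Function names and the example table are hypothetical, not the study's counts.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic, degrees of freedom, and smallest expected count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat, min_expected = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, min_expected


def chi_square_sf_even_df(x, df):
    """Upper-tail p-value of a chi-square variable with positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical 2x3 table: rows are 0-70% / 70-100% represented,
# columns are Strong / Medium / Weak.
stat, df, min_exp = pearson_chi_square([[2, 6, 10], [23, 5, 2]])
print(round(stat, 3), df, round(min_exp, 2))
print(chi_square_sf_even_df(30.532, 2))  # compare with the reported 2.344e-07
```

For 2 degrees of freedom the upper-tail probability reduces to exp(-x/2), which is why no special functions are needed here.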
1-Sample Proportion Z-Test
For this test, we examine the Strong category of robustness. Since the proportions/counts of all levels of robustness are the same for 2004 and 2014, we only need to run this test once, and it applies to both years. The results of this z-test for proportions will show whether the majority of the data have a Strong level of robustness. This is helpful because we would like our data to have a strong level of robustness, indicating the credibility and reliability of our results (and of the dataset in general). So, we compare our proportion of Strong states against the null value p = 0.5 to see whether more than half of our data have Strong robustness.
Furthermore, since the data are an SRS and independent, and 48(0.5) = 24 ≥ 10 and 48(1 - 0.5) = 24 ≥ 10 hold true, the assumptions for this test are fulfilled and the test can be carried out in good faith.
The hypotheses are as follows:
H0: ps = 0.5 versus Ha: ps > 0.5
Under the null hypothesis, the test statistic is approximately standard normal. By software, X-squared is 0.0833 with 1 d.f., with a corresponding p-value of 0.3864. By hand, the z-statistic is 0.2886751, with a corresponding p-value of 0.386415; the software's X-squared is the square of this z-statistic. Based on this, there is insufficient statistical evidence at the 0.05 significance level to reject the null hypothesis. We fail to reject the claim that the population proportion of Strong robustness is equal to 0.50.
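The by-hand calculation can be reproduced with the standard library's NormalDist: z = (p̂ - p0)/√(p0(1 - p0)/n), with the p-value taken from the upper tail. The count of 25 Strong states out of 48 used below is our inference (it is the only count that gives z = 0.2886751 with n = 48 and p0 = 0.5), not a figure stated in the source; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z_greater(successes, n, p0=0.5):
    """z-test of H0: p = p0 versus Ha: p > p0, using the null standard error."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# 25 of 48 is inferred from the reported z-statistic.
z, p = one_proportion_z_greater(25, 48)
print(round(z, 7), round(p, 6), round(z * z, 4))
```

Squaring z gives the X-squared value reported by software for an uncorrected proportion test, which is why the two p-values agree.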
Results & Analysis – Modeling
Note:
Since our data were not collected over time, we fit the following two models: Multiple Linear Regression (with continuous and dummy variables) and Binary Logistic Regression. Furthermore, since fitting models alone is of limited help, we randomly split our data into training (80%) and testing (20%) datasets (Appendix C). In the following sections on multiple linear regression and binary logistic regression, we use the training dataset to fit each model. Then, we use the testing dataset to assess how accurately the model predicts data that were not used to determine the fit.
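A seeded random shuffle makes an 80/20 split reproducible. This is a minimal sketch, assuming the observations are held in a Python list; the function name and seed are hypothetical, and the source does not say how its split was drawn.

```python
import random


def train_test_split(rows, train_frac=0.8, seed=2016):
    """Randomly split rows into training and testing lists (reproducible via seed)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # local RNG, so global state is untouched
    n_train = int(train_frac * len(rows))
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


rows = [f"observation_{i}" for i in range(50)]  # hypothetical rows
train, test = train_test_split(rows)
print(len(train), len(test))  # 40 10
```

Using a dedicated random.Random instance keeps the split identical across runs without affecting any other random draws in the analysis.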
Multiple Linear Regression (continuous and dummy variables):
Addressing Dummy Variables
In this section, we attempted to analyze the relationship between the percentage change in student debt (between 2004 and 2014) and average student debt, percentage of students represented, and the robustness of the data. Note that two of the explanatory variables are continuous (average student debt and percentage of students represented), and one is categorical (robustness). Including a categorical variable in a multiple regression model requires a few additional steps to ensure that the results are reliable and interpretable; primarily, the categorical variable must be re-coded into several dichotomous variables, or “dummy variables.”
Robustness has three levels: strong, medium, and weak. Thus, two dummy variables were created to represent the information contained in the single categorical variable. We reasoned that one of the extremes, strong or weak, should serve as the baseline group, and we chose strong because it was the most common robustness level in the dataset. We assigned a dummy variable to each of the other two groups and organized these in a 3x2 contrast matrix (Appendix A).
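The 3x2 contrast matrix can be written as a lookup table: one row per robustness level and two dummy columns, with Strong as the all-zero baseline. The column order (Xw, Xm) follows the variable definitions used for the regression model; the names are hypothetical.

```python
# Contrast matrix with Strong as the baseline level: rows are levels,
# columns are the dummy variables (X_w, X_m).
CONTRAST = {
    "Strong": (0, 0),
    "Weak": (1, 0),    # X_w = 1
    "Medium": (0, 1),  # X_m = 1
}


def encode_robustness(level):
    """Return the (X_w, X_m) dummy values for one robustness level."""
    return CONTRAST[level]


for level in ("Strong", "Medium", "Weak"):
    print(level, encode_robustness(level))
```

Because Strong is coded (0, 0), each dummy coefficient measures the difference between its level and Strong.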
Assumptions
Prior to using a regression model, several assumptions need to be addressed. First, the variables must be of the appropriate types. In this case, they are: all explanatory variables are quantitative or categorical with at least two categories, and the response variable is quantitative. Furthermore, the relationship between the predictors and the response must be linear, and the residuals must be independent, have constant variance, and be approximately normal. The graphics checking these assumptions are shown in Appendix C.
Maximal Population Model/Variables & Terminology
Ultimately, we hope that our model’s explanatory variables and interaction terms (both continuous and dummy) can be used to predict the percentage change in student debt. Our maximal model is as follows:
$$Y = \beta_0 + \beta_1 X_a + \beta_2 X_p + \beta_3 X_w + \beta_4 X_m + \beta_5 (X_p X_a) + \beta_6 (X_a X_w) + \beta_7 (X_a X_m) + \beta_8 (X_p X_w) + \beta_9 (X_p X_m) + \beta_{10} (X_w X_m) + \beta_{11} (X_a X_p X_w) + \beta_{12} (X_a X_w X_m) + \beta_{13} (X_a X_p X_m) + \beta_{14} (X_a X_p X_w X_m) + \varepsilon$$
Where:
• $\beta_0$ is the intercept of the regression line; it is the estimated mean response when all the explanatory variables have a value of 0.
• $\beta_1, \beta_2, \dots, \beta_{14}$ are the respective regression coefficients. Each is the expected difference in the mean percentage change in student debt per unit difference in its explanatory variable or interaction term, with all other terms held constant.
• $X_a$ is average debt
• $X_p$ is percent represented
• $X_w$ is weak robustness (1 if Weak, 0 otherwise)
• $X_m$ is medium robustness (1 if Medium, 0 otherwise)
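One consequence of the dummy coding, not discussed in the source: a state cannot be both Weak and Medium, so every term that multiplies Xw by Xm (the terms with β10, β12 and β14) is zero for every observation, and regression software will generally drop or flag such aliased columns. The sketch below, with hypothetical helper names and made-up values, builds the 14 non-intercept columns in the order of the maximal model and reports which are identically zero.

```python
from math import prod

# Non-intercept terms of the maximal model, in coefficient order beta_1 ... beta_14.
TERMS = [
    ("a",), ("p",), ("w",), ("m",),
    ("p", "a"), ("a", "w"), ("a", "m"), ("p", "w"), ("p", "m"), ("w", "m"),
    ("a", "p", "w"), ("a", "w", "m"), ("a", "p", "m"), ("a", "p", "w", "m"),
]

DUMMIES = {"Strong": (0, 0), "Weak": (1, 0), "Medium": (0, 1)}


def design_row(avg_debt, pct_rep, robustness):
    """Values of the 14 non-intercept terms for one observation."""
    x_w, x_m = DUMMIES[robustness]
    x = {"a": avg_debt, "p": pct_rep, "w": x_w, "m": x_m}
    return [prod(x[v] for v in term) for term in TERMS]


def always_zero_terms(rows):
    """Coefficient indices whose column is 0 in every row (not estimable)."""
    columns = zip(*rows)
    return [k for k, col in enumerate(columns, start=1) if all(v == 0 for v in col)]


# Hypothetical observations covering all three robustness levels.
rows = [design_row(25000, 0.85, "Strong"),
        design_row(27000, 0.60, "Medium"),
        design_row(30000, 0.40, "Weak")]
print(always_zero_terms(rows))  # [10, 12, 14]
```

Columns that are always zero carry no information, so at most 11 of the 14 slopes in the maximal model can actually be estimated.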
The variable Average Debt reflects the overall level of average debt in each year. We included it in the model because we expected it to have a strong, direct relationship with the percentage change in student debt: average debt can fluctuate greatly over time, so we expected it to correlate with our response variable. Thus, we believed this variable would be beneficial to include in the model.
We hope that Percent Represented will give us insight into how reliable the data are. This variable is clearly important in a holistic analysis of the dataset. However, we are skeptical about the extent of its influence on the model and on the response variable. By including this variable, we will see whether it truly benefits the model; if not, we will remove it.
The variable Robustness is categorical. It focuses on the reliability of the data by indicating whether the colleges that reported data were consistent in 2004 and 2014. This variable gives useful information regarding the credibility and reliability of the dataset itself, so we assumed it would be beneficial to include in our maximal model; we hoped to see it ultimately prove significant.
Finally, Percent Change is the response variable. It captures the overall trend in the percentage change in student debt between 2004 and 2014. This statistic is a good summary of the overall trend we are attempting to analyze, so we use it as our response variable.
Model Testing - Brief Overview
We began by fitting the maximal model. From there, the process was a repetition of “cases” in which we attempted to simplify the model by removing non-significant interaction terms, quadratic or nonlinear terms, and explanatory variables. Each “case” consisted of checking the ANOVA and t-tests for the model at hand. We first checked the ANOVA to see whether the overall model (all slopes collectively) was significant. Then, we examined the t-tests for slopes (individual tests for each slope) to test coefficient significance. If a variable's slope was not significant, the variable was not needed in the model, so we removed it, updated the model, and repeated this procedure with the updated model until we arrived at a significant model. Hence, this process can be considered a repetition of cases, where each “case” is a new, updated model.
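The procedure above is essentially backward elimination. A minimal sketch, assuming one term is removed per round (the one with the largest p-value) and that a hypothetical fit_p_values callable refits the model and returns each remaining term's t-test p-value; it does not enforce the hierarchy principle (keeping main effects that appear in retained interactions), which a fuller implementation might.

```python
def backward_eliminate(terms, fit_p_values, alpha=0.05):
    """Drop the least significant term until every remaining term is significant.

    fit_p_values(terms) -> dict mapping each term to its slope p-value; it stands
    in for refitting the regression and is a hypothetical callable, not a library API.
    """
    current = list(terms)
    history = [list(current)]
    while current:
        p_values = fit_p_values(current)
        worst = max(current, key=lambda term: p_values[term])
        if p_values[worst] <= alpha:
            break  # every remaining term is significant
        current.remove(worst)
        history.append(list(current))
    return current, history


# Mock p-values for illustration; a real run would refit the model each round.
fake = {"avg_debt": 0.30, "pct_rep": 0.72, "robustness": 0.54}
final, steps = backward_eliminate(fake, lambda ts: {t: fake[t] for t in ts})
print(final)       # []
print(len(steps))  # 4
```

When every term is eventually dropped, as in this mock run, the result is an intercept-only model.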
Model Testing - Data Discussion
We repeated this process eight times before arriving at the final, simplest model. When we arrived at the seventh model (our second-to-last model), we checked the ANOVA for overall significance. The hypotheses were as follows: H0: all slope coefficients equal 0 (the model is insignificant) versus Ha: at least one slope coefficient is nonzero (the model is significant). The F test-statistic in this case was 1.851 on 2 and 73 d.f. The ANOVA returned a p-value of 0.1644, so we failed to reject the null hypothesis. Thus, there is insufficient statistical evidence at the 0.05 significance level to reject the claim that the model (all slopes collectively) is insignificant.
We then took a closer look at the slopes (t-tests for slopes). We had already examined all 3-way and 2-way interactions, so we could now focus on the individual explanatory variables. The hypotheses for this test were as follows: $H_0: \beta_{\text{Robustness}} = 0$ versus $H_a: \beta_{\text{Robustness}} \neq 0$. The t test-statistic for “Robustness” was 0.616 with 71 d.f. The t-test for slopes returned a p-value of 0.540, so we failed to reject the null hypothesis. Thus, there is insufficient evidence at the 0.05 level to reject the claim that the regression coefficient equals 0. Since this term was insignificant, we removed it, updated our model, and moved on to our eighth model, which turned out to be the last.
Final Model
After the seventh model, all of our terms had been dropped, so our final model was simply Percent_Change = 0.56934. In other words, the model consists only of the intercept; for an intercept-only least-squares fit, this intercept is the mean of Percent_Change in the training data. This means that a unit change in any predictor variable results in no change in the predicted outcome, the percentage of change in student debt.
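The following minimal Python sketch illustrates why an intercept-only model gives every observation the same prediction. Minimizing the squared error over a single constant yields the sample mean, and the predictors are never used. The function names and the numbers below are hypothetical and are not the report's data.

```python
from statistics import fmean
from typing import Sequence


def fit_intercept_only(y: Sequence[float]) -> float:
    """Least-squares estimate for the model y = b0.

    Minimizing sum((y_i - b0) ** 2) over b0 gives b0 = mean(y)."""
    return fmean(y)


def predict(b0: float, predictors: Sequence[float]) -> float:
    # The predictors are ignored: every observation gets the same prediction.
    return b0


y_train = [0.25, 0.5, 0.75]  # hypothetical Percent_Change values
b0 = fit_intercept_only(y_train)
print(b0)                         # 0.5
print(predict(b0, [100.0, 0.3]))  # 0.5, whatever the predictor values
```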
All of the coefficients proved to be insignificant, reiterating that the overall model is insignificant. We would have liked to obtain a significant model, or at least one that retained some explanatory variables. After many attempts at building new models with different explanatory and response variables, we were unable to find a significant model; none of the variables in our dataset produced one. Multicollinearity among some of the explanatory variables may have contributed to these results. Additionally, our sample size may have been too small with respect to the number of explanatory variables. Nevertheless, we still fit a model to our dataset.
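One common way to check the suspected multicollinearity is the variance inflation factor (VIF). For predictor j, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. A common rule of thumb treats values above about 10 as a concern. This standard-library Python sketch solves the OLS normal equations by Gaussian elimination. The function names and example columns are hypothetical. It assumes no predictor is a perfect linear combination of the others, because that would make R²_j = 1 and the VIF infinite.

```python
from typing import List


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: List[float], predictors: List[List[float]]) -> float:
    """R^2 of an OLS regression of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(bv * v for bv, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(columns: List[List[float]]) -> List[float]:
    """VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    out = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        out.append(1.0 / (1.0 - r_squared(col, others)))
    return out


# Hypothetical, nearly collinear predictors give a large VIF.
print(vif([[1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.05, 4.0, 4.95]]))
```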
Accuracy
Our line of best fit for the final model is Y=0.56934, with R Squared and Adjusted R Squared values of 0. This indicates that 0% of the variation in the response variable can be explained by the explanatory variables, which is expected because an intercept-only model contains none. Our value for SSE is 0.02052, which measures the residual (unexplained) variation. After computing these statistics for our final model, we applied it to the testing data set. Ultimately, we found that our fitted model was not a good representation of the data. The plot in Appendix D portrays the differences between the observed and predicted values from the training dataset. As seen, the differences were substantial, reiterating the poor fit that the training model provides.
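The following minimal Python sketch shows how SSE, R², and adjusted R² relate, and why both R² values are exactly 0 for an intercept-only model. When every fitted value equals the mean of the response, SSE equals the total sum of squares. The function name and the response values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_stats(y: Sequence[float], fitted: Sequence[float],
              n_predictors: int) -> Tuple[float, float, float]:
    """Return (SSE, R^2, adjusted R^2) for fitted values from a model
    with n_predictors explanatory terms plus an intercept."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))  # unexplained
    sst = sum((obs - ybar) ** 2 for obs in y)                    # total
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return sse, r2, adj_r2


# Hypothetical training responses; the intercept-only model predicts their mean.
y = [0.50, 0.62, 0.58, 0.60]
ybar = sum(y) / len(y)
sse, r2, adj_r2 = fit_stats(y, [ybar] * len(y), n_predictors=0)
print(r2, adj_r2)  # 0.0 0.0
```

When the same constant prediction is applied to a testing set, the mean of that set will generally differ from the training mean. In that case SSE exceeds the total sum of squares, and R² computed on the testing set becomes negative.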
Binary Logistic Regression:
Addressing Dummy Variables
Robustness has three levels: strong, medium, and weak. Thus, two dummy variables were made to represent the information contained in this single categorical variable. We reasoned that one of the extremes, strong or weak, should be the baseline group, and we chose strong because it was the most common robustness level in the data set. We assigned a dummy variable to each of the other two groups and organized this in a contrast matrix (Appendix A).
We decided to make Years a categorical variable with two levels, 2004 and 2014. Half of the data set is from 2004, and the other half is from 2014. We’re interested in seeing whether student debt was greater in 2014 than in 2004. Thus, we made 2004 our baseline and assigned a dummy variable to 2014, organizing this into a contrast matrix (Appendix A).
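The treatment coding described in this subsection can be written out as lookup tables: Strong and 2004 are the baselines, and each remaining level gets its own 0/1 column. This minimal Python sketch is illustrative only. The dictionary names, the column order, and the helper `encode` are hypothetical and are not necessarily the layout used in Appendix A.

```python
# Treatment (dummy) coding as described in the text; names are hypothetical.
# Robustness: baseline "Strong"; columns are (is_medium, is_weak).
ROBUSTNESS_CONTRASTS = {
    "Strong": (0, 0),  # baseline group: all dummies 0
    "Medium": (1, 0),
    "Weak": (0, 1),
}
# Year: baseline 2004; one column, is_2014.
YEAR_CONTRASTS = {2004: (0,), 2014: (1,)}


def encode(robustness: str, year: int) -> tuple:
    """Return the dummy columns (is_medium, is_weak, is_2014) for one observation."""
    return ROBUSTNESS_CONTRASTS[robustness] + YEAR_CONTRASTS[year]


print(encode("Strong", 2004))  # (0, 0, 0)
print(encode("Weak", 2014))    # (0, 1, 1)
```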
Assumptions
Prior to using a regression model, several assumptions need to be addressed. First, linearity: we must assume a linear relationship between any continuous predictors and the logit of the outcome variable. We also assume independence of errors, that is, that the observations are independent. By inspecting the data, we inferred that the linearity assumption holds.
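One informal way to inspect linearity in the logit is an empirical-logit check. Sort the observations by a continuous predictor, group them into bins, and compute the log-odds of the outcome in each bin. If the assumption holds, these values should lie roughly on a straight line when plotted against the bin means. The report does not say which check was used, so this minimal Python sketch is a swapped-in illustration. The function name, the bin count, and the 0.5 continuity adjustment are assumptions, and it requires `n_bins <= len(x)`.

```python
import math
from typing import List, Tuple


def empirical_logits(x: List[float], y: List[int],
                     n_bins: int) -> List[Tuple[float, float]]:
    """Sort by x, split into n_bins groups of (nearly) equal size, and return
    (mean x, empirical logit) for each group. Adding 0.5 to both counts keeps
    the logit finite when a group is all 0s or all 1s."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    out = []
    for b in range(n_bins):
        group = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        k = sum(yi for _, yi in group)  # number of 1s in the group
        m = len(group)
        mean_x = sum(xi for xi, _ in group) / m
        out.append((mean_x, math.log((k + 0.5) / (m - k + 0.5))))
    return out


print(empirical_logits([1, 2, 3, 4], [0, 1, 1, 1], n_bins=2))
```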
Maximal Population Model/Variables & Terminology
Ultimately, we want our model’s explanatory variables and interaction terms (both continuous and dummy) to predict the probability of Robustness (with the level “Strong” as the baseline), given the known values of the explanatory variables.
Our maximal model is as follows:

$$\pi(\text{Robustness}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5)}},$$

where:
• $\beta_0$ is a constant
  o $e^{\beta_0}$ = the odds that the characteristic is present in an observation i when all $X_i = 0$, i.e., at baseline.
• $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5$ are the respective regression coefficients
  o $e^{\beta_i}$: for every unit increase in its respective $X_i$, the odds that the characteristic is present are multiplied by $e^{\beta_i}$; this is an estimated odds ratio
The explanatory variables (both continuous and dummy) are used to predict $\pi(\text{Robustness})$, the probability of Robustness, where:
• X1 is average debt
• X2 is the percentage of students in debt
• X3 is the percentage of students represented
• X4 is the percentage of change in student debt from 2004 to 2014
• X5 is the dummy variable for 2014
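The model formula and the odds-ratio interpretation of $e^{\beta_i}$ can be checked numerically. This minimal Python sketch computes the predicted probability from the linear predictor. It then shows that raising X1 by one unit multiplies the odds by $e^{\beta_1}$, and that the odds at $X_i = 0$ equal $e^{\beta_0}$. The coefficients and predictor values are hypothetical and are not estimates from this report.

```python
import math
from typing import Sequence


def prob_from_linear_predictor(beta: Sequence[float], x: Sequence[float]) -> float:
    """pi = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))); beta[0] is the intercept b0."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p: float) -> float:
    return p / (1.0 - p)


# Hypothetical coefficients (b0..b5) and predictor values (X1..X5).
beta = [0.2, 0.5, -1.0, 0.0, 0.3, 0.1]
x = [1.0, 0.4, 0.5, 0.2, 1.0]
x_plus = [2.0, 0.4, 0.5, 0.2, 1.0]  # X1 increased by one unit

ratio = (odds(prob_from_linear_predictor(beta, x_plus))
         / odds(prob_from_linear_predictor(beta, x)))
baseline_odds = odds(prob_from_linear_predictor(beta, [0.0] * 5))  # e^{b0}
print(ratio, math.exp(beta[1]))         # both about 1.6487
print(baseline_odds, math.exp(beta[0]))  # both about 1.2214
```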
The variable Average Debt reflects the overall average debt in each year. We included this variable in the model because we thought it would provide more information about the trend we are attempting to analyze. Since average debt usually fluctuates greatly over time, we expected it to be associated with our response variable, Robustness. Thus, we believed that this variable would be beneficial to include in the model.
The variable Percent Debt is the percentage of students in debt in each year. It is an informative variable when looking at the dataset in its entirety. However, we are unsure how much influence this variable will have on the model; if it provides no benefit, we would like to see it removed.
We hope that Percent Represented will give us more insight into how reliable the data is. We consider this variable important for analyzing the dataset holistically, so we are interested in the extent of its influence on the model and on the response variable. We believe that this variable and Robustness will be closely associated, and we would like to see whether it truly benefits the model.
We believe Percent Change would be a valuable variable to include in our model. It
allows us to see the overall trend in the percentage of change in student debt between 2004 and
2014. This statistic is a good summary of the overall trend we are attempting to analyze, and so,
we hope it will be beneficial to keep in the model.
Years is a categorical variable, with 2004 as the baseline and 2014 as the dummy variable, since we are interested in whether student debt was greater in 2014 than in 2004.
Our response variable, Robustness, is categorical, with the level “Strong” as the baseline. It reflects the reliability of the data by indicating whether the colleges that reported data were consistent in 2004 and 2014. This variable gives useful information regarding the credibility and reliability of the dataset itself. We chose it as our response variable because we would like to see a majority of our data having a Strong level of Robustness.
Model Testing - Overview
We began by fitting the maximal model using the training dataset. After fitting the model, we began a process of “cases” in which we attempted to simplify the model by removing nonsignificant interaction terms, quadratic or other nonlinear terms, and explanatory variables. Each “case” consisted of checking the likelihood ratio test and the Wald test for the model at hand.
First, we performed a likelihood ratio test comparing our first model with the baseline (null) model; the likelihood ratio test checks for model significance. This test showed that our model is significantly different from the baseline model. Hence, we continued to simplify and improve our model using the z-test for individual components (Wald test). We also conducted a likelihood ratio test after each new model to assess its improvement. Ultimately, this process can be considered a repetition of cases, where each “case” is a new, updated model.
Model Testing – Data Discussion
We repeated this process three times before arriving at the final, simplest, and most significant model. When we arrived at the second (second-to-last) model, we checked its likelihood ratio to assess its significance. The hypotheses were as follows: H0: the model is not significantly different from the previous model (model 1) versus Ha: the model is significantly different from the previous model (model 1). The chi-square test-statistic in this case was 0.41713 on 1 d.f., with a p-value of 0.5183726, so we failed to reject the null hypothesis. Thus, there is insufficient statistical evidence at the 0.05 significance level to conclude that the model is significantly different from the previous model. Although this model was not significantly different, it was simpler, so we could use it.
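The likelihood ratio test compares two nested models through G = 2(log L_full − log L_reduced). Under H0, G is approximately chi-square distributed, with d.f. equal to the number of dropped parameters. For 1 d.f., a chi-square variable is the square of a standard normal, so the p-value is erfc(√(G/2)). This minimal Python sketch uses that fact to check the reported p-value. The function names and the log-likelihood values in the test are hypothetical.

```python
import math


def lrt_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic G = 2 * (logL_full - logL_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(g: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 d.f.:
    P(chi2_1 > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2))."""
    return math.erfc(math.sqrt(g / 2.0))


print(round(chi2_sf_df1(0.41713), 4))  # 0.5184
```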
We needed to look at the model some more to see if we could improve it. So, we
proceeded to take a closer look at the explanatory variables through the Wald test. If we found an
explanatory variable to be insignificant, we would have no need for it in the model. The
hypotheses for this test were as follows: H0: $\beta_{\text{Percent Change}} = 0$ versus Ha: $\beta_{\text{Percent Change}} \neq 0$. The z test-statistic for “Percent Change” was 0.146 with a p-value of 0.883858, so we failed to reject the null hypothesis. Thus, there is insufficient evidence at the 0.05 level to reject the claim that this regression coefficient equals 0. Since this term was insignificant, we removed it, updated our model, and continued to our third model. The third model turned out to be the last.
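The Wald z-test divides a coefficient estimate by its standard error and compares the result with the standard normal distribution. This minimal Python sketch computes the two-sided p-value for the reported z of 0.146. The small gap from 0.883858 is consistent with z having been rounded to three decimals. The function names are hypothetical.

```python
import math


def wald_z(estimate: float, std_error: float) -> float:
    """Wald statistic z = estimate / standard error."""
    return estimate / std_error


def two_sided_normal_p(z: float) -> float:
    """P(|Z| > |z|) for a standard normal Z, i.e. erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


print(round(two_sided_normal_p(0.146), 3))  # 0.884
```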
Final Model
After the second model, the terms “Year” and “Percent Change” had been dropped. So, our final model was simply

$$\pi(\text{Robustness}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3)}}.$$

It is simpler than the previous models and contains only significant explanatory variables.
Accuracy
After fitting our final model to the training data set, we computed its accuracy. Unfortunately, we got an accuracy value of 0. This accuracy is not ideal, but we attribute it partly to our rather small sample size. Furthermore, the results depend on the random splitting of the data into testing and training sets, so they may vary as the data sets change with different random sampling. Thus, with a different sample, we could get better or worse results.
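To illustrate how the split affects the results, this minimal Python sketch draws a seeded random train/test split and computes classification accuracy as the share of correct 0/1 predictions. The report does not state its split proportion or classification threshold. The 30% test fraction, the 0.5 threshold, and the function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence, test_fraction: float,
                     seed: int) -> Tuple[List, List]:
    """Shuffle with a seeded RNG and split off test_fraction of rows as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(probs: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Share of observations whose thresholded prediction matches the label."""
    predicted = [1 if p >= threshold else 0 for p in probs]
    return sum(pred == obs for pred, obs in zip(predicted, labels)) / len(labels)


# Different seeds can give different splits, and therefore different accuracies.
for seed in (1, 2):
    train, test = train_test_split(range(10), test_fraction=0.3, seed=seed)
    print(seed, sorted(test))
print(accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))  # 0.75
```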
Odds Ratios
We then found the odds ratios for the significant explanatory variables in our final model. The odds ratio for percent debt is 8.818586e-07. Because this value is below 1, each unit increase in percent debt multiplies the odds of the modeled Robustness outcome by a factor far below 1; that is, the odds decrease as percent debt increases. The odds ratio for percent represented is 1.916972e-09. Like that of percent debt, this value is below 1, so the odds decrease as percent represented increases. The odds ratio for average debt is 1.000153e+00. Since this value is slightly above 1, the odds increase slightly with each unit increase in average debt.
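An odds ratio applies to a one-unit change in its predictor, so its size depends on the predictor's units. The underlying coefficient is ln(OR), and a k-unit change multiplies the odds by OR^k. This minimal Python sketch applies both relations to the reported values. It assumes that average debt is measured in dollars, so that k = 1000 corresponds to a $1,000 increase. That unit is an assumption, as are the function names.

```python
import math


def coefficient_from_or(odds_ratio: float) -> float:
    """Logistic regression coefficient beta = ln(OR)."""
    return math.log(odds_ratio)


def or_for_k_units(odds_ratio: float, k: float) -> float:
    """Odds ratio for a k-unit increase: exp(k * beta) = OR ** k."""
    return odds_ratio ** k


print(round(coefficient_from_or(8.818586e-07), 2))  # -13.94
print(round(or_for_k_units(1.000153, 1000), 3))     # 1.165
```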
Works Cited
1. "Project on Student Debt." State by State Data. The Institute for College Access and Success, 2015. Web. 18 Feb. 2016. http://ticas.org/posd/map-state-data-2015#
2. "Student Debt and the Class of 2013." Nov. 2014: 22. Web.