A modelling approach to establish whether or not there is a north-south divide in the UK in terms of home ownership. Data used included the UK Census and the UK Quarterly Labour Force Survey.
Grade: 78%
The relationship between the Stock Market and Interest Rates (Gaetan Lion)
This is a study of the relationship between the Stock Market and Interest Rates. We review how the Stock Market has reacted when interest rates rise. We also factor in the influence of other macroeconomic variables.
This document analyzes the value of the minimum wage in the United States from 1947 to 2005 using correlation and regression analysis. It finds a strong positive correlation between years and nominal minimum wage but a weak correlation between years and minimum wage adjusted for inflation. There is a moderate negative correlation found between years and minimum wage as a share of average private wages. Regression equations are presented for the relationships between years and nominal wage, inflation-adjusted wage, and wage share. The analysis concludes that while the nominal minimum wage has increased over time, when accounting for inflation it has lost value, and its value relative to average wages has decreased.
Dear students, get latest Solved NMIMS assignments and case study help by professionals.
Mail us at : help.mbaassignments@gmail.com
Call us at : 08263069601
This document analyzes county-level poverty rates in California and Oregon in 2015. It tests two hypotheses: 1) counties with higher percentages of black residents will have higher poverty rates, and 2) counties with higher percentages of bachelor's degree holders will have lower poverty rates. Simple regressions show the percentage of black residents has a weak positive correlation with poverty, while the percentage with bachelor's degrees has a strong negative correlation. A multiple regression controlling for both variables explains over 56% of the variation in poverty rates across counties. Additional socioeconomic and demographic variables are also analyzed to further explain poverty rate differences at the county level.
The document summarizes forecasts of quarterly e-commerce retail sales for 2015 Q4 through 2016 Q3 using three methods: Winter's Smoother, Time Series Decomposition, and AR(4) model. Based on pseudo MAPE errors, Winter's Smoother and AR(4) produced the best results. The Winter's Smoother accounted for trend and seasonality and had the lowest error rates. The AR(4) model used lagged sales data and also had relatively low errors. Time Series Decomposition followed trends but consistently underestimated sales, yielding a higher error rate.
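The error comparison described above can be illustrated with a minimal sketch of the standard MAPE formula. The summary does not say how the "pseudo" MAPE was computed (it presumably refers to errors on held-out quarters), so the function below is only the textbook definition, and the sales figures are made-up numbers for illustration, not values from the document.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    # Average of |actual - forecast| / |actual| over all periods.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical quarterly sales and one model's forecasts (illustrative only).
actual_sales = [100.0, 110.0, 120.0, 130.0]
model_forecast = [90.0, 110.0, 126.0, 117.0]

print(f"MAPE: {mape(actual_sales, model_forecast):.2f}%")
```

Computing this figure for each candidate method on the same held-out quarters gives a like-for-like ranking, which is the kind of comparison the summary reports.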
This document discusses the relationship between income inequality and economic growth in developing countries. It uses data from 50 countries in 2008 to estimate two regression models: a simple linear regression of GDP per capita growth on the Gini index, and a log-linear regression of GDP per capita growth on the log of the Gini index. Both models find a statistically significant negative relationship, such that higher inequality is associated with lower economic growth. However, the significance is only at the 10% level. The log-linear model has a slightly higher R-squared and fits the graphical data better. Diagnostic tests find no evidence of heteroskedasticity or autocorrelation in either model. The document concludes the models provide a basic estimate of the
This document summarizes a hedonic home price prediction model developed by Phil Fargason and Jianting Zhao for Zillow. They collected 23 variables related to home characteristics, location, neighborhood attributes, crime, transportation and demographics. Their linear regression model explained 70% of variation in home prices in San Francisco with a mean absolute percentage error of 25%. Key factors correlated with higher prices included property size, number of bedrooms/bathrooms, proximity to transit and colleges, and surrounding home prices.
Operations Management in the Supply Chain Decisions and Cases 7th Edition Sch... (Dorianner)
This document discusses a case study involving Lawn King, a manufacturer of lawn mowers facing seasonal demand. Management has just increased its demand forecast for the coming year, causing them to evaluate forecast accuracy and develop production strategies. Students are asked to develop a forecast, construct alternative monthly production plans using different strategies like level, chase or overtime, and recommend a strategy. Careful analysis and use of Excel is required to evaluate the options and tradeoffs involved in sales and operations planning for this seasonal business.
The document analyzes the relationship between changes in GDP/GNP and unemployment rates in the United States from 1948-2014 using Okun's Law. Two regression models were estimated, with GDP% and GNP% as the dependent variables and UNRATE% as the independent variable. Both models showed a statistically significant negative relationship, with the GDP model explaining 43% of variation in GDP. The author concludes that in the US, a 1% increase in unemployment is associated with a 0.07% decrease in GDP growth.
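A simple regression of this kind (output growth on the unemployment rate) can be sketched with the closed-form least-squares formulas. This is an illustration under stated assumptions rather than the author's procedure: the function name `ols_fit` is hypothetical, and the data points are invented toy values, not the 1948-2014 US series.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / ss_tot


# Toy data (made up): unemployment rate (%) and GDP growth (%).
unemployment = [4.0, 5.0, 6.0, 7.0, 8.0]
gdp_growth = [3.9, 3.1, 2.8, 2.2, 1.5]

slope, intercept, r2 = ols_fit(unemployment, gdp_growth)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

A negative slope corresponds to the inverse relationship the summary describes, and R-squared plays the role of the "43% of variation" figure reported for the GDP model.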
Explaining the characteristics underpinning the Brexit vote across different parts of the UK, by Resolution Foundation's Stephen Clarke and Matthew Whittaker.
01 Descriptive Statistics for Exploring Data.pdf (SREDDINIRANJAN)
This document discusses the importance of descriptive statistics and various methods for visually summarizing data, including histograms, boxplots, scatterplots, and others. It explains that descriptive statistics communicate information and support reasoning about data. Graphical summaries like histograms can show the density and relative frequencies of data, while boxplots convey less information but take up less space to compare multiple datasets. Context is also important for graphical integrity.
1 BBS300 Empirical Research Methods for Business .docx (oswald1horne84988)
BBS300 Empirical Research Methods for Business
TSA, 2018
Assignment 1
Due: Sunday, 7 October 2018,
23:55 PM
This assignment covers material from Sessions 1-4 and is worth 20% of your total mark
of BBS300. Your solutions should be properly presented, and it is important that you
double-check your spelling and grammar and thoroughly proofread your assignment
before submitting. Instructions for assignment submission are presented in
the “Assignment 1” link and must be strictly adhered to. No marks will be
awarded to assignments that are submitted after the due date and time.
All analyses must be carried out using SPSS, and no marks will be awarded
for assignment questions where SPSS output supporting your answer is not
provided in your Microsoft Word file submitted for the Assignment.
Questions
In this assignment, we will examine the “Real Estate Market” dataset (described at the
end of the assignment) and the “Employee Satisfaction” dataset. Before beginning the
assignment, read through the descriptions of these datasets and their variables carefully.
The “Real Estate Market” dataset can be found in the file “realestatemarket.sav,” and
the “Employee Satisfaction” dataset can be found in the file “employeesatisfaction.sav.”
You will need to carefully inspect both SPSS data files to be sure that the
specification of variable types is correct and, where appropriate, value
labels are entered.
1. (12 marks)
Use appropriate graphical displays and measures of centrality and dispersion
to summarise the following four variables in the “Real Estate Market” dataset. For
graphical displays for numeric data, be sure to comment on not only the shape of
the distribution but also compliance with a normal distribution. Be sure to
include relevant SPSS output (graphs, tables) to support your answers.
(a) Price.
(b) Lot Size.
(c) Material.
(d) Condition.
2. (8 marks)
Again consider the variable Price, which records the property price (in AUD). It
is of interest to know if this is associated with the distance from the property
to the train station. It is also of interest to know if the property
prices are associated with distance to the nearest bus stop. Carry out
appropriate statistical techniques to assess whether there is a significant
association between the property price and distance to the nearest train (To train)
station and the nearest bus stop (To bus). Be sure to thoroughly assess the
assumptions of your particular analysis, and be sure to include relevant SPSS
output (graphs, tables) to support your answers.
3. (7 marks)
Consider the “Employee Satisfaction” dataset, which asked participants to provide their
level of regularity to a series of thirteen statements. Conduct an appropriate analysis
to assess the reliability of responses to these statements. If the reliability will
increa.
Housing Affordability in Metro Atlanta: It's Complicated (ARCResearch)
The document discusses various ways to measure housing affordability in metro Atlanta. It analyzes data on home prices, sales prices, the housing opportunity index, and the percentage of income spent on housing and transportation costs. While metro Atlanta has relatively affordable home prices, affordability depends on factors like income, transportation costs, and whether households are renters or owners. Maps show that areas with lower incomes and higher poverty rates also tend to have less affordable housing costs as a percentage of income.
This document summarizes the results of an analysis of factors influencing individuals' job satisfaction using panel data from the British Household Panel Survey. A fixed effects model was preferred to a random effects model based on a Hausman test. The analysis found that being married, having an improved financial situation compared to the previous year, and living outside of London were associated with higher levels of job satisfaction, while a worse financial situation was associated with lower satisfaction. Regional differences in satisfaction were also observed.
Carter Jonas New Homes Residential View - Winter 2016 (Lee Layton)
What type of new homes are we building, where are we building them and are they the right type of property for their local market? These are three important questions that we aim to answer in the latest edition of the Carter Jonas New Home Residential View.
This study projects the impact of population aging on future housing stock and prices in both provincial and national markets.
Mario Fortin,
Professor,
Université de Sherbrooke
This document analyzes housing data from Cook and DuPage counties in the Chicago area using hedonic regression. It finds that 54.6% of variation in home prices can be explained by attributes like number of rooms, living area, age, lot size, amenities, taxes, income, distance to downtown and an airport. The effective age of a house has a significant negative impact on price, indicating that older homes are less desirable and valuable. Variables for number of bathrooms, school spending and distance to the nearest expressway were removed from the model as insignificant predictors of home value.
- The document analyzes factors that influence housing prices in the Chicago market using hedonic regression analysis on data from 2000 homes.
- Key factors found to significantly impact price based on the regression analysis include number of rooms, living area, effective age of the home, lot size, air conditioning, property taxes, median income, distance from downtown Chicago, and whether the home was located in Cook County or DuPage County.
- Three factors - spending per student, number of bathrooms, and distance to the nearest expressway - were found to not significantly impact price based on additional regression runs and subset F tests.
Has Milwaukee's Riverwest neighborhood reached a condo development saturation point? What is the impact of income and job growth on the sustainability of the condo building boom in this diverse area of Milwaukee?
Non-wage income is a big component of total income in America, yet is almost never analyzed in terms of inequality and discrimination. Here we use the Tobit method to determine the likelihood of a person earning Non-Wage income.
This document examines factors that influence income inequality between countries as measured by the Gini index. Multiple regression analysis was used to model the Gini index based on GDP per capita, percentage of urban population, and tertiary education enrollment ratio. The analysis found that GDP per capita, percentage of urban population, and tertiary education enrollment ratio were statistically significant predictors of the Gini index, with GDP per capita and tertiary education enrollment associated with lower inequality and urban population associated with higher inequality.
This document summarizes an analysis of home sale prices in Lucas County, Ohio using a sample of 200 homes. It describes fitting univariate regression models with selling price as the dependent variable and square footage as the independent variable, both in linear and log-transformed forms. Summary statistics are provided showing distributions of home characteristics. Regression results are presented, showing square footage is a statistically significant predictor of sale price, explaining around 40% of variation. Diagnostic plots reveal some outliers and a tendency to over- or underestimate prices for certain home sizes.
New Homes Residential View - Autumn/Winter 2016 (Lee Layton)
The document discusses new home construction in England. It finds that while construction levels have recovered from pre-2008 levels, completions are still around 15% lower than the pre-downturn average. Most new homes built in the last year had 3 or more bedrooms. The areas with the most new construction activity are around East Midlands and East of England, while London commuter towns lack activity despite high demand. The document also analyzes new home prices compared to existing homes in different areas.
Exploring Australian economy and diversity (Krishnendu Das)
The document discusses analyzing population, job vacancy, and unemployment rate data for various Australian states over time. Key findings include:
- The populations of Victoria, New South Wales, and Queensland have been gradually increasing over time. Queensland has the lowest population.
- Job vacancies in Victoria have fluctuated over time, with a maximum around 72000 and minimum around 32000. A linear regression model fits the recent data better than all the data.
- The maximum unemployment rate in Victoria was 12.55% in 1993. Unemployment and job vacancies are inversely related.
- A motion chart shows unemployment rates, job vacancy rates, and populations changing over time for each state. Tasmania generally has
Multiple Linear Regression Applications in Real Estate Pricing (inventionjournals)
In this paper, we attempt to predict the price of individual homes sold in North West Indiana, based on individual homes sold in 2014. The data are collected from realtor.com. The purpose of this paper is to predict the price of individual homes sold using a multiple regression model, and also to utilize a SAS forecasting model and software. We also determine the factors influencing housing prices and to what extent they affect the price. Independent variables include square footage, number of bathrooms, whether there is a finished basement, whether there is a brick front, and the type of home: Colonial, Contemporary or Tudor. How much does each type of home (Colonial, Contemporary, Tudor) add to the price of the real estate
Multiple Linear Regression Applications in Real Estate Pricing (inventionjournals)
This document describes using multiple linear regression to predict real estate prices. House price data from 480 homes sold in Indiana in 2014 is used. Independent variables like size, number of bedrooms/bathrooms, and whether there is a basement are considered. Correlations between variables are examined. An initial regression model is developed using all potential predictors. The best fitting model is found to use only homeowner association (HOA) fees as a predictor, with the equation Price=312638+17.854Hoa.
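The fitted equation quoted above can be read as a small worked example. The sketch below only evaluates Price = 312638 + 17.854 × Hoa; the summary does not state the units of the HOA variable, so treating it as a fee in dollars is an assumption, and the function name `predicted_price` is hypothetical.

```python
# Coefficients as quoted in the summary: Price = 312638 + 17.854 * Hoa.
INTERCEPT = 312638.0
HOA_SLOPE = 17.854


def predicted_price(hoa_fee):
    """Predicted sale price for a given HOA fee (units assumed, not stated)."""
    return INTERCEPT + HOA_SLOPE * hoa_fee


# With no HOA fee the prediction is just the intercept; each additional
# unit of HOA fee raises the predicted price by the slope, 17.854.
for fee in (0, 100, 200):
    print(fee, round(predicted_price(fee), 2))
```

Because the model has a single predictor, the slope is the whole story: two homes whose HOA fees differ by 100 have predicted prices that differ by 1,785.40.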
What Causes Economic Growth? A Breakdown of The Solow Growth Model (JaredBilberry1)
The document summarizes an empirical study examining the Solow growth model and the augmented Solow model developed by Mankiw, Romer and Weil. The study uses data from 1960-1985 for non-oil producing countries to test the relationship between GDP per capita in 1985 and variables for investment, population growth, and secondary education. Descriptive statistics show average GDP increased from 1960 to 1985 while population and investment levels also rose. Correlation analysis found GDP correlated positively with investment and education, but negatively with population growth, supporting the models' predictions.
Spatial Autocorrelation: Impacts of Employment on Health in NW England (Cobain Schofield)
Using Spatial Autocorrelation, Moran's I, and Geographically Weighted Regression to analyse the impacts of employment on health in North West England, using 2011 Census data.
Similar to England's North-South Divide on Home Ownership
This document summarizes the process of cleaning and analyzing transportation data from the Quarterly Labour Force Survey to determine the primary modes of transportation used to commute to work across regions in Great Britain. Key steps included removing non-working respondents, simplifying transportation and region categories, and adding an age variable. Cross-tabulations and graphs were produced to show differences in transportation usage between regions and age groups, such as higher car usage in less dense areas and among older commuters. Weighting was applied in some analyses to account for sample sizes.
A rainfall-runoff model for Chew and Kinder Reservoirs, Peak District; utilising the Flood Studies Report to find whether the dams at Chew and Kinder could withstand a 1-in-10,000 year storm (UK recommended safety limit)
Grade: 91%
This document provides an overview of fire histories in North America. It discusses the components and causes of wildfires, including natural causes like lightning and drought, as well as anthropogenic causes like human ignition. It then examines methods for reconstructing historical fire activity, including dendrochronology (tree ring analysis), charcoal records, and historical documentation. Case studies on modern wildfires in California and historical fires in Alaska are also presented. The document concludes that fire regimes have always been influenced by both natural and human factors, and that further interdisciplinary research is needed to better understand pre-colonial fire patterns.
How current debates are influencing the science curriculum in the UKCobain Schofield
This essay seeks to understand how the current issues and debates relating to science education (both primary and secondary levels) are influencing the curriculum.
Grade: 77%
Using Python as a GIS and using datasets of choice, identify 5 specific lower-super-output areas of Liverpool for
investment opportunities.Grade: 83% [moderator's comments are attached to the document]
The Significance of Sequoia sempervirens (Coastal Redwood) Forests: Should th...Cobain Schofield
This document summarizes information about Coast Redwood forests and the history of their protection. It discusses how Coast Redwoods were once widespread but are now limited to northern California coast due to climate change. It describes the ecological importance of Coast Redwood forests and threats they face such as soil erosion, climate change, and development. It also outlines the history of logging Coast Redwoods from the 1800s onward, the conservation movement that arose to protect them, and their current value for tourism.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Natural Language Processing (NLP), RAG and its applications .pptxfkyes25
1. In the realm of Natural Language Processing (NLP), knowledge-intensive tasks such as question answering, fact verification, and open-domain dialogue generation require the integration of vast and up-to-date information. Traditional neural models, though powerful, struggle with encoding all necessary knowledge within their parameters, leading to limitations in generalization and scalability. The paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" introduces RAG (Retrieval-Augmented Generation), a novel framework that synergizes retrieval mechanisms with generative models, enhancing performance by dynamically incorporating external knowledge during inference.
1. ENVS450 – Assignment 3 200923027
Page 1
England’s North-South Divide: exploring the impact of socio-demographic
variables on the rate of home-ownership by geographic location
Introduction
The “north-south divide” is a widely debated
social phenomenon in the UK, and is often used
to describe the cultural, social, political and
economic differences between the two halves
of the country, with the south generally
determined to be “out-performing” the north.
This has been officially recognised by the
current Conservative government with the
instatement of the Northern Powerhouse
project, which aims to re-balance economic
disparity between the north and south.
However, house prices between the north and
south differ enormously, with the average
house price in the south far-exceeding the
wages of all but the most senior employees,
making the prospect of mortgages completely
unrealistic for most of the workforce.
Several factors were identified which would
realistically impact home ownership across the
whole of the UK. These were taken from a list
of census variables and include: rate of
professional employment, rate of households
without a car, rate of residents aged 65 plus,
rate of illness and location within the UK. Given
the nature of the census dataset, each variable
has 348 instances, existing once for each
district in England and Wales. These variables
were then used to determine whether there is in
fact a clear north-south divide in home
ownership in the UK.
Literature Review
Home-ownership in the United Kingdom is
somewhat hegemonic; there is a widely held
cultural expectation and desire to own a home.
This is despite the average house price
increasing by some 35% in the last 10 years to
over £216,000 (The Land Registry, 2017).
Because of this, home ownership has
decreased in the last 30 years as younger
people are “priced out” of the housing market
(Osborne, 2016). The Office for National
Statistics states that in 1991 36% of 16-24 year
olds owned their own home, falling to 9% in
2014. The 35-44 age group has also seen a
drastic fall from 1991 to 2014, from 78%
ownership to 59%. By contrast, home
ownership amongst older age groups has
increased. However, Osborne (2016) found that
overall, the proportion of ownership has fallen
across every part of the UK since the early
2000s and as of publication, England was
seeing the lowest levels of home ownership in
30 years.
Throughout 2016 there were multiple news
articles published highlighting the deepening
north-south divide in the UK as defined by
house prices (Fraser, 2016; Milligan, 2016;
Shaw, 2016; Lynch, 2015). Research
conducted by ‘e-moov’, an online estate agent,
defined the north-south divide based on a “clear
boundary” which snaked across the Midlands
from Bristol to Norfolk. Along this boundary, the
difference in average house prices is as much
as £160,000 between neighbouring counties (e-
moov, 2016). This disparity has led to the claim
that “house prices may permanently diverge
from earnings” causing increasingly
unaffordable houses (Gregoriou, et al., 2014).
However, given the complexity of the national
social demographic, and the additional
complexity of factors affecting home ownership
rates across the country, the research best
describes the factors as “heterogenic” as they
vary from one part of the country to another
depending on a web of other variables
(Montagnoli & Nagayasu, 2013).
Methodology
The census subset used was taken from the 2011
census – a survey of England and Wales which
determined a resident population of 56.1 million
people (Office for National Statistics, 2011).
The dataset is not raw data, but rather a rate of
variable occurrence within the population of
each of the 348 districts in England and Wales.
An Ordinary Least Squares Regression
analysis was used in alignment with the
standard demographic approach to analyse
only variables that were statistically significant
to the model. The explanatory variables chosen
at the start of the study are as follows:
- Rate of professional employment: $Professionals
- Rate of households which do not own a car: $No_Cars
- Rate of residents aged 65 or more: $Age_65plus
- Rate of illness: $illness
- Location within England and Wales: $NorthMidlandsSouth
The statistical significance of these variables
was not known when they were selected, so
some may be subject to dismissal during the
statistical analysis. These variables were
selected based on sparse literature surrounding
factors concerning home ownership
$Owner_occupied (Montagnoli & Nagayasu,
2013), as well as using empirical reasoning.
All the explanatory variables are continuous,
except for $NorthMidlandsSouth which is
categorical. This variable was created by
grouping districts based on their region in the
UK. The categories are: North, Midlands and
South, with Midlands encompassing the area
along the north-south boundary described by
‘e-moov’, the exact position of which lacks some clarity. It is hoped
that by the end of the analysis, the Midlands
category will identify more with either North or
South, rather than existing as its own unique
region, as this would suggest that there is
indeed a “north-south divide” when it comes to
home ownership.
Results
Before the main regression analysis can begin,
it is important to gain some understanding of the
relationship between the outcome variable
$Owner_occupied and the continuous
explanatory variables. Each of the 4 continuous
explanatory variables was plotted against
$Owner_occupied, with a second graph
plotted to assess Skewness. Figure 1 shows
the graphs.
From Figure 1, it is evident that none of the variables
except $Age_65plus is normally
distributed. Significance testing of Pearson’s correlation
assumes normally distributed variables, so Spearman’s Rank
correlation coefficient was used instead to
establish the correlation between the variables.
The r_s results from the Spearman’s Rank
calculations are included on each graph. The
results from Figure 1 are perhaps not
surprising, apart from that of
$Professionals, which shows weak
correlation between the explanatory and
outcome variables. However, this is not yet
cause for concern as this analysis does not
consider spatial distribution of the districts
within each region at this stage.
Figure 1 – the relationship of each explanatory
variable in relation to the outcome variable, and
their Skewness
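As a minimal sketch of the rank-based correlation described above, the following R snippet computes Spearman's coefficient on simulated data. The data frame `census_demo` and its values are hypothetical stand-ins, not the 2011 Census data, and only base R functions are used.

```r
# Hypothetical stand-in for the census data frame; values are simulated,
# not taken from the 2011 Census.
set.seed(1)
n <- 348
census_demo <- data.frame(No_Cars = 5 + 40 * rbeta(n, 2, 5))  # right-skewed rate
census_demo$Owner_occupied <- 85 - 0.8 * census_demo$No_Cars + rnorm(n, sd = 4)

# Spearman's rank correlation works on ranks, so it does not rely on
# the variables being normally distributed.
rs <- cor(census_demo$No_Cars, census_demo$Owner_occupied, method = "spearman")

# The same value is obtained by applying Pearson's formula to the ranks.
rs_manual <- cor(rank(census_demo$No_Cars), rank(census_demo$Owner_occupied))

print(rs)
```

Because the simulated relationship is negative, the coefficient comes out negative; with the real data the sign and size would of course depend on the variable plotted.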
Next, a Multiple Linear Regression Model
was fitted to establish how much of the variance in
home ownership the explanatory variables explain.
The following code was run in R:
> model <- lm(Owner_occupied ~ No_Cars + Professionals + illness + Age_65plus + NorthMidlandsSouth, data=census)
This model was developed by creating 5
progressively more complex models, each one
incorporating an additional variable from the 5
explanatory variables. Table 1 shows how each
model fared using Akaike’s Information
Criterion (AIC), which balances goodness of fit
against model complexity. For AIC, the smaller the value,
the better the model.
Table 1 – multiple regression model output of
AIC results for each progressive model
Model   Additional Variable     AIC
1       $NorthMidlandsSouth     2582.018
2       $Age_65plus             2381.025
3       $No_Cars                1894.315
4       $Professionals          1889.316
5       $illness                1871.359
Table 1 shows the AIC reducing with each
additional variable, but the reduction in AIC gets
smaller and smaller, particularly between
models 3 and 4 where there is only a 5 point
reduction in AIC. However, the reduction still
contributes to the understanding and outcome
of the model, even if it does increase the
complexity by 20%. Therefore, model 5 will
become the model used for this study.
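The progressive comparison summarised in Table 1 can be sketched as follows. This is an illustration on simulated data only: the data frame `demo`, its coefficients and the three formulas are assumptions made for the example, so the AIC values will not match those in Table 1.

```r
# Simulated stand-in data (hypothetical values, not the census dataset).
set.seed(42)
n <- 348
demo <- data.frame(
  NorthMidlandsSouth = factor(sample(c("M", "N", "S"), n, replace = TRUE)),
  Age_65plus = rnorm(n, mean = 17, sd = 4),
  No_Cars = rnorm(n, mean = 25, sd = 8)
)
demo$Owner_occupied <- 70 + 0.8 * demo$Age_65plus -
  0.8 * demo$No_Cars + rnorm(n, sd = 3)

# Each formula adds one explanatory variable to the previous one.
formulas <- list(
  Owner_occupied ~ NorthMidlandsSouth,
  Owner_occupied ~ NorthMidlandsSouth + Age_65plus,
  Owner_occupied ~ NorthMidlandsSouth + Age_65plus + No_Cars
)

# Fit every model and collect its AIC in one table.
models <- lapply(formulas, lm, data = demo)
aic_table <- data.frame(Model = seq_along(models),
                        AIC = sapply(models, AIC))
print(aic_table)
```

Keeping the formulas in a list makes it easy to add or reorder variables and re-run the whole comparison in one step.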
From this model, the coefficients can be
examined to write the fitted model in readable
terms:
> coefficients(model)
The output of the above code is displayed in
Table 2, the data from which was used to write
the fitted model:
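\[
\%\,\text{Home Owners} = 67.37 - 0.82\,\text{No\_Cars} + 0.37\,\text{Professionals} + 0.64\,\text{illness} + 0.13\,\text{Age\_65plus} - 1.17\,\text{NorthMidlandsSouth[S]} + 1.94\,\text{NorthMidlandsSouth[N]}
\]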
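To show how the fitted model turns district characteristics into a predicted ownership rate, here is a small worked example using the coefficients reported in Table 2. The district values are hypothetical, and treating the two $NorthMidlandsSouth terms as 0/1 indicators with Midlands as the reference category is an assumption based on how Table 2 lists them.

```r
# Coefficients as reported in Table 2
# (assumption: Midlands is the reference region).
coefs <- c(Intercept = 67.37, No_Cars = -0.82, Professionals = 0.37,
           illness = 0.64, Age_65plus = 0.13, South = -1.17, North = 1.94)

# Hypothetical northern district (rates in %); not taken from the census data.
district <- c(Intercept = 1, No_Cars = 20, Professionals = 15,
              illness = 18, Age_65plus = 17, South = 0, North = 1)

# Predicted % home owners = sum of coefficient * value over all terms.
predicted <- sum(coefs * district[names(coefs)])
print(predicted)  # 72.19
```

For this made-up district the model would predict a home ownership rate of roughly 72%.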
The R² of the model can also be obtained:
> summary(model)$r.squared
[1] 0.8767051
The R² value of 0.87 suggests that the model
has a good fit. However, such a high R² value
does not necessarily mean that the fit is good.
The residuals must be considered to check how
the data are distributed about the zero line.
> plot(resid(model))
This outputs the graph in Figure 2, which shows
that the residuals are fairly evenly distributed
throughout the plot, suggesting that there is in
fact a good fit within this model, and that the R²
value of 0.87 can be respected.
Table 2 – coefficients of the fitted linear
regression model of all predictor variables
Variable                    Coefficient
(Intercept)                 67.37
$No_Cars                    -0.82
$Professionals              0.37
$illness                    0.64
$Age_65plus                 0.13
$NorthMidlandsSouth[S]      -1.17
$NorthMidlandsSouth[N]      1.94
This now means that the fitted model is able to
explain 87% of the spatial variation in home
ownership, based on the explanatory variables.
With only one explanatory variable
($NorthMidlandsSouth) used, the model
can only explain 2.7% of the spatial variation in
home ownership, meaning that the remaining 4
explanatory variables increase the variance
explained by over 84 percentage points.
> summary(lm(Owner_occupied ~ NorthMidlandsSouth, data=census))$r.squared
[1] 0.02767543
> 0.027*100
[1] 2.7
The model can also be checked using AIC,
which penalises each additional parameter and so
avoids the issue of the R² value
automatically increasing for each variable
added to the model.
> AIC(model)
[1] 1871.359
> AIC(lm(Owner_occupied ~ NorthMidlandsSouth, data = census))
[1] 2582.018
This shows that increasing the complexity of the
model reduces the AIC score from 2582 to 1871,
meaning that the improvement in fit outweighs the
penalty for the additional complexity.
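The difference between R² and AIC discussed above can be illustrated with a short sketch on simulated data. The variables `x`, `noise` and `y` are hypothetical: `noise` is unrelated to `y` by construction, yet adding it cannot lower R², whereas AIC includes an explicit penalty of 2 per estimated parameter.

```r
# Simulated data (hypothetical): y depends on x only; noise is irrelevant.
set.seed(7)
n <- 348
demo <- data.frame(x = rnorm(n), noise = rnorm(n))
demo$y <- 2 * demo$x + rnorm(n)

small <- lm(y ~ x, data = demo)
big   <- lm(y ~ x + noise, data = demo)

# R-squared never decreases when a variable is added, useful or not.
r2_small <- summary(small)$r.squared
r2_big   <- summary(big)$r.squared

# AIC = 2k - 2*logLik, where k counts the estimated parameters
# (regression coefficients plus the error variance).
k <- attr(logLik(big), "df")
aic_manual <- 2 * k - 2 * as.numeric(logLik(big))

print(c(r2_small = r2_small, r2_big = r2_big))
print(c(AIC_small = AIC(small), AIC_big = AIC(big)))
```

Comparing the two printed pairs shows why AIC is the safer guide when deciding whether an extra variable earns its place.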
Figure 2 – a residual plot of the fitted model
with R² value of 0.87
To validate the model thus far, the model
residuals were checked to ensure normal
distribution. Figure 3 shows the model residuals
plotted as a graph, displaying very slight
positive skewness. This can be calculated:
> skew(model$residuals)
[1] 0.04953776
The skewness value for this model is calculated
as approximately 0.05, which is so slight that the
distribution is essentially symmetric.
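For readers without access to the `skew` helper used above, a moment-based skewness can be sketched in base R as follows. The function name `skewness` and the test vectors are assumptions for illustration, and the exact formula behind `skew` may differ slightly (for example through a small-sample correction).

```r
# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 3/2.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Simulated examples (hypothetical data, not the model residuals).
set.seed(3)
symmetric_sample <- rnorm(1000)   # roughly symmetric, skewness near 0
right_skewed     <- rexp(1000)    # long right tail, positive skewness

print(skewness(symmetric_sample))
print(skewness(right_skewed))
```

Applied to a fitted model, the call would be `skewness(resid(model))`; values close to zero indicate a near-symmetric residual distribution.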
A ‘QQ’ plot was also generated to check the
skewness of the model. Rodríguez (2016)
states that a QQ plot showing curvature would
indicate skew distributions. The QQ plot
generated for this model is shown in Figure 4.
The plot is slightly curved at each end, but
broadly follows a straight line across the
majority of the points within the dataset
suggesting only minimal skew. Given the equal
mirrored curvature at each end of the graph in
Figure 4, this largely cancels, resulting in only a
slight overall positive skew, which is what the
graph in Figure 3 and the skew calculations
previously discussed indicate. This gives
reasonable confidence to move on to the next
stage in validating the model which is to check
for constancy in error variance.
Figure 3 – model residuals plotted to show the
skewness of the model. The model is
symmetrically distributed
Figure 4 – a QQ plot of the model showing
minimal positive overall skew
The constancy was checked using a ‘spread-
level’ plot which is displayed in Figure 5.
Figure 5 shows a near-horizontal line of best fit
and no clear curvature in the scatter plot (the
few points in the bottom right are not significant
compared to the bulk of points above). Together,
these two properties indicate a constant error
variance.
Figure 5 – a spread-level plot of the model
residuals showing a near-horizontal line of best
fit suggesting constant error variance
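A rough numeric counterpart to reading the slope off a spread-level plot is sketched below. It assumes the plot is of the kind produced by `spreadLevelPlot` in the car package (log absolute studentized residuals against log fitted values), which the report does not state explicitly, and it uses simulated data in place of the census model.

```r
# Simulated stand-in data with constant error variance (hypothetical).
set.seed(21)
n <- 348
demo <- data.frame(No_Cars = runif(n, 5, 45))
demo$Owner_occupied <- 85 - 0.8 * demo$No_Cars + rnorm(n, sd = 3)

fit <- lm(Owner_occupied ~ No_Cars, data = demo)

# Regress log |studentized residual| on log fitted value.
# A slope near zero is consistent with constant error variance;
# a clearly non-zero slope suggests the spread changes with the level.
spread_fit <- lm(log(abs(rstudent(fit))) ~ log(fitted(fit)))
slope <- unname(coef(spread_fit)[2])
print(slope)
```

The log transform requires positive fitted values, which holds here because the simulated ownership rates stay well above zero.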
Next, the multicollinearity of the model is tested.
This checks whether the explanatory variables
used in the model are strongly correlated in
combination. To check the multicollinearity of
the model, the following code was run:
> sqrt(mean(car::vif(model)))
[1] 1.417898
This value is well within the safe range
described by Kabacoff (2015), who describes
sqrt(VIF) values greater than 2.0 as concerning.
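The variance inflation factor behind this check can be sketched without the car package: for each predictor, regress it on the other predictors and take 1 / (1 - R²). The helper `vif_manual` and the simulated predictors `a`, `b` and `c` are hypothetical, and the sketch covers numeric predictors only.

```r
# Simulated predictors (hypothetical): a and b are correlated, c is not.
set.seed(11)
n <- 348
demo <- data.frame(a = rnorm(n))
demo$b <- 0.6 * demo$a + rnorm(n, sd = 0.8)
demo$c <- rnorm(n)

# VIF for one predictor = 1 / (1 - R^2), where R^2 comes from
# regressing that predictor on all the other predictors.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(demo, c("a", "b", "c"))
print(vifs)

# Rule of thumb cited in the text: sqrt(VIF) > 2 would be concerning.
print(sqrt(vifs) > 2)
```

Checking each predictor separately in this way shows which variable, if any, is responsible for a collinearity problem, which a single averaged figure cannot do.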
Next, the Ordinary Least Squares Regression
assumes that the relationship between each of
the explanatory variables and the outcome
variable is linear. To check this, partial residual plots were
generated, as shown in Figure 6.
Figure 6 – partial residual plots of the model
With the exception of the $Professionals
plot, each plot appears linear, accounting for
limited noise such as that in $Age_65plus.
However, on closer inspection the large
deviation from linearity in $Professionals is
caused by a single outlier (City of London
district) which has a much higher than average
rate of professional workers. This is hardly
surprising, and as it is only a single point, the
red line of best fit follows the expected trajectory
of the green line to the point of deviation. It can
therefore be said that there is no obvious
departure from linearity for any of the
explanatory variables in the model.
Given all of the checks made, the model
appears to be robust and statistically sound.
The model output was then used to conduct a
multivariate principal components analysis in
PAST, shown in Figure 7 and larger in Appendix 3.
Figure 7 – a multivariate principal component analysis of the model, conducted in PAST
Figure 7 shows each of the 348 districts within
the census dataset plotted according to their
residuals on axes 4 and 5 ($Owner_occupied
and $Professionals) of Model 5. It shows
how the districts are seated in relation to each
other and the variables of the model, colour-
coded by the region of the UK to which they
belong, as defined in the model design
($NorthMidlandsSouth).
It is clear from Figure 7 that there is a lot of
overlap amongst the districts from each region,
which is to be expected. The interesting regions
are those which extend away from the central
cluster, as these are the ones that move away
from the “average” and begin to define the wider
region.
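The analysis above was carried out in PAST, but a comparable principal components analysis can be sketched in R with `prcomp`. This is not a reproduction of the PAST procedure: the data are simulated, the variable relationships are invented for the example, and scaling each variable to unit variance is an assumption.

```r
# Simulated stand-in data (hypothetical values, not the census dataset).
set.seed(5)
n <- 348
base <- rnorm(n)
demo <- data.frame(
  Owner_occupied = 65 + 8 * base + rnorm(n, sd = 3),
  No_Cars        = 25 - 6 * base + rnorm(n, sd = 3),
  Professionals  = rnorm(n, mean = 17, sd = 4),
  illness        = rnorm(n, mean = 18, sd = 3),
  Age_65plus     = 17 + 2 * base + rnorm(n, sd = 3)
)

# Centre and scale so that no variable dominates through its units.
pca <- prcomp(demo, center = TRUE, scale. = TRUE)

# Share of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings show how each variable pulls along the first two axes;
# scores give each district's position on those axes.
print(pca$rotation[, 1:2])
scores <- as.data.frame(pca$x[, c("PC1", "PC2")])
```

Plotting `scores` coloured by region would give a picture analogous to Figure 7, with the loadings indicating the direction in which each variable pulls the districts.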
Focussing first on the North, the districts are
pulled toward the right and down, suggesting a
greater influence from $illness and
$No_Cars than the other regions. Figure 8A
shows that the North does indeed have the
highest average rate of no-car ownership.
However, Figure 8B shows that the Midlands
has a higher rate of illness, with a peculiar
positive linear relationship between $illness
and home ownership in the South which defies
empirical thought and goes against the
negative relationship of the North and Midlands.
The North also experiences a fairly significant
pull upwards by $Age_65plus, which aligns
with Figure 8D which shows that the North has
the 2nd highest rate of people aged over 65, though
in Figure 7 a lot of those districts influenced by
a high rate of older people also see strong
influences from professional workers.
Next, the South appears to have a much
broader spread than the other two regions, but
with 184 districts, it is exactly twice the size of
the North (largely owing to the higher
population and therefore larger number of
districts). Figure 7 shows that the South has a
large cluster positioned toward the left of the
graph, which appears to traverse the length of
Figure 8 – separate regression models for each explanatory variable against the outcome variable (panels A to D)
the Y-axis, suggesting strong influences from
an aging population, home ownership and no-
car ownership, with little impact from the rate of
illness and rate of professionals in each district.
Figure 7 is backed up by Figure 8D which
shows that the South has the largest range in
the variable $Age_65plus of any of the
explanatory variables. Figure 8A shows that the
rate of no-car ownership has the largest impact
on the South, while Figure 8C shows – perhaps
counter-intuitively – that increased rate of
professional employees within a district leads to
lower rates of home ownership. However, this
is most likely a result of people living and
working within London districts, where house
prices are high and workers may rent
accommodation as they may be expecting to
move in line with work commitments.
Nevertheless, 8C supports Figure 7’s apparent
lack of influence from $Professionals.
Finally, Figure 7 shows that there is a pull to the
right side of the graph with the Midlands region.
This suggests high rates of illness and high
rates of professionals in the workforce. Figures
8B & 8C support this, showing that the Midlands
has the highest average rates of both illness
and professional workforce.
Conclusion
This study has found a number of key points:
- The final model (Model 5 in Table 1) was
  statistically significant and the model
  validation steps show this.
- There are great disparities between the
  North and South across all variables
  except $No_Cars, where each of the 3
  regions had very well correlated plots,
  such as that in Figure 8A.
- The Midlands often aligns very closely with
  the North, as Figure 8 shows well. Figure
  7 displays a lot of overlap between the
  Midlands and the North – much more so
  than any region does with the South.
- Home ownership in the South is impacted
  differently to what the literature suggests,
  and what may be reasonably expected –
  for example: higher rate of professionals in
  a district = lower rate of home ownership.
  This indicates that the South is subject to
  different pressures and factors than the
  North when it comes to home ownership.
Given that the Midlands aligns so closely with
the North rather than the South, it would make
sense to group the two regions together, as the
report by ‘e-moov’ did. This would not only
serve to increase the statistical significance of
the North (by giving this region a similar number
of districts to the South), but it would also make
logical sense given that this study has found
the districts classified as “Midlands” to be
statistically similar to those classified as
“North”.
What this essentially means is that there is
indeed a north-south property divide in England
and Wales. The boundary is clear: from Bristol
in the west, across Warwickshire and
Gloucestershire, across to Leicestershire and
Norfolk in the east – as stated by ‘e-moov’ in
their 2016 report.
Obviously, this study did not look at the spatial
economics of homeownership in a statistical
sense, though empirically it is understood and
respected that property costs more in the
South. This study instead focused on just a few
social variables from the 2011 census in order
to draw this conclusion. A future study would
benefit from greater complexity in the modelling
(increased number of social variables) as well
as incorporating some economic factors to
paint a much clearer picture of the wider issues
that we as a nation face when it comes to home
ownership and house prices.
References
e-moov, 2016. The North-South Property
Divide Defined, Brentwood: e-moov.
Fraser, I., 2016. This map shows just how stark
the north-south property divide is. The
Telegraph, 30 November.
Gregoriou, A., Kontonikas, A. & Montagnoli, A.,
2014. Aggregate and regional house price to
earnings ratio dynamics in the UK. Urban
Studies, 51(13), pp. 2916-2927.
Kabacoff, R., 2015. R in Action: Data Analysis
and Graphics with R. 2nd ed. Greenwich, CT:
Manning.
Land Registry, 2017. House Price Index for
United Kingdom; January 2006 to January
2016. [Online]
Available at:
http://landregistry.data.gov.uk/app/ukhpi/explore
[Accessed 1 January 2017].
Lynch, R., 2015. North-South divide in house
prices is highest ever. The Independent, 30
December.
Milligan, B., 2016. North-South house price
divide hits record high. BBC News Business, 1
April.
Montagnoli, A. & Nagayasu, J., 2013. An
Investigation of Housing Affordability in the UK
Regions, Glasgow: Scottish Institute for
Research in Economics.
Office for National Statistics, 2011. 2011
Census: Key Statistics for England and Wales,
March 2011. [Online]
Available at:
https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/bulletins/2011censuskeystatisticsforenglandandwales/2012-12-11#key-points
[Accessed 26 December 2016].
Office for National Statistics, 2016. UK
Perspectives 2016: Housing and
homeownership in the UK. [Online]
Available at: http://visual.ons.gov.uk/uk-perspectives-2016-housing-and-home-ownership-in-the-uk/
[Accessed 25 December 2016].
Osborne, H., 2016. Home ownership in
England at lowest level in 30 years as housing
crisis grows. The Guardian, 2 August.
Rodríguez, G., 2016. Generalized Linear
Models. [Online]
Available at:
http://data.princeton.edu/wws509/notes/c2s9.html
[Accessed 1 January 2017].
Shaw, V., 2016. Buyers vs sellers – the new
north-south divide on house prices. Mirror, 17
October.
Appendix 1 :: R Code
###########################################################
#### Assignment 3 ####
### "England’s North-South Divide: exploring the impact of
### socio-demographic variables on the rate of home-ownership
### by geographic location"
###########################################################
## Load Libraries & Data ##
source("functions.R")
load(file = "2011 Census.RData")
load(file = "QLFS.RData")
library(plyr)
load.package("mosaic")
load.package("reshape2")
load.package("ggplot2")
load.package("car")
load.package("scales")
load.package("MASS")
load.package("pls")
###########################################################
## Dataset to be 2011 Census ##
## Output variable chosen to be $Owner_occupied ##
## Explanatory vars to be $Professionals, $Age_65plus, $No_Cars ##
## and $illness ##
## (explanatory vars chosen from literature and logic) ##
# Store the var names in a character vector for later use #
explan.vars <- c("Professionals","Age_65plus","illness","No_Cars")
###########################################################
### Explore the relationship between outcome and explan. vars ###
## Function to generate a scatter plot with best fit line ##
generateXbyY <- function(inputX, inputY) {
  # inputX and inputY are column names of census, given as strings
  return(ggplot(data = census) +
    geom_point(aes_string(x = inputX, y = inputY)) +
    geom_smooth(method = "lm", fullrange = TRUE,
                aes_string(x = inputX, y = inputY)) +
    theme_bw() +
    geom_vline(xintercept = 0) +
    geom_hline(yintercept = 0) +
    theme(axis.line = element_line(colour = "black"),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank()))
}
# Output a scatter for each explan. var #
for (i in explan.vars) {
  # ggplot objects are not auto-printed inside a loop, so print explicitly
  print(generateXbyY("Owner_occupied", i))
}
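## Function to generate a histogram of an explan. var to check for skew ##
generateSkew <- function(inputY) {
  # inputY is a column name given as a string, so map it with aes_string()
  return(ggplot(data = census, aes_string(x = inputY)) +
    geom_histogram(binwidth = 1))
}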
# Output a skew graph for each explan. var #
for (i in explan.vars) {
  # print explicitly, as plots are not auto-printed inside a loop
  print(generateSkew(i))
}
###########################################################
## Up to now, all vars appear to be suitable, though the ##
## Skewness indicates that Spearman's Rank must be used ##
## in place of Pearson's correlation coefficient ##
###########################################################
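## A minimal sketch of the Spearman's rank check mentioned above, assuming ##
## census holds the columns named in explan.vars. The helper name ##
## spearmanWithOutcome is hypothetical; cor() is base R (stats) ##
spearmanWithOutcome <- function(data, vars, outcome = "Owner_occupied") {
  # Spearman's rho between each explanatory var and the outcome,
  # dropping rows where either value is missing
  sapply(vars, function(v) {
    cor(data[[v]], data[[outcome]], method = "spearman",
        use = "complete.obs")
  })
}
# Example call: round(spearmanWithOutcome(census, explan.vars), 2)
###########################################################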
### Build the model; assess statistical significance ###
## Start with 1 var ($NorthMidlandsSouth - spatial) ##
## and then add consecutive explan. vars ##
model1 <- lm(Owner_occupied ~ NorthMidlandsSouth, data=census)
model2 <- lm(Owner_occupied ~ NorthMidlandsSouth + Age_65plus, data=census)
model3 <- lm(Owner_occupied ~ NorthMidlandsSouth + Age_65plus + No_Cars,
data=census)
model4 <- lm(Owner_occupied ~ NorthMidlandsSouth + Age_65plus + No_Cars +
Professionals, data=census)
model5 <- lm(Owner_occupied ~ NorthMidlandsSouth + Age_65plus + No_Cars +
Professionals + illness, data=census)
# Check whether each successive model is a statistically significant #
# improvement on the previous one #
anova(model1,model2,model3,model4,model5)
# Check that AIC is reducing from one model to the next #
AIC(model1,model2,model3,model4,model5)
###########################################################
## Everything looks good, so adopt the most complex model ##
## as --the-- model for the study ##
model <- model5
###########################################################
### Begin the validation of the model - check it is ###
### actually statistically significant ###
## Output the coefficients and obtain R-sq. value ##
coefficients(model)
summary(model)$r.squared
#-- R-sq. = 0.87 #
## Don't take R-sq. at face value - check residuals are ##
## distributed evenly! ##
plot(resid(model))
abline(0,0)
#-- Residuals appear evenly distributed about the horizontal zero line #
#-- all seems good so far #
summary(model)$r.squared*100
#-- 87.67 #
#-- this means that >87% of the variation is explained by the model #
summary(model1)$r.squared*100
#-- 2.76 #
#-- model1 only explains 2.76% of variation - model is much better! #
AIC(model)
#-- 1871.359 #
AIC(model1)
#-- 2582.018 #
#-- This shows a great reduction from model1 to model, meaning that the #
#-- added variables improve the fit by more than AIC's complexity penalty #
## Next, check the skewness of the residuals. Given that the checks made ##
## so far indicate a good model, any skew should be small ##
skew(model$residuals)
#-- 0.049.. - basically negligible; points to symmetric distribution #
## Next generate a QQ-plot to check skewness ##
#-- expect to find good fit to line given skew value of 0.04 #
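# Standardised residuals against theoretical normal quantiles
# (stat_qq_line() needs ggplot2 >= 3.0.0)
qq.data <- data.frame(stdresid = rstandard(model))
p2 <- ggplot(qq.data, aes(sample = stdresid)) +
  stat_qq(na.rm = TRUE) +
  stat_qq_line(na.rm = TRUE) +
  xlab("Theoretical Quantiles") + ylab("Standardized Residuals") +
  ggtitle("Normal Q-Q") + theme_bw()
p2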
#-- Indeed, all but a few points appear to follow the line #
## Spread-level plot: absolute studentized residuals vs. fitted values ##
car::spreadLevelPlot(model)
#-- the near-horizontal line of best fit is good, as it suggests the #
#-- spread of the residuals is roughly constant across fitted values #
#-- the lack of curvature in the scatter also indicates a good #
#-- distribution of residuals #
## Multicollinearity ##
# Check whether the explan. vars are strongly correlated in combination #
sqrt(mean(car::vif(model)))
#-- 1.41 ... this is good according to Kabacoff (2015; see References) #
#-- Kabacoff says >2.0 is concerning #
## OLS expects linear relationship of explan. vs. outcome ##
# Plot partial residual plots to check this #
car::crPlots(model)
#-- good, as no obvious departure from linearity in any explan. var #
###########################################################
## Model appears good; export data to PAST and plot fixed slopes ##
## to check each var in relation to each region simultaneously ##
###########################################################
## Function to generate fixed slope scatters of all regions for any ##
## explan. var ##
generateFixedSlope <- function(inputX) {
  ggplot(data = census) +
    geom_point(aes_string(x = inputX, y = "Owner_occupied",
                          colour = "NorthMidlandsSouth")) +
    geom_smooth(method = "lm", se = FALSE,
                aes_string(x = inputX, y = "Owner_occupied",
                           colour = "NorthMidlandsSouth")) +
    theme_bw() +
    geom_vline(xintercept = 0) +
    geom_hline(yintercept = 20) +
    theme(axis.line = element_line(colour = "black"),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank())
}
# Output a fixed slope for each explan. var #
for (i in explan.vars) {
  # print explicitly, as plots are not auto-printed inside a loop
  print(generateFixedSlope(i))
}
###########################################################