BBS300 Empirical Research Methods for Business
TSA, 2018
Assignment 1
Due: Sunday, 7 October 2018,
23:55
This assignment covers material from Sessions 1-4 and is worth 20% of your total mark
of BBS300. Your solutions should be properly presented, and it is important that you
double-check your spelling and grammar and thoroughly proofread your assignment
before submitting. Instructions for assignment submission are presented in
the “Assignment 1” link and must be strictly adhered to. No marks will be
awarded to assignments that are submitted after the due date and time.
All analyses must be carried out using SPSS, and no marks will be awarded
for assignment questions where SPSS output supporting your answer is not
provided in your Microsoft Word file submitted for the Assignment.
Questions
In this assignment, we will examine the “Real Estate Market” dataset (described at the
end of the assignment) and the “Employee Satisfaction” dataset. Before beginning the
assignment, read through the descriptions of these datasets and their variables carefully.
The “Real Estate Market” dataset can be found in the file “realestatemarket.sav,” and
the “Employee Satisfaction” dataset can be found in the file “employeesatisfaction.sav.”
You will need to carefully inspect both SPSS data files to be sure that the
specification of variable types is correct and, where appropriate, value
labels are entered.
1. (12 marks)
Use appropriate graphical displays and measures of centrality and dispersion
to summarise the following four variables in the “Real Estate Market” dataset. For
graphical displays for numeric data, be sure to comment on not only the shape of
the distribution but also compliance with a normal distribution. Be sure to
include relevant SPSS output (graphs, tables) to support your answers.
(a) Price.
(b) Lot Size.
(c) Material.
(d) Condition.
2. (8 marks)
Again consider the variable Price, which records the property price (in AUD). It
is of interest to know if this is associated with the distance from the property
to the train station. It is also of interest to know if property
prices are associated with distance to the nearest bus stop. Carry out
appropriate statistical techniques to assess whether there is a significant
association between the property price and the distance to the nearest train
station (To Train) and the nearest bus stop (To Bus). Be sure to thoroughly assess the
assumptions of your particular analysis, and be sure to include relevant SPSS
output (graphs, tables) to support your answers.
3. (7 marks)
Consider the “Employee Satisfaction” dataset, which asked participants to provide their
level of regularity to a series of thirteen statements. Conduct an appropriate analysis
to assess the reliability of responses to these statements. If the reliability will
increase by eliminating one or more variables, report which
variable(s) this is/are.
Again, be sure to include relevant SPSS output (graphs, tables)
to support your
answers.
4. (3 marks)
Presentation marks. These marks are allocated based on:
• structure, clarity, and tidiness of presented solutions/answers;
and
• correctness in spelling and grammar.
Melbourne Property Prices
The dataset realestatemarket.sav contains data on 120 properties listed on the database
of a real estate agent in an Australian suburb. The table below provides a list of
variables contained in the dataset.
Variable          Description
Price             Selling price of house in $'000
Rooms             Number of main rooms in the house
Lot Size          Area of the block of land (lot) in square metres
Age               Age of the house in years
Area              Area of the house in square metres
Material          Timber = 1, Veneer = 2, Brick = 3
To Train          Distance of the house to the nearest train station (kilometres)
To Bus            Distance of the house to the nearest bus stop (kilometres)
To Shops          Distance of the house to the nearest shopping centre (kilometres)
Street            Street appeal as evaluated by the real estate agency: ranges from 0 (lowest appeal) to 10 (highest appeal)
Storeys           Number of storeys or levels in the house
Style             Traditional Style = 0, Non-Traditional Style = 1
Bedrooms          Number of bedrooms
Bathrooms         Number of bathrooms
Kitchen           Style of kitchen: Adequate = 0, Modern = 1
Heating           Central or other heating system installed: No Heat = 0, Yes Heat = 1
AirCon            Air conditioning installed: No AC (No AirCon) = 0, AC (Yes AirCon) = 1
Bay Views         Proportion of views of the Bay from a prominent part of the property: ranges from 0 = Nil views up to 1 = Full views
Suburb            Three different suburbs: 1 = Suburb A, 2 = Suburb B, 3 = Suburb C
Weekly Rent $     Actual or estimated weekly rent in $
Rental Return %   Annual rate of return from rent income, (Weekly rent x 52)/(Price in $'000), as a percentage (%)
Condition         The condition of the house in general: Very Poor = 1, Poor = 2, Good = 3, Excellent = 4
Rental Status     Vacant (available for rent) = 1; Rented (currently rented) = 2; Owner (occupied by owner) = 3
Table 1: Descriptions of variables contained in the dataset realestatemarket.sav
Empirical Research Methods for Business
Q1a) Price
The sample contains 120 houses, and SPSS treats all 120 observations (100%) as
valid.
Based on the descriptive statistics, the mean price of a house (in $'000 AUD) is
886.575, with a standard error of the mean of 29.6634. The 95% confidence
interval for the mean runs from a lower bound of 827.8384 to an upper bound of
945.3116.
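The standard error and confidence interval quoted above can be cross-checked from the reported standard deviation and sample size. The following is a minimal Python sketch, not part of the SPSS output; the variable names are illustrative and the figures are the ones reported in this answer.

```python
import math

# Values reported in the SPSS descriptives for Price (in $'000)
n = 120
mean = 886.575
sd = 324.94666
ci_lower, ci_upper = 827.8384, 945.3116

# Standard error of the mean: SD divided by the square root of n
se = sd / math.sqrt(n)

# Half-width of the reported 95% interval and the multiplier it implies
half_width = (ci_upper - ci_lower) / 2
implied_multiplier = half_width / se

# The interval should be centred on the sample mean
midpoint = (ci_lower + ci_upper) / 2

print(f"SE = {se:.4f}")
print(f"CI midpoint = {midpoint:.4f}, half-width = {half_width:.4f}")
print(f"implied multiplier = {implied_multiplier:.4f}")
```

The reported figures imply a standard error of roughly 29.66 and a multiplier of about 1.98, which is what one would expect for a 95% t-interval with 119 degrees of freedom.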
The trimmed mean is 876.7778, which is very close to the mean of 886.5750, so
extreme values have little influence on the mean. The median price of a house is
852.000, which is below the mean and indicates a slight positive skew (skewness
0.426, as shown in the figure on the left): prices cluster towards the lower end
of the range. The minimum price in the sample is 192.00 and the maximum is
1761.00. The standard deviation is 324.94666, which is rather high in this
context and means that house prices are spread over a wide range of values.
The box plot shows the box sitting towards the lower end of the scale, which
points to a positively skewed distribution with a longer tail towards high
prices. There is also one large outlier, shown above the upper whisker.
For the tests of normality, the Kolmogorov-Smirnov significance is 0.200 and the
Shapiro-Wilk significance is 0.116. Since both values are above 0.05, normality
is not rejected and the data can be regarded as approximately normally
distributed.
The histogram, however, is asymmetric, with values concentrated towards the left
(lower) end, so it does not look exactly normally distributed.
Finally, in the Q-Q plot the observed value of each house plotted against the
expected value under a normal distribution does not follow a reasonably straight
line, which suggests some departure from a normal distribution.
Q1b) Lot Size
For Lot Size, the sample again contains 120 houses, and SPSS treats all 120
observations (100%) as valid.
The descriptive statistics for Lot Size show that the mean lot size in the
dataset is 1175.2250 square metres, with a standard error of the mean of 34.041.
The 95% confidence interval for the mean runs from a lower bound of 1107.8204 to
an upper bound of 1242.6296.
The trimmed mean is 1161.3796, which is very close to the mean of 1175.2250, so
extreme values have little influence on the mean. The median lot size is 980.00,
which is below the mean and indicates a positive skew (skewness 0.838, as shown
in the figure on the left): lot sizes cluster towards the lower end of the range.
The minimum lot size in the sample is 632.00 and the maximum is 1950.00. The
standard deviation is 372.90043, which is rather high in this context and means
that lot sizes vary over a wide range.
The box plot shows the box sitting towards the lower end of the scale, which
points to a positively skewed distribution with a longer tail towards large lot
sizes. In this case, there are no outliers.
For the tests of normality, the Kolmogorov-Smirnov and Shapiro-Wilk significance
values are both reported as 0.000 (p < 0.001). Since both values are below 0.05,
the data cannot be regarded as normally distributed.
The histogram is also asymmetric, with most of the values piled up at the left
(lower) end of the graph. Therefore, it is not normally distributed.
The Q-Q plot also shows a large number of data points falling away from the line
when plotted against the expected values under a normal distribution. Since the
points do not follow a straight line, the distribution is not normal.
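As a rough numerical companion to the plots, the reported skewness values can be compared with their standard error. The sketch below is an illustration only: it uses the usual large-sample formula for the standard error of skewness and the common rule of thumb that a ratio beyond about ±1.96 signals noticeable skew. The function name and the cut-off are assumptions of this sketch, not something taken from the SPSS output.

```python
import math

def skewness_standard_error(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

n = 120
# Skewness values as reported in the descriptives for each variable
reported_skewness = {"Price": 0.426, "Lot Size": 0.838}

ses = skewness_standard_error(n)
z_scores = {name: g1 / ses for name, g1 in reported_skewness.items()}

for name, z in z_scores.items():
    verdict = "beyond" if abs(z) > 1.96 else "within"
    print(f"{name}: skewness / SE = {z:.2f} ({verdict} +/-1.96)")
```

Under these assumptions the ratio for Price stays just inside the cut-off while the ratio for Lot Size is well outside it, which is in line with the normality test results reported for the two variables.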
Q1c) Material
[...] population. 36 houses (30%) use Veneer, and the remaining 39 houses
(32.5%) use Brick.
Condition
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Very Poor         15      12.5            12.5                 12.5
        Poor              40      33.3            33.3                 45.8
        Good              42      35.0            35.0                 80.8
        Excellent         23      19.2            19.2                100.0
        Total            120     100.0           100.0
Q1d) Condition
Condition is an ordinal variable, so it is summarised with frequencies rather
than with a mean and standard deviation. The frequency table shows that 15 houses
are in very poor condition, which is 12.5% of the 120 houses. 40 houses (33.3%)
are in poor condition. The largest group is houses in good condition, with 42 of
them making up 35%. 23 houses (19.2%) are in excellent condition.
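The percentages in the frequency table, and the modal and median categories of this ordinal variable, can be reproduced from the counts alone. This is a small Python sketch using the frequencies reported above; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies reported in the SPSS table, listed in ordinal order
counts = {"Very Poor": 15, "Poor": 40, "Good": 42, "Excellent": 23}
total = sum(counts.values())

# Percent and cumulative percent, rounded to one decimal place as in SPSS
percent = {k: round(100 * v / total, 1) for k, v in counts.items()}
cumulative = [round(100 * c / total, 1) for c in accumulate(counts.values())]

# Mode: the category with the highest frequency
mode = max(counts, key=counts.get)

# Median category: the first category whose cumulative percent reaches 50
median = next(k for k, c in zip(counts, cumulative) if c >= 50)

print(percent)
print(cumulative)
print("mode:", mode, "| median category:", median)
```

For an ordinal variable the mode and the median category are the usual measures of centrality; here both fall on the same category.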
Q2)
The tests of normality for distance to the train station (To Train) give
significance values of 0.01 for Kolmogorov-Smirnov and 0.002 for Shapiro-Wilk.
Since both values are less than 0.05, the variable is not normally distributed.
The histogram for To Train is also asymmetric, with a large portion of the values
lying at the high extreme. This supports the conclusion that it is not normally
distributed.
Correlations
                                  Price   To Train
Price      Pearson Correlation        1       .003
           Sig. (2-tailed)                    .974
           N                        120        120
To Train   Pearson Correlation     .003          1
           Sig. (2-tailed)         .974
           N                        120        120
The Pearson correlation between Price and To Train is r = .003 (p = .974,
N = 120), which indicates a very weak linear relationship that is not
statistically significant at the 0.05 level. Therefore, the distance to the train
station shows almost no association with the price of the houses.
The scatter plot supports this result: there is at most a very weak linear
relationship between the price of a house and its distance to the train station.
For distance to the nearest bus stop (To Bus), the Kolmogorov-Smirnov and
Shapiro-Wilk tests both report a significance of .000 (p < .001), so this
variable is not normally distributed either.
The histogram for To Bus is also asymmetric, with large concentrations of values
at both ends of the graph. Therefore, it is not normally distributed.
Correlations
                                Price   ToBus
Price   Pearson Correlation         1   -.024
        Sig. (2-tailed)                  .796
        N                         120     120
ToBus   Pearson Correlation     -.024       1
        Sig. (2-tailed)          .796
        N                         120     120
The Pearson correlation between Price and To Bus is r = -.024 (p = .796,
N = 120), a very weak negative relationship that is not statistically
significant at the 0.05 level.
The scatter plot of house prices against distance to the nearest bus stop shows
the data points scattered across the diagram with no pattern. Therefore, there is
little or no linear relationship.
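The significance values in the two correlation tables follow from r and N alone. The sketch below is a hedged illustration rather than the SPSS procedure: it computes the usual t statistic for testing a zero correlation and then approximates the two-sided p-value with the standard normal distribution instead of the t distribution with N - 2 degrees of freedom, which is a close approximation at N = 120. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def pearson_t_and_p(r: float, n: int) -> tuple[float, float]:
    """t statistic for H0: rho = 0 and an approximate two-sided p-value.

    The p-value uses the standard normal in place of the t distribution
    with n - 2 degrees of freedom (an approximation, reasonable for n = 120).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

n = 120
t_train, p_train = pearson_t_and_p(0.003, n)   # Price vs To Train
t_bus, p_bus = pearson_t_and_p(-0.024, n)      # Price vs To Bus

print(f"To Train: t = {t_train:.3f}, approx p = {p_train:.3f}")
print(f"To Bus:   t = {t_bus:.3f}, approx p = {p_bus:.3f}")
```

Both approximate p-values land close to the tabulated Sig. (2-tailed) entries and far above 0.05, so neither correlation is statistically significant.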
Q3)
The reliability statistics show a Cronbach’s Alpha of 0.923. Since this value is
above 0.9, the scale has an excellent level of internal consistency.
However, the reliability can be increased further by eliminating one or more
variables.
Q8 has the highest “Cronbach’s Alpha if Item Deleted” value, at 0.928, so
deleting this variable will increase the Cronbach’s Alpha value.
Deleting Q8 increases Cronbach’s Alpha from 0.923 to 0.926. Although this
already indicates an excellent level of internal consistency, the value can be
increased further by deleting another variable.
In the new output, Q11 has the highest value, at 0.931. Deleting this variable
will further increase the Cronbach’s Alpha value.
After deleting both variables, Q8 and Q11, the Cronbach’s Alpha value increases
from 0.926 to 0.931.
The output suggests that the next variable to delete would be Q9. However, since
Cronbach’s Alpha is already above 0.9, the scale already has an excellent level
of internal consistency, and deleting another variable to increase it further is
not necessary.
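The logic behind dropping items one at a time can be illustrated with a short Python sketch. It computes Cronbach's alpha from item variances and the variance of the total score, then recomputes alpha with each item left out in turn. The responses below are made up for illustration (5 respondents, 4 items) and are NOT the Employee Satisfaction data; the function names are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores."""
    k = len(rows[0])                          # number of items
    items = list(zip(*rows))                  # one tuple of scores per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

def alpha_if_deleted(rows):
    """Alpha recomputed with each item left out in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != i] for r in rows])
        for i in range(k)
    ]

# Hypothetical responses: the fourth item does not track the other three
scores = [
    [1, 1, 1, 3],
    [2, 2, 2, 1],
    [3, 3, 3, 2],
    [4, 4, 4, 4],
    [5, 5, 5, 1],
]
alpha_all = cronbach_alpha(scores)
alpha_drop = alpha_if_deleted(scores)

print(f"alpha (all items) = {alpha_all:.3f}")
for i, a in enumerate(alpha_drop, start=1):
    print(f"alpha if item {i} deleted = {a:.3f}")
```

In this toy example, removing the inconsistent fourth item gives the largest alpha, which mirrors the way the item with the highest alpha-if-deleted value is the candidate for removal.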