Task A: Investigating Job Vacancy and Unemployment Rate Data
A1. Investigating the Population Data
Have a look at the resident population data. You will see many columns. We are
interested only in the total values for each state (marked "Persons"), so you can drop
the other columns and rename the columns for each state if you wish.
(HINT: The file isn't very big so you can make the changes in Excel if you want.)
1. In Python (or R) plot the population of Victoria, New South Wales and Queensland over time.
(HINT: You don't need to put the dates on the x-axis, just showing the index of each quarter is
fine)
a) Are the population values increasing or decreasing over time?
b) Does the population data exhibit a trend and if so, what type?
Answer: The relation below is obtained by tracing the population counts of the three states,
namely Victoria, New South Wales and Queensland, over time.
The plotted graphs show that the population of all three states increases gradually over time.
Queensland has the smallest population of the three, while New South Wales has the largest.
The trend is linear and increasing, with a positive slope over time.
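The direction of the trend can also be checked numerically rather than only by eye. The sketch below fits a straight line to each series and inspects the sign of the slope. The population figures are hypothetical placeholders, not values from the data file, and `trend_slope` is an illustrative helper name.

```python
import numpy as np

# Hypothetical quarterly population values (persons); the real figures come from the data file.
population = {
    "Victoria": [5_500_000, 5_530_000, 5_560_000, 5_590_000],
    "New South Wales": [7_200_000, 7_225_000, 7_250_000, 7_275_000],
    "Queensland": [4_400_000, 4_420_000, 4_440_000, 4_460_000],
}


def trend_slope(values):
    """Return the least-squares slope of the values against the quarter index."""
    quarters = np.arange(1, len(values) + 1)
    slope, _intercept = np.polyfit(quarters, values, deg=1)
    return slope


slopes = {state: trend_slope(values) for state, values in population.items()}
for state, slope in slopes.items():
    direction = "increasing" if slope > 0 else "decreasing or flat"
    print(f"{state}: about {slope:,.0f} persons per quarter ({direction})")
```

A positive slope for every state corresponds to the increasing trend described in the answer.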
2. Fit a linear regression using Python (or R) to the Victorian population data and plot the linear fit.
(HINT: In Python, you can use the "range(1, n + 1)" function to generate a sequence of integer
values: 1, 2, ..., n)
a) Does the linear fit look good?
b) Use the linear fit to predict the resident population in Victoria for the dates: 1/9/15,
1/12/15, 1/12/16, and 1/12/17.
Answer: The Victorian population values are first drawn as a scatter plot, and a linear regression is
then applied to the data to obtain the best-fit line. The linear fit looks good. The graph is as follows:
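A fit of this kind could be sketched as follows, assuming NumPy's `polyfit` is used for the regression (the original tool is not stated). The population values are hypothetical, and `predict_population` is an illustrative helper name. The coefficient of determination is included as one possible way to judge whether the fit "looks good".

```python
import numpy as np

# Hypothetical quarterly Victorian population values, for illustration only.
victoria = np.array(
    [5_500_000, 5_524_000, 5_549_000, 5_572_000, 5_597_000, 5_620_000], dtype=float
)
quarters = np.arange(1, len(victoria) + 1)  # quarter index 1, 2, ..., n

# Least-squares straight line: population = slope * quarter + intercept.
slope, intercept = np.polyfit(quarters, victoria, deg=1)


def predict_population(quarter_index):
    """Evaluate the fitted line at a (possibly future) quarter index."""
    return slope * quarter_index + intercept


# Coefficient of determination as a rough measure of fit quality.
fitted = predict_population(quarters)
ss_res = np.sum((victoria - fitted) ** 2)
ss_tot = np.sum((victoria - victoria.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Forecast for the quarter immediately after the last observation.
forecast = predict_population(len(victoria) + 1)
print(f"slope={slope:,.0f} per quarter, R^2={r_squared:.4f}, next quarter={forecast:,.0f}")
```

To predict for a specific date, the date first has to be converted to its quarter index, counted from the first quarter in the data.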
The predicted populations for the given dates are as below:
Date       Population
1/9/15     5739516.54838
1/12/15    5979953.5504
1/12/16    6076128.35121
1/12/17    6172303.15202
A2. Investigating the Job Vacancies Data
Now have a look at the job vacancies data.
1. Use Python (or R) to plot the job vacancy counts for Victoria over time. (HINT: Pandas contains
a "transpose()" method and Excel can also be used to transpose data.)
a) What are the maximum and minimum values for job vacancies in Victoria over the time
period?
Answer: The vacancy count of Victoria is plotted over time. The graph is as follows:
The maximum and minimum job vacancy values are 71971 and 32322 respectively.
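The transpose step mentioned in the hint, followed by the maximum and minimum lookup, might look like the sketch below. The table layout (one row per state, one column per quarter), the quarter labels and the numbers are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical wide-format table: one row per state, one column per quarter.
wide = pd.DataFrame(
    {"2014-Q3": [40.1, 60.2], "2014-Q4": [45.3, 58.7], "2015-Q1": [38.9, 61.5]},
    index=["Victoria", "New South Wales"],
)

# Transpose so that each row is a quarter and each column is a state.
by_quarter = wide.transpose()
victoria = by_quarter["Victoria"]

# Extreme values and the quarters in which they occur.
summary = {
    "max": victoria.max(),
    "max_quarter": victoria.idxmax(),
    "min": victoria.min(),
    "min_quarter": victoria.idxmin(),
}
print(summary)
```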
2. Fit a linear regression to the data and plot it.
a) Does it look like a good fit to you? Would you believe the predictions of the linear model
going forward?
b) Instead of fitting the linear regression to all of the data, try fitting it to just the most recent
data points (say from the 85th data point onwards). How is the fit? Which model would
give better predictions of future vacancies do you think?
Answer: First, the linear regression is fitted to all of the Victorian job vacancy data. Then
the linear regression is fitted to the data from the 85th point onwards. The graphs below are obtained.
The line fitted to all of the data is not a good fit. The data follows a polynomial-like curve rather
than a straight line, so a linear fit cannot give correct estimates of the data. Hence the linear
model based on all the data is not plausible for prediction.
Restricting the data to the 85th row onwards gives an approximately linear arrangement, for which
a linear fit is appropriate. The plotted graph shows that the line lies very close to all of these
points. Hence, to predict a value WITHIN the interval from the 85th row to the 130th row, the
second model suits best.
However, for predicting FUTURE data neither model is reliable: the history shows linear trends
(with both positive and negative slopes) only over certain intervals. The interval from the 131st
row onwards might show a linear trend with a downward slope, in which case the second model would
also fail to predict the data correctly. Here, regression using a polynomial model has the upper
hand over the linear model.
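The comparison between the two fits can be made concrete with a goodness-of-fit number. The sketch below uses a synthetic 130-point series (curved at first, then linear from the 85th point) as a stand-in for the vacancy data, so the shape and values are assumptions. `linear_fit_r2` is an illustrative helper name.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Fit y = slope * x + intercept and return (slope, intercept, R^2)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic series of 130 points: curved for the first 84, linear from the 85th onwards.
x = np.arange(1, 131, dtype=float)
y = np.where(x < 85, 50_000 + 15_000 * np.sin(x / 13.0), 30_000 + 400 * (x - 85))

# Model 1: all data points. Model 2: the 85th data point onwards (zero-based index 84).
_, _, r2_all = linear_fit_r2(x, y)
_, _, r2_recent = linear_fit_r2(x[84:], y[84:])
print(f"R^2 on all data: {r2_all:.3f}; R^2 from the 85th point: {r2_recent:.3f}")
```

A high R^2 on the recent window only says the line describes that window well. It does not guarantee that the same slope continues beyond the last observation, which is the caution raised in the answer.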
A3. Investigating the Unemployment Data
Now have a look at the unemployment data.
1. Use Python (or R) to plot the Unemployment Rate for Victoria over time.
a) It looks like the rate has been very high at times in the past. What was the maximum
unemployment rate in Victoria recorded in the dataset and when did that occur?
Answer: The maximum unemployment rate was 12.5533377, recorded in August 1993.
A4. Visualising the Relationship between Unemployment and Job Vacancies
Now let's look at the relationship between unemployment levels and job vacancies.
1. Use Python (or R) to combine the data from the different files into a single table. The table
should contain population values, job vacancy counts and unemployment rates for the different
dates and different States/Territories.
a) What is the first date and last date for the combined data?
Answer: The first date and the last date for the combined data are as below:
Argument    Value
Min Date    2015/03/01
Max Date    2015/06/01
2. Now that you have the data aggregated, we can see whether there is a relationship between
unemployment and the number of job vacancies. Plot the values against each other.
a) Can you see a relationship there?
Answer: The merged data is now used to plot the unemployment rate against the vacancies of all the
states. A scatter plot is used instead of a line plot because it is more legible in this case.
The graph is as below:
The above picture shows that the vacancies are quite high when the unemployment rate is between
4 and 6. However, the graph fails to produce any meaningful insight. This may be because the
plotted data contains the vacancy counts and unemployment rates of all the states for all quarters
in an unstructured way, so no correlation among them is visible.
A more meaningful relation between unemployment rate and vacancies can be deduced by grouping the
cumulative values (for all states) by quarter. Plotting the grouped data produces the following graph:
This graph clearly shows that vacancies and unemployment have an inverse relation: as vacancies
gradually increase, unemployment decreases. This is in accordance with the real-life scenario.
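The combining and grouping steps described above could be sketched as below. The column names and values are hypothetical. Summing vacancies across states and averaging the unemployment rate is one possible reading of "cumulative values", not necessarily the aggregation used for the original graph.

```python
import pandas as pd

# Hypothetical long-format tables keyed by date and state.
vacancies = pd.DataFrame({
    "date": ["2015-03-01", "2015-03-01", "2015-06-01", "2015-06-01"],
    "state": ["Victoria", "Queensland", "Victoria", "Queensland"],
    "vacancies": [40_000, 25_000, 43_000, 27_000],
})
unemployment = pd.DataFrame({
    "date": ["2015-03-01", "2015-03-01", "2015-06-01", "2015-06-01"],
    "state": ["Victoria", "Queensland", "Victoria", "Queensland"],
    "unemployment_rate": [6.3, 6.6, 6.0, 6.2],
})

# Keep only the (date, state) pairs present in both tables.
combined = vacancies.merge(unemployment, on=["date", "state"], how="inner")
first_date, last_date = combined["date"].min(), combined["date"].max()

# One row per quarter: total vacancies and mean unemployment rate across states.
per_quarter = combined.groupby("date").agg(
    total_vacancies=("vacancies", "sum"),
    mean_unemployment=("unemployment_rate", "mean"),
)
print(first_date, last_date)
print(per_quarter)
```

An inner merge restricts the combined table to dates covered by every input file, which is why the combined date range can be much shorter than that of any single file.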
3. Try selecting and plotting only the data from Victoria.
a) Can you see a relationship now? If so, what relationship is there?
Answer: Unlike the previous graph, which covered all states, here the unemployment and vacancy
data are plotted for the state of Victoria only. The graph below is obtained.
The graph agrees with the earlier finding from the grouped data. The vacancies for the state of
Victoria gradually decrease as the unemployment rate grows. Notably, the vacancies for Victoria
are quite high and seemingly unaffected until the unemployment rate reaches a value of 5.
4. The different populations across the states will influence the number of job vacancies in each.
Remove this effect by introducing a new column called 'Vacancy Rate' which contains the
vacancy count divided by the population size, multiplied by 100.
a) Is there a relationship between the unemployment rate and the job vacancy rate across all
the data?
Answer: The column is added to the source data. The vacancy rate and the unemployment rate are
then plotted for both types of data (grouped and ungrouped).
Both of the above approaches suggest that the vacancy rate is inversely related to the
unemployment rate. Using the vacancy rate removes the effect of population size and makes the
decreasing trend more clearly linear.
Notably, in all the above cases the vacancies are not affected by the unemployment rate until it
reaches a threshold of around 4.5.
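The 'Vacancy Rate' column follows directly from its definition: vacancy count divided by population size, multiplied by 100. The sketch below computes it and a Pearson correlation with the unemployment rate. All column names except 'Vacancy Rate' and all values are hypothetical.

```python
import pandas as pd

# Hypothetical combined table; only the 'Vacancy Rate' name comes from the task text.
table = pd.DataFrame({
    "state": ["Victoria", "Tasmania", "Queensland", "South Australia"],
    "vacancies": [40_000, 2_000, 30_000, 9_000],
    "population": [5_900_000, 515_000, 4_750_000, 1_700_000],
    "unemployment_rate": [6.0, 6.8, 6.3, 6.1],
})

# Vacancies per 100 residents, which removes the effect of population size.
table["Vacancy Rate"] = table["vacancies"] / table["population"] * 100

# Pearson correlation; a negative value indicates an inverse relationship.
correlation = table["Vacancy Rate"].corr(table["unemployment_rate"])
print(table[["state", "Vacancy Rate"]])
print(f"correlation = {correlation:.2f}")
```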
A5. Visualising the Relationship over Time
Now let's look at the relationship between unemployment levels and job vacancies
over time.
1. Use Python (or R) to build a Motion Chart comparing the job vacancy rate, the unemployment
rate, and the population of each state over time. The motion chart should show the job vacancy
rate on the x-axis, the unemployment rate on the y-axis, and the bubble size should depend on
the population. (HINT: A Jupyter notebook containing a tutorial on building motion charts in
Python is available here.)
Answer: The motion chart is in the video below:
2. Run the visualisation from start to finish. (Hint: In Python, to speed up the animation, set timer
bar next to the play/pause button to the minimum value.) And then answer the following
questions:
a) Which state generally has the lowest job vacancy rate?
b) Is the economy generally getting better or worse? I.e. was the Australian economy better in
2006/7 or 2014/5? Explain your answer.
c) Compared to the states, does the Northern Territory generally have higher or lower
unemployment and higher or lower job vacancy rates? What might cause this? Would it
make sense economically to move to NT?
d) According to the graph what happened at the end of 2008 and start of 2009? What might
have caused this?
e) Any other interesting things you notice in the data?
Answer:
a) Tasmania generally has the lowest job vacancy rate.
b) A high unemployment rate does not necessarily mean a bad economy. Similarly, a low
unemployment rate does not necessarily signify a strong economy. The Australian economy is a
benign economy rather than a volatile one. According to the motion chart data, Australia
began with an average unemployment rate below 5% in 2006. The average unemployment rate then
fell further, to around 4%, by the end of 2008. From 2009 onwards the unemployment rate rose
gradually to between 5.5% and 7.0%, and this trend continued until 2015. As per OECD, an
unemployment rate between 5.5% and 8.3% is good for an economy to thrive and sustain. Hence
the data supports the view that the Australian economy in 2015 is doing better than earlier
and is getting stronger.
Reference: http://www.adamhoward.com.au/blog/2015/3/31/unemployment-when-is-it-good-and-when-is-it-bad
https://www.focus-economics.com/country-indicator/australia/gdp
c) The Northern Territory has a lower unemployment rate and a higher vacancy rate than the
states. This might be due to the size of its population. As it has one of the smallest
populations, most individuals are employed within the available opportunities, leading to a
lower unemployment rate. However, the demand for labour may not be met by its population,
thus creating more vacancies than elsewhere.
As the population of the Northern Territory has not increased much, the unemployment rate has
remained more or less the same while vacancies have reduced over the time period. This implies
that people from other states have already migrated to the Northern Territory, filling up the
vacancies. Compared with other states, the Northern Territory does not have a higher
unemployment rate, but its vacancies have reduced. Hence it would not be very economical to
move there.
d) At the end of 2008 and the start of 2009 there was a spike in the unemployment rate. This might
be because the world economy was hit by a major financial crisis during this period. The spike
in the unemployment rate and the reduced vacancy rate are indicative of the Great Recession.
e) New South Wales, Victoria and Queensland form the major part of the Australian economy.
Task B: Exploratory Analysis on Big Data
B1. Summarising the Data
Load the InsuranceRates.csv data in Python (or R) and answer the following questions:
1. How many rows and columns are there?
2. How many years does the data cover? (Hint: pandas provides functionality to see 'unique'
values.)
3. What are the possible values for 'Age'?
4. How many states are there?
5. How many insurance providers are there?
6. What are the average, maximum and minimum values for the monthly insurance premium cost
for an individual? Do those values seem reasonable to you?
7. How much more on average do plans for smokers cost?
Answer:
1) There are 12694445 rows and 7 columns
2) The data covers 3 years: 2014, 2015 and 2016
3) The possible values of ages are: '0-20', 'Family Option', '21', '22', '23', '24', '25', '26', '27',
'28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46',
'47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65 and
over'
4) There are 39 states
5) There are 910 insurance providers
6) The aggregate values are:
Key     Insurance Cost
Mean    4098.026458581588
Max     999999
Min     0.0
The Max and Min values are not plausible, as they are too extreme at both ends. They are
probably junk records.
7) Plans for smokers cost 88.90566067009055 more on average
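Summary figures of this kind can be obtained with a handful of pandas calls, sketched below on a tiny hypothetical frame. Only 'Age' and 'IndividualRate' are named in the task text. The other column names ('BusinessYear', 'StateCode', 'IssuerId', 'IndividualTobaccoRate') are assumptions about the file layout, and the values are made up.

```python
import pandas as pd

# Tiny hypothetical stand-in for InsuranceRates.csv.
rates = pd.DataFrame({
    "BusinessYear": [2014, 2014, 2015, 2016],
    "StateCode": ["AK", "TX", "TX", "MO"],
    "IssuerId": [101, 202, 202, 303],
    "Age": ["21", "40", "Family Option", "65 and over"],
    "IndividualRate": [250.0, 310.0, 0.0, 999999.0],
    "IndividualTobaccoRate": [300.0, 400.0, None, None],
})

n_rows, n_cols = rates.shape
summary = {
    "years": sorted(rates["BusinessYear"].unique().tolist()),
    "ages": rates["Age"].unique().tolist(),
    "states": rates["StateCode"].nunique(),
    "issuers": rates["IssuerId"].nunique(),
    "mean": rates["IndividualRate"].mean(),
    "max": rates["IndividualRate"].max(),
    "min": rates["IndividualRate"].min(),
}

# Average extra cost for smokers; rows without a tobacco rate are skipped automatically.
smoker_extra = (rates["IndividualTobaccoRate"] - rates["IndividualRate"]).mean()
print(n_rows, n_cols, summary, smoker_extra)
```

Even in this toy frame, the placeholder values 0 and 999999 drag the mean far away from typical premiums, which is the same effect seen in the aggregate values above.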
B2. Investigating Individual Insurance Costs
Now let's look more in detail at the individual insurance costs.
1. Show the distribution of ‘IndividualRate’ values using a histogram.
a) Does the distribution make sense to you? What might be going on?
Answer: The distribution of Individual Rate is shown below using a histogram:
The above histogram does not make much sense because it covers all of the insurance rates,
including extreme values. The majority of the insurance rates fall in the first bar, while a
barely visible outlier appears at the far end. The outlier cannot be a plausible value, as such
insurance rates are too high to be true. To get a proper insight we must delve into the data of
the first bar.
2. Remove rows with insurance premiums of 0 (or less) and over 2000. (Use this data from now
on). Generate a new histogram with a larger number of bins (say 200).
a) Does this data look more sensible?
b) Describe the data. How many groups can you see?
Answer: The distribution of Individual Rate is shown below using a histogram:
The histogram makes more sense now, as we can clearly see the distribution of the insurance
rates once the extreme values are excluded.
There are three groups of data in the histogram, which can be categorised as Low, Medium and
High insurance rates. A significantly large number of users pay Low insurance rates but have
fewer options to choose from. The Medium insurance rates offer the widest variety of rates to
choose from. There is a small spike in the High insurance rates, indicating that a very small
section of people pay higher rates.
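The filtering and binning step could look like the sketch below, which keeps premiums greater than 0 and no more than 2000 and then counts them in 200 equal-width bins. The sample values are hypothetical, and `np.histogram` is used here only to compute the bin counts that a plotting call would draw.

```python
import numpy as np
import pandas as pd

# Hypothetical premiums, including junk values at both extremes.
rates = pd.DataFrame(
    {"IndividualRate": [0.0, 120.5, 310.0, 450.25, 1999.0, 2000.0, 9999.0, 999999.0]}
)

# Remove rows with premiums of 0 (or less) and over 2000.
valid = rates[(rates["IndividualRate"] > 0) & (rates["IndividualRate"] <= 2000)]

# Counts per bin and the 201 edges that delimit the 200 bins.
counts, bin_edges = np.histogram(valid["IndividualRate"], bins=200)
print(len(valid), counts.sum(), bin_edges[0], bin_edges[-1])
```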
B3. Variation in Costs across States
How do insurance costs vary across states?
1. Generate a graph containing boxplots summarising the distribution of values for each state.
a) Which state has the lowest median insurance rates and which one has the highest? (Hint:
you may need to rotate the state labels to be able to read the plot.)
b) Is there much variation in costs across states?
Answer: The insurance rates for the various states are shown in the graph below via box plots.
The state of ‘MO’ has the lowest median insurance rate, while ‘AK’ has the highest. There is
not much variation in the median insurance rates across states: most states have a similar
median insurance rate, approximately between 250 and 350. However, the outliers show a wide
variation in the highest insurance rate across states. For example, the highest insurance rate
in ‘HI’ is around 1000, while that in ‘NC’ is around 1800.
2. Does the number of insurance issuers vary greatly across states?
a) Create a bar chart of the number of insurance companies in each state to see. (Hint: you will
need to aggregate the data by state to do this.)
Answer: The number of insurance companies in each state is plotted in the graph below:
The bar graph clearly shows that ‘TX’ has the highest number of issuers and ‘HI’ has the
lowest. When the states are arranged in descending order of issuer count, the number of issuers
does not vary greatly from one state to the next.
3. Could competition explain the difference in insurance premiums across states?
a) Use a scatterplot to plot the number of insurance issuers against the median insurance cost
for each state.
b) Do you observe a relationship?
Answer: A scatter plot of the median insurance rate against the issuer count is drawn. The relation
is as below:
In every state there is strong competition amongst insurance issuers where the insurance rate is
approximately between 250 and 350. Most insurance issuers offer insurance within this range, with
only minute differences from state to state, attracting various customers as per their need.
Insurance rates above 350 and below 250 see minimal competition among insurance issuers across
the states.
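The per-state aggregation behind such a scatter plot might be sketched as follows: count the distinct issuers and take the median rate for each state, then look at how the two move together. The 'StateCode' and 'IssuerId' column names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: each row is one plan offered by one issuer in one state.
rates = pd.DataFrame({
    "StateCode": ["AK", "AK", "AK", "MO", "MO", "MO", "TX", "TX", "TX", "TX"],
    "IssuerId": [1, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "IndividualRate": [600.0, 650.0, 700.0, 200.0, 240.0, 260.0,
                       280.0, 300.0, 320.0, 330.0],
})

# One row per state: number of distinct issuers and the median premium.
per_state = rates.groupby("StateCode").agg(
    issuer_count=("IssuerId", "nunique"),
    median_rate=("IndividualRate", "median"),
)

# A negative correlation would be consistent with competition lowering premiums.
correlation = per_state["issuer_count"].corr(per_state["median_rate"])
print(per_state)
print(f"correlation = {correlation:.2f}")
```

With only a few states the correlation is fragile, so it is best read alongside the scatter plot rather than in place of it.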
B4. Variation in Costs over Time and with Age
Generate boxplots (or other plots) of insurance costs versus year and age to answer
the following questions:
1. Are insurance policies becoming cheaper or more expensive over time?
a) Is the median insurance cost increasing or decreasing?
Answer: The insurance cost is plotted against the year, yielding the box plot below:
The box plot shows that the median insurance cost is more or less the same over the years. It
can also be seen that the number of high insurance rate policies increases gradually over the
years. On closer analysis, however, the median is found to be increasing gradually as well, by a
small margin. The values are as follows:
Year    Median Rate
2014    299.31
2015    307.51
2016    317.37
Hence it can be concluded from the above data that insurance policies are becoming more
expensive over time.
2. How do insurance costs vary with the age of the person being insured? (Hint: filter out the
value 'Family Option' before plotting the data.)
a) Do older people pay more or less for insurance than younger people? How much more/less
do they pay?
Answer: The insurance cost is drawn as a box plot for each age, and the graph below is obtained:
From the graph, it is clearly evident that older people pay a higher insurance rate than
younger people. The younger people [age: 0-20] pay an average insurance rate of 122.333209, while
the older people [age: 65 and over] pay an average insurance rate of 584.594017. Thus, on
average, the older people pay 462.26 more than the younger people.
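The age comparison can be reproduced in outline as below: drop the 'Family Option' rows, average the rate per age band, and subtract. The sample rates are hypothetical. The 'Age' labels are taken from the list of possible values given earlier.

```python
import pandas as pd

# Hypothetical rows; the age labels follow the values listed for the 'Age' column.
rates = pd.DataFrame({
    "Age": ["0-20", "0-20", "Family Option", "65 and over", "65 and over", "40"],
    "IndividualRate": [120.0, 130.0, 50.0, 580.0, 600.0, 300.0],
})

# 'Family Option' is not an individual age band, so it is filtered out first.
individuals = rates[rates["Age"] != "Family Option"]
mean_by_age = individuals.groupby("Age")["IndividualRate"].mean()

# How much more the oldest band pays than the youngest, on average.
difference = mean_by_age["65 and over"] - mean_by_age["0-20"]
print(mean_by_age)
print(f"difference = {difference:.2f}")
```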
Task C: Exploratory Analysis on Other Data
Find some publicly available data and repeat some of the analysis performed in Tasks
A and B above. Good sources of data are government websites, such as: data.gov.au,
data.gov, data.gov.in, data.gov.uk, ...
Data source: “All STATS19 data (accident, casualties and vehicle tables) for 2005 to
2014 in England” [Download the data here]
C. Summary and Analysis:
The number of accidents is plotted against each day of the week.
It can be seen that the highest number of accidents occurs at the start of the weekend, i.e. on
Friday, while the lowest occurs on Sunday. This might be because a large section of people
return home after Friday night recreation or parties, leading to a higher number of accidents,
while on Sunday most people prefer to stay at home, reducing the number of accidents.
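A per-day count of this kind might be computed as sketched below. The 'Day_of_Week' column name and its coding (1 = Sunday through 7 = Saturday) are assumptions about the accident table and should be checked against the dataset's own documentation. The rows are made up.

```python
import pandas as pd

# Hypothetical accident records; the column name and 1-7 coding are assumptions.
accidents = pd.DataFrame({"Day_of_Week": [6, 6, 6, 1, 2, 2, 7]})

# Assumed mapping from day codes to names.
day_names = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday",
}

# Number of accidents per named day, most frequent first.
counts = accidents["Day_of_Week"].map(day_names).value_counts()
busiest_day = counts.idxmax()
print(counts)
print(f"busiest day: {busiest_day}")
```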
The total number of accidents has gradually decreased over the years; however, 2014 saw an
increase in the number of accidents.
The number of fatal injuries has been consistent over the years. However, the count of the least
severe injuries has gradually reduced over the years.
The graph below shows the top 20 UK cities with the maximum number of accidents:
Birmingham, Leeds and Manchester clearly account for the highest numbers of accidents in the UK
and would thus require a higher number of police than other districts.
The following visualisation shows the number of accident calls handled by each police department
in the UK.
The Metropolitan Police, West Midlands and Greater Manchester police departments have handled
the three highest numbers of accident cases over the years. The higher number for the
Metropolitan Police is due to its operations in all the suburbs around London, which account for
a considerable number of accidents every year. However, Birmingham may require more police force
to address its high number of accidents (analysed later).
To find the root cause of the accidents, the Light Conditions are analysed for the top 20
accident-prone districts.
Accident due to NO LIGHTING:
This box plot clearly shows a high number of accidents in the districts of Doncaster,
Edinburgh, Leeds and Sheffield due to NO LIGHTING. This insight can be used to install more
street lights in those districts to reduce similar accidents.
Accident due to LIGHTS UNLIT:
The above graph shows that the districts of Edinburgh, Bristol, Glasgow and Birmingham had more
accidents than others due to unlit lights. The most impacted district is Edinburgh. These four
districts require repairs to their road lighting to prevent similar accidents.
Overall, the city of Edinburgh is the most impacted by darkness leading to accidents. The analysis
shows that Edinburgh needs more attention to street lighting from its district administrators
than the other districts.
The above histogram shows the distribution of driver age over the number of accidents. The spread
shows that drivers close to the ages of 30 and 47 have the highest numbers of accidents.
Teenagers are the third-largest group of drivers causing accidents.

bmucuha
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 

Recently uploaded (20)

一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 

Exploring Australian economy and diversity

  • 1. Task A: Investigating Job Vacancy and Unemployment Rate Data A1. Investigating the Population Data Have a look at the resident population data. You will see many columns. We are interested only in the total values for each state (marked "Persons"), so you can drop the other columns and rename the columns for each state if you wish. (HINT: The file isn't very big so you can make the changes in Excel if you want.) 1. In Python (or R) plot the population of Victoria, New South Wales and Queensland over time. (HINT: You don't need to put the dates on the x-axis, just showing the index of each quarter is fine) a) Are the population values increasing or decreasing over time? b) Does the population data exhibit a trend and if so, what type? Answer: The plot below traces the population of the three states (Victoria, New South Wales and Queensland) over time. The graphs show that the population of all three states increases gradually over time. Queensland has the smallest population of the three states, while New South Wales has the largest. The trend is linear and increasing, with a positive slope over time.
  • 2. 2. Fit a linear regression using Python (or R) to the Victorian population data and plot the linear fit. (HINT: In Python, you can use the "range (1, n)" function to generate a sequence of integer values: 1, 2..., n) a) Does the linear fit look good? b) Use the linear fit to predict the resident population in Victoria for the dates: 1/9/15, 1/12/15, 1/12/16, and 1/12/17. Answer: The Victorian population values are first drawn as a scatter plot, and a linear regression is then fitted to the data to obtain the best-fit line. The linear fit looks good. The graph is as follows: The predicted populations for the given dates are as below:
      Date      Population
      1/9/15    5739516.54838
      1/12/15   5979953.5504
      1/12/16   6076128.35121
      1/12/17   6172303.15202
    A2. Investigating the Job Vacancies Data Now have a look at the job vacancies data. 1. Use Python (or R) to plot the job vacancy counts for Victoria over time. (HINT: Pandas contains a "transpose ()" method and Excel can also be used to transpose data.) a) What are maximum and minimum values for job vacancies in Victoria over time period?
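For the linear fit to the Victorian population data in question 2 of A1, a minimal sketch of the fit-and-predict step might look like the following. The series here is synthetic (a hypothetical straight line plus noise, not the real resident population data), and `predict` is an illustrative helper name; the fit uses `numpy.polyfit` with degree 1.

```python
import numpy as np

# Hypothetical quarterly population series (NOT the real data):
# a straight line plus small noise, indexed 1..n as the hint suggests.
rng = np.random.default_rng(0)
n = 40
quarter = np.arange(1, n + 1)
population = 5_000_000 + 24_000 * quarter + rng.normal(0, 5_000, size=n)

# Least-squares straight line: population ~ slope * quarter + intercept
slope, intercept = np.polyfit(quarter, population, deg=1)


def predict(q):
    """Predicted population at quarter index q (q may lie beyond the data)."""
    return slope * np.asarray(q, dtype=float) + intercept


# Predict one, two and six quarters past the end of the series
future = np.array([n + 1, n + 2, n + 6])
predictions = predict(future)
print(predictions)
```

With a linear model, predictions for consecutive quarters differ by exactly the slope, which gives a quick sanity check on any table of predicted values.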
  • 3. Answer: The vacancy count of Victoria is plotted over time. The graph is as follows: The maximum and minimum values of job vacancies are 71971 and 32322 respectively. 2. Fit a linear regression to the data and plot it. a) Does it look like a good fit to you? Would you believe the predictions of the linear model going forward? b) Instead of fitting the linear regression to all of the data, try fitting it to just the most recent data points (say from the 85th data point onwards). How is the fit? Which model would give better predictions of future vacancies do you think? Answer: First, the linear regression is fitted to all of the Victorian job vacancy data. Then the linear regression is fitted to the data from the 85th data point onwards. The graphs below are obtained.
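The two fits in question 2 (all data versus the 85th point onwards) can also be compared numerically rather than only by eye. The sketch below uses a synthetic, hypothetical series that falls and then rises linearly from point 85, and reports the coefficient of determination R² for each fit; `linear_fit_r2` is an illustrative helper, not part of the original analysis.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Fit y ~ slope * x + intercept and return (slope, intercept, R^2)."""
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical vacancy series with 130 points (NOT the real data):
# it falls until point 85 and rises linearly afterwards.
x = np.arange(1, 131, dtype=float)
y = 40_000 + 400 * np.abs(x - 85)

_, _, r2_all = linear_fit_r2(x, y)
# Points 85..130 sit at array positions 84..129
slope_recent, _, r2_recent = linear_fit_r2(x[84:], y[84:])

print(f"R^2 on all data:        {r2_all:.3f}")
print(f"R^2 from 85th onwards:  {r2_recent:.3f}")
```

A high R² on the recent window only says the line describes that window well; it does not by itself justify extrapolating beyond the last observation.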
  • 4. The line fitted to all of the data is not a good fit. The data follow a polynomial-like curve rather than a straight line, so a linear fit cannot give correct estimates. Hence, the linear model based on all the data is not plausible for any prediction. Restricting the data to the 85th row onwards leaves an approximately linear arrangement, for which a linear fit is appropriate. The plotted graph shows that the line lies very close to all of these data points. Hence, to predict a value WITHIN the interval from the 85th row to the 130th row, the second model suits best. However, for predicting FUTURE data neither model is reliable: the history shows that the data follow a linear trend (with both positive and negative slopes) only over certain intervals. The interval from the 131st row onwards might, for example, show a linear trend with a downward slope, in which case the second model would also fail to predict the data correctly. Here, regression using a polynomial model has the upper hand over the linear model. A3. Investigating the Unemployment Data Now have a look at the unemployment data. 1. Use Python (or R) to plot the Unemployment Rate for Victoria over time. a) It looks like the rate has been very high at times in the past. What was the maximum unemployment rate in Victoria recorded in the dataset and when did that occur? Answer: Next Page (Contd.)
  • 5. The maximum unemployment rate was 12.5533377, recorded in August 1993. A4. Visualising the Relationship between Unemployment and Job Vacancies Now let's look at the relationship between unemployment levels and job vacancies. 1. Use Python (or R) to combine the data from the different files into a single table. The table should contain population values, job vacancy counts and unemployment rates for the different dates and different States/Territories. a) What is the first date and last date for the combined data? Answer: The first date and the last date for the combined data are as below:
      Argument   Value
      Min Date   2015/03/01
      Max Date   2015/06/01
    2. Now that you have the data aggregated, we can see whether there is a relationship between unemployment and the number of job vacancies. Plot the values against each other. a) Can you see a relationship there? Answer: The merged data is now used to plot the unemployment rate against the vacancies for all the states. A scatter plot has been used instead of a line plot because it is more legible in this case. The graph is as below:
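For the combining step in A4 question 1, one possible approach is to put each file into long format (one row per date and state) and join the tables with `pandas.DataFrame.merge`. The sketch below uses tiny hypothetical tables, and the column names `Date`, `State`, `Population`, `Vacancies` and `UnemploymentRate` are assumptions rather than the names in the original files. An inner join keeps only the dates present in every table, so the first and last dates of the combined data are those of the overlap.

```python
import pandas as pd

# Hypothetical long-format tables (NOT the real data); column names are assumed.
population = pd.DataFrame({
    "Date": pd.to_datetime(["2014-12-01", "2015-03-01", "2015-06-01"]),
    "State": ["VIC", "VIC", "VIC"],
    "Population": [5_880_000, 5_900_000, 5_920_000],
})
vacancies = pd.DataFrame({
    "Date": pd.to_datetime(["2015-03-01", "2015-06-01", "2015-09-01"]),
    "State": ["VIC", "VIC", "VIC"],
    "Vacancies": [38_000, 39_500, 41_000],
})
unemployment = pd.DataFrame({
    "Date": pd.to_datetime(["2014-12-01", "2015-03-01", "2015-06-01", "2015-09-01"]),
    "State": ["VIC", "VIC", "VIC", "VIC"],
    "UnemploymentRate": [6.5, 6.3, 6.1, 6.0],
})

# Inner joins (the default) on the shared keys keep only matching rows
combined = (
    population
    .merge(vacancies, on=["Date", "State"])
    .merge(unemployment, on=["Date", "State"])
)

first_date = combined["Date"].min()
last_date = combined["Date"].max()
print(first_date.date(), last_date.date())
```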
  • 6. The above picture shows that vacancies are quite high when the unemployment rate is between 4 and 6. However, the graph fails to produce any meaningful insight. This may be because the plotted data contain the vacancies and unemployment rates of all the states for all quarters in an unstructured way, without any correlation among them. A way to deduce a more meaningful relation between unemployment rate and vacancies would be to group the cumulative values (for all states) by quarter. Plotting the grouped data produces the following graph: This graph clearly shows that vacancies and unemployment have an inverse relation. As vacancies gradually increase, unemployment decreases. This is in accordance with the real-life scenario.
  • 7. 3. Try selecting and plotting only the data from Victoria. a) Can you see a relationship now? If so, what relationship is there? Answer: Unlike the previous graph, which covered all states, in this case the unemployment and vacancy data are plotted for the state of Victoria only. The graph below is obtained. The graph agrees with the previous finding from the grouped data. Here the vacancies for the state of Victoria gradually decrease as the unemployment rate grows. Notably, the vacancies for Victoria are quite high and seemingly unaffected until the unemployment rate reaches a value of 5. 4. The different populations across the states will influence the number of job vacancies in each. Remove this effect by introducing a new column called 'Vacancy Rate' which contains the vacancy count divided by the population size, multiplied by 100. a) Is there a relationship between the unemployment rate and the job vacancy rate across all the data? Answer: The column is added to the source data. The vacancy rate and the unemployment rate are then plotted for both types of data (grouped and ungrouped). Next Page (Contd.)
  • 8. Both of the above approaches suggest that the vacancy rate is inversely related to the unemployment rate. Using the vacancy rate removes the effect of population size and gives the trend a more clearly linear, decreasing form. Notably, in all of the above cases the vacancies are not affected by the unemployment rate until it reaches a threshold of around 4.5.
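The 'Vacancy Rate' column defined in question 4 (vacancy count divided by population size, multiplied by 100) is a one-line column operation in pandas. The sketch below uses a small hypothetical table with assumed column names, and also computes the Pearson correlation with the unemployment rate as a numeric companion to the scatter plot; a negative value is consistent with an inverse relationship.

```python
import pandas as pd

# Hypothetical merged table (NOT the real data); column names are assumed.
df = pd.DataFrame({
    "State": ["VIC", "VIC", "NSW", "NSW", "TAS", "TAS"],
    "Vacancies": [60_000, 45_000, 80_000, 60_000, 3_000, 4_000],
    "Population": [5_800_000, 5_900_000, 7_500_000, 7_600_000, 510_000, 505_000],
    "UnemploymentRate": [4.5, 6.0, 4.8, 5.9, 6.8, 5.5],
})

# Vacancy rate = vacancy count / population size * 100, as defined in the task
df["VacancyRate"] = df["Vacancies"] / df["Population"] * 100

# Pearson correlation between the two rates across all rows
corr = df["VacancyRate"].corr(df["UnemploymentRate"])
print(df[["State", "VacancyRate", "UnemploymentRate"]])
print(f"correlation: {corr:.2f}")
```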
  • 9. A5. Visualising the Relationship over Time Now let's look at the relationship between unemployment levels and job vacancies over time. 1. Use Python (or R) to build a Motion Chart comparing the job vacancy rate, the unemployment rate, and the population of each state over time. The motion chart should show the job vacancy rate on the x-axis, the unemployment rate on the y-axis, and the bubble size should depend on the population. (HINT: A Jupyter notebook containing a tutorial on building motion charts in Python is available here.) Answer: The motion chart is in the video below: 2. Run the visualisation from start to finish. (Hint: In Python, to speed up the animation, set timer bar next to the play/pause button to the minimum value.) And then answer the following questions: a) Which state generally has the lowest job vacancy rate? b) Is the economy generally getting better or worse? I.e. was the Australian economy better in 2006/7 or 2014/5? Explain your answer. c) Compared to the states, does the Northern Territory generally have higher or lower unemployment and higher or lower job vacancy rates? What might cause this? Would it make sense economically to move to NT? d) According to the graph what happened at the end of 2008 and start of 2009? What might have caused this? e) Any other interesting things you notice in the data?
  • 10. Answer:
    a) Tasmania has the lowest job vacancy rate.
    b) A high unemployment rate does not necessarily mean a bad economy. Similarly, a lower unemployment rate does not signify a strong economy. The Australian economy is a benign economy rather than a volatile one. Looking through the motion chart data, Australia began with an average unemployment rate below 5% in 2006. The average unemployment rate then fell further, to around 4%, until the end of 2008. From 2009 onwards a gradual rise in the unemployment rate is observed, to between 5.5% and 7.0%. This trend continues until 2015. As per the OECD, an unemployment rate between 5.5% and 8.3% is good for an economy to thrive and sustain. Hence the data support the view that the Australian economy in 2015 is doing better than earlier and is getting stronger. Reference: http://www.adamhoward.com.au/blog/2015/3/31/unemployment-when-is-it-good-and-when-is-it-bad https://www.focus-economics.com/country-indicator/australia/gdp
    c) The Northern Territory has a lower unemployment rate and a higher vacancy rate than the states. This might be due to the size of its population. Because its population is one of the smallest, most individuals are employed within the available opportunities, leading to a lower unemployment rate. However, the demand for labour may not be met by its population, creating more vacancies than elsewhere. The population of the Northern Territory has not increased much, and the unemployment rate has remained more or less the same while vacancies have reduced over the time period. This implies that people from other states have already migrated to the Northern Territory, filling the vacancies. Compared with the other states, the Northern Territory does not have a higher unemployment rate, but its vacancies have reduced. Hence it won't be very economical to move there.
    d) At the end of 2008 and the start of 2009 there was a spike in the unemployment rate. This might be because the world economy was hit by a major financial crisis during this period. The spike in the unemployment rate and the reduced vacancy rate are indicative of the Great Recession.
    e) New South Wales, Victoria and Queensland form the major part of the Australian economy.
  • 11. Task B: Exploratory Analysis on Big Data B1. Summarising the Data Load the InsuranceRates.csv data in Python (or R) and answer the following questions: 1. How many rows and columns are there? 2. How many years does the data cover? (Hint: pandas provide functionality to see 'unique' values.) 3. What are the possible values for 'Age'? 4. How many states are there? 5. How many insurance providers are there? 6. What are the average, maximum and minimum values for the monthly insurance premium cost for an individual? Do those values seem reasonable to you? 7. How much more on average do plans for smokers cost? Answer: 1) There are 12694445 rows and 7 columns. 2) The data covers 3 years: 2014, 2015 and 2016. 3) The possible values of 'Age' are: '0-20', 'Family Option', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65 and over'. 4) There are 39 states. 5) There are 910 insurance providers. 6) The aggregate values are:
      Key    Insurance Cost
      Mean   4098.026458581588
      Max    999999
      Min    0.0
    The maximum and minimum values are not plausible, as they are too extreme at both ends; they are probably junk records. 7) Plans for smokers cost 88.90566067009055 more on average.
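The summary questions in B1 map onto a handful of pandas calls: `shape`, `nunique` and `agg`. The sketch below runs them on a tiny hypothetical frame rather than InsuranceRates.csv. Only 'Age' and 'IndividualRate' are named in the text; `BusinessYear`, `StateCode`, `IssuerId` and `IndividualTobaccoRate` are assumed column names, and the smoker comparison assumes the tobacco rate is stored alongside the individual rate on the same row.

```python
import pandas as pd

# Hypothetical sample (NOT the real file); several column names are assumed.
rates = pd.DataFrame({
    "BusinessYear": [2014, 2014, 2015, 2016, 2016],
    "StateCode": ["AK", "MO", "MO", "TX", "TX"],
    "IssuerId": [1, 2, 2, 3, 4],
    "Age": ["21", "0-20", "Family Option", "65 and over", "40"],
    "IndividualRate": [300.0, 120.0, 0.0, 600.0, 999999.0],
    "IndividualTobaccoRate": [350.0, None, None, 720.0, None],
})

n_rows, n_cols = rates.shape                   # question 1
n_years = rates["BusinessYear"].nunique()      # question 2
age_values = rates["Age"].unique()             # question 3
n_states = rates["StateCode"].nunique()        # question 4
n_issuers = rates["IssuerId"].nunique()        # question 5
summary = rates["IndividualRate"].agg(["mean", "max", "min"])  # question 6

# Question 7: average extra cost for smokers, using only rows that
# actually carry a tobacco rate
smokers = rates.dropna(subset=["IndividualTobaccoRate"])
extra = (smokers["IndividualTobaccoRate"] - smokers["IndividualRate"]).mean()

print(n_rows, n_cols, n_years, n_states, n_issuers)
print(summary)
print(extra)
```

In this toy sample the sentinel-like maximum (999999) and the zero minimum distort the mean in the same way the answer above observes for the real data.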
  • 12. B2. Investigating Individual Insurance Costs Now let's look more in detail at the individual insurance costs. 1. Show the distribution of ‘IndividualRate’ values using a histogram. a) Does the distribution make sense to you? What might be going on? Answer: The distribution of the individual rate is shown below using a histogram: The above histogram doesn’t make much sense because it is drawn from all of the insurance rates, including the extreme values. The majority of the insurance rates fall in the first bar, while a barely visible outlier appears at the far end. The outlier cannot be a plausible value, as such an insurance rate is too high to be true. To get a proper insight we must delve into the data of the first bar. 2. Remove rows with insurance premiums of 0 (or less) and over 2000. (Use this data from now on). Generate a new histogram with a larger number of bins (say 200). a) Does this data look more sensible? b) Describe the data. How many groups can you see? Answer: The distribution of the individual rate is shown below using a histogram: Next Page (Contd.)
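The filtering step in question 2 of B2 (drop premiums of 0 or less and premiums over 2000, then bin into 200 bins) can be sketched as below. The values are hypothetical, and `numpy.histogram` is used to compute the bin counts so the example runs without a plotting backend; the same filtered series could be passed to a plotting call instead.

```python
import numpy as np
import pandas as pd

# Hypothetical premiums (NOT the real data), including junk-like extremes
rates = pd.Series(
    [0.0, 150.0, 299.0, 310.0, 450.0, 1999.0, 2000.0, 2500.0, 999999.0],
    name="IndividualRate",
)

# Keep premiums strictly above 0 and not over 2000
cleaned = rates[(rates > 0) & (rates <= 2000)]

# 200 equal-width bins across the range of the cleaned data
counts, edges = np.histogram(cleaned, bins=200)

print(len(cleaned), counts.sum(), len(edges))
```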
  • 13. The histogram makes more sense now, as we can clearly see the distribution of the different insurance rates with the extreme values excluded. There are three groups of data in the histogram, which can be categorised as low, medium and high insurance rates. A significantly large number of users pay low insurance rates but have fewer options to choose from. The medium insurance rates offer the widest variety of rates to choose from. There is a small spike in the high insurance rates, indicating that a very small section of people pay at higher rates. B3. Variation in Costs across States How do insurance costs vary across states? 1. Generate a graph containing boxplots summarising the distribution of values for each state. a) Which state has the lowest median insurance rates and which one has the highest? (Hint: you may need to rotate the state labels to be able to read the plot.) b) Is there much variation in costs across states? Answer: The insurance rates for the various states are shown in the graph below via box plots. Next Page (Contd.)
  • 14. The state of ‘MO’ has the lowest median insurance rate while ‘AK’ has the highest. There is not much variation in the median insurance rate across states: most states have a similar median, roughly between 250 and 350. However, inspecting the outliers shows a wide variation in the highest insurance rate across states. For example, the highest insurance rate in ‘HI’ is around 1000, while that in ‘NC’ is around 1800. 2. Does the number of insurance issuers vary greatly across states? a) Create a bar chart of the number of insurance companies in each state to see. (Hint: you will need to aggregate the data by state to do this.) Answer: The number of insurance companies in each state is plotted in the graph below: Next Page (Contd.)
  • 15. The bar graph clearly shows that ‘TX’ has the highest number of issuers and ‘HI’ the lowest. When the states are sorted in descending order, the number of issuers does not vary greatly from one state to the next. 3. Could competition explain the difference in insurance premiums across states? a) Use a scatterplot to plot the number of insurance issuers against the median insurance cost for each state. b) Do you observe a relationship? Answer: A scatter plot of median insurance rate against issuer count is drawn. The relation is as below:
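The per-state aggregation behind the box-plot medians, the issuer bar chart and the scatter plot in B3 can be sketched with a single `groupby` using named aggregation. The data below are hypothetical (they are constructed so that 'MO' and 'AK' come out lowest and highest only for illustration), and `StateCode` and `IssuerId` are assumed column names.

```python
import pandas as pd

# Hypothetical sample (NOT the real data); column names are assumed.
rates = pd.DataFrame({
    "StateCode": ["AK", "AK", "MO", "MO", "MO", "TX", "TX", "TX", "TX"],
    "IssuerId": [1, 1, 2, 3, 3, 4, 5, 6, 6],
    "IndividualRate": [500.0, 700.0, 200.0, 250.0, 300.0,
                       280.0, 320.0, 340.0, 360.0],
})

# One row per state: median premium and number of distinct issuers
per_state = rates.groupby("StateCode").agg(
    median_rate=("IndividualRate", "median"),
    issuer_count=("IssuerId", "nunique"),
)

lowest = per_state["median_rate"].idxmin()
highest = per_state["median_rate"].idxmax()
print(per_state)
print(lowest, highest)
```

The resulting `per_state` frame has exactly the two columns needed for the scatter plot of issuer count against median cost.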
  • 16. In every state, there is strong competition among insurance issuers where the insurance rate is roughly between 250 and 350. Most issuers offer insurance in that range, with only small differences from state to state, attracting customers according to their needs. For insurance rates above 350 and below 250 there is little competition among issuers across the states. B4. Variation in Costs over Time and with Age Generate boxplots (or other plots) of insurance costs versus year and age to answer the following questions: 1. Are insurance policies becoming cheaper or more expensive over time? a) Is the median insurance cost increasing or decreasing? Answer: The insurance cost is plotted by year, yielding the box plot below: The box plot shows that the median insurance cost is more or less the same over the years. It also shows a gradual increase in the number of high-rate policies over the years. However, on closer analysis, the median is also found to increase gradually by a small margin. The values are as follows:
      Year   Median Rate
      2014   299.31
      2015   307.51
      2016   317.37
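The yearly medians in the table above can be computed with a grouped median, and the same pattern, with 'Family Option' rows filtered out as the hint in the next question suggests, gives per-age averages. The sketch below uses hypothetical values; `BusinessYear` is an assumed column name, while 'Age', 'IndividualRate' and the age labels come from the text.

```python
import pandas as pd

# Hypothetical sample (NOT the real data); "BusinessYear" is an assumed name.
rates = pd.DataFrame({
    "BusinessYear": [2014, 2014, 2015, 2015, 2016, 2016, 2016],
    "Age": ["0-20", "65 and over", "0-20", "65 and over",
            "0-20", "65 and over", "Family Option"],
    "IndividualRate": [100.0, 500.0, 120.0, 560.0, 140.0, 620.0, 30.0],
})

# 'Family Option' is not an individual age band, so drop it first
individuals = rates[rates["Age"] != "Family Option"]

# Median premium per year
median_by_year = individuals.groupby("BusinessYear")["IndividualRate"].median()

# Mean premium per age band, and the gap between oldest and youngest bands
mean_by_age = individuals.groupby("Age")["IndividualRate"].mean()
gap = mean_by_age["65 and over"] - mean_by_age["0-20"]

print(median_by_year)
print(gap)
```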
  • 17. Hence it can be concluded from the above data that insurance policies are becoming more expensive over time. 2. How does insurance costs vary with the age of the person being insured? (Hint: filter out the value 'Family Option' before plotting the data.) a) Do older people pay more or less for insurance than younger people? How much more/less do they pay? Answer: The insurance cost is drawn as a box plot for each age, and the graph below is obtained: From the graph, it is clearly evident that older people pay a higher insurance rate than younger people. The youngest group [age: 0-20] pays an average insurance rate of 122.333209, while the oldest group [age: 65 and over] pays an average insurance rate of 584.594017. Thus, on average, older people pay 462.26 more than younger people. Task C: Exploratory Analysis on Other Data Find some publicly available data and repeat some of the analysis performed in Tasks A and B above. Good sources of data are government websites, such as: data.gov.au, data.gov, data.gov.in, data.gov.uk, ... Data source: “All STATS19 data (accident, casualties and vehicle tables) for 2005 to 2014 in England” [Download the data here] C. Summary and Analysis: The number of accidents is plotted against each day of the week. Next Page (Contd.)
  • 18. The largest number of accidents occurs at the start of the weekend, on Friday, while the smallest number occurs on Sunday. This might be because many people travel home after Friday night recreation or parties, leading to a higher number of accidents, while on Sunday most people prefer to stay at home, reducing the number of accidents. The total number of accidents has gradually decreased over the years; however, 2014 saw an increase in the number of accidents.
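The counts by day of week and by year described above can be derived from an accident date column. The sketch below uses a handful of hypothetical records and assumes a column named `Date`; the real accident table may encode the day of week differently, so treat the column name and parsing as assumptions.

```python
import pandas as pd

# Hypothetical accident records (NOT the real data); "Date" is an assumed name.
accidents = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-01-03",  # Friday
        "2014-01-10",  # Friday
        "2014-01-05",  # Sunday
        "2013-06-07",  # Friday
        "2013-06-12",  # Wednesday
    ]),
})

# Number of accidents per weekday name
by_day = accidents["Date"].dt.day_name().value_counts()

# Number of accidents per calendar year, in chronological order
by_year = accidents["Date"].dt.year.value_counts().sort_index()

print(by_day)
print(by_year)
```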
  • 19. The number of fatal injuries has been consistent over the years. However, the count of the least severe injuries has gradually reduced over the years. The graph below shows the top 20 UK cities by number of accidents: Clearly Birmingham, Leeds and Manchester account for the largest number of accidents in the UK and would therefore require more police than other districts.
  • 20. The following visualisation shows the number of accident calls handled by each police force in the UK. The Metropolitan Police, West Midlands and Greater Manchester forces have handled the three largest numbers of accident cases over the years. The high number for the Metropolitan Police is due to its operations in all the suburbs around London, which account for a considerable number of accidents every year.
  • 21. However, Birmingham may require more police to address its high number of accidents (analysed later). To find the root cause of the accidents, the light conditions are analysed for the top 20 accident-prone districts. Accidents due to NO LIGHTING:
  • 22. This box plot clearly shows that there is a high number of accidents in the districts of Doncaster, Edinburgh, Leeds and Sheffield due to NO LIGHTING. This insight can be used to install more lights along the streets in those districts to reduce similar accidents. Accidents due to LIGHTS UNLIT: The above graph shows that the districts of Edinburgh, Bristol, Glasgow and Birmingham had more accidents than others due to unlit lights. The most affected district is Edinburgh. These districts require repairs to their road lighting to prevent similar accidents. Overall, the city of Edinburgh is the most affected by darkness leading to accidents. The analysis shows that, of all the districts, Edinburgh most needs its administrators to focus on street lighting.
  • 23. The above histogram shows the distribution of accidents by driver age. The spread shows that drivers aged around 30 and around 47 have the largest numbers of accidents. Teenagers are the third-largest group of drivers causing accidents in the distribution.