Pollution in Delhi - Correlation Analysis
1. Project on Correlation and Regression Analysis of the Number of Vehicles and Pollution Levels in New Delhi
Statistics for Business Decisions
2. Certificate of Originality
This is to certify that this project has been made by Mr. Alekh Kushwaha, Mr. Ashish Sharma, Mr. Harsh Lal and Mr. Kartik Jain, first-year students of Bachelor of Management Studies, Ramanujan College, University of Delhi. This project has been made under the guidance of our honorable statistics professor, Dr. K. Latha. The project submitted is our original work, and we are responsible for its content, except as specified in the acknowledgements or references.
3. Acknowledgement
We would like to express our special thanks and gratitude to our teacher Dr. K. Latha as well as our principal Dr. S.P. Aggarwal, who gave us the golden opportunity to do this wonderful project on correlation and regression analysis of the number of vehicles and pollution levels in New Delhi. It also helped us do a lot of research, and we came to know about many new things. We are really thankful to them.
Secondly, we would also like to thank our parents and friends, who helped us a lot in finalizing this project.
4. Introduction
• The air quality in Delhi, the capital of India, is the worst of any major city in the world, according to a WHO survey of 1600 world cities.
• Air pollution in India is estimated to kill 1.5 million people every year; it is the fifth-largest killer in India. According to the WHO, India has the world's highest death rate from chronic respiratory diseases and asthma.
• In Delhi, poor-quality air irreversibly damages the lungs of 2.2 million or 50
5.
• Air quality, or ambient (outdoor) air pollution, is represented by the annual mean concentration of particulate matter PM10, i.e. particles smaller than 10 microns in diameter.
• According to the WHO's air quality guidelines, the safe level for PM10 is 20 μg/m³ (annual mean).
• 2.2 million children in Delhi have irreversible lung damage due to the poor quality of the air. In addition, research shows that pollution can lower children's intelligence quotient (IQ) and increase the risk of autism, diabetes and even adult-onset diseases like multiple sclerosis.
• Poor air quality is also a cause of reduced lung
6. Causes of Air Pollution in New Delhi
• Motor vehicle emissions are one of the causes of poor air quality. According to some reports, 80 per cent of PM10 air pollution is caused by vehicular traffic.
• Other causes include wood-burning fires, fires on agricultural land, exhaust from diesel generators, dust from construction sites, and burning garbage.
• The vehicular population in the national capital
7. The rising number of motor vehicles has been a primary cause of Delhi's air pollution
8. Objective of the Project
• The objective of this project is to analyze the relationship between the number of motor vehicles and air pollution in New Delhi through statistical measures such as correlation and regression.
• The goal of the project is to analyze whether the rising vehicular population has been a cause of Delhi's air pollution.
• We have taken data for the eleven years from 2001 to 2011 to study the correlation between the number of vehicles and the worsening air quality of
9. CORRELATION
• Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related.
• If r is close to 0, it means there is no linear relationship between the variables. If r is positive, it means that as one variable gets larger, the other gets larger. If r is negative, it means that as one gets larger, the other gets smaller.
10.
• Correlations are useful because they can indicate a predictive relationship that can be exploited in practice.
• The most familiar measure of dependence between two quantities is Pearson's correlation coefficient. It is obtained by dividing the covariance of the two variables by the product of their standard deviations.
• Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.
• Pearson's correlation coefficient, when applied to a sample, is commonly represented by the letter r and may be referred to as the sample correlation coefficient or the sample Pearson correlation coefficient. We can obtain a formula
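For reference, the definition above (covariance divided by the product of the standard deviations, both computed with the same denominator) can be written for a sample of N pairs (X, Y) as shown below. The second form is the standard computational rearrangement, obtained by multiplying numerator and denominator by N; the notation is ours, chosen to match the Σ notation of the regression formulas used later in this project.

$$
r \;=\; \frac{\operatorname{cov}(X, Y)}{s_X\, s_Y}
  \;=\; \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}
             {\sqrt{N\sum X^{2} - \left(\sum X\right)^{2}}\;\sqrt{N\sum Y^{2} - \left(\sum Y\right)^{2}}}
$$

Its numerator, NΣXY - (ΣX)(ΣY), is the same quantity that appears in the numerator of the regression slope b, so r and b always have the same sign.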
11. DATA FOR ANALYSIS
No. of Registered Motor Vehicles in Delhi
Year    Number of motor vehicles
2001    3,635,000
2002    3,699,000
2003    3,971,000
2004    4,236,000
2005    4,186,000
2006    4,487,000
2007    5,492,000
2008    5,899,000
2009    6,302,000
2010    6,746,000
2011    7,228,000
12. Source: data.gov.in and CPCB report
Annual Average Ambient PM10 Concentration in Delhi
Year    Annual average PM10 concentration (μg/m³)
2001    120
2002    140
2003    130
2004    135
2005    120
2006    135
2007    160
2008    220
2009    250
2010    260
2011    270
15.
• The coefficient of correlation between the two variables, number of vehicles (X) and annual average PM10 level (Y), comes out to be 0.959077.
• The coefficient of correlation is very high, which signifies a strong positive relationship between the two variables.
• This signifies that as the number of vehicles on the roads of Delhi has increased, the average concentration of pollution-causing PM10 has also increased.
• In this case, the coefficient of correlation is very close to perfect positive correlation (r = +1).
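As a cross-check of the value above, the coefficient can be recomputed from the two data tables. This is a minimal sketch in plain Python with no third-party libraries, assuming the vehicle counts are entered in thousands (the scale used on the charts); `pearson_r` is an illustrative helper name of our own, not part of any library.

```python
import math

# Data from the two tables (2001-2011).
# Assumption: vehicle counts are in thousands, as on the project's charts.
vehicles = [3635, 3699, 3971, 4236, 4186, 4487, 5492, 5899, 6302, 6746, 7228]
pm10 = [120, 140, 130, 135, 120, 135, 160, 220, 250, 260, 270]


def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations; the 1/(n-1) factors
    # in the covariance and both standard deviations cancel out.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(vehicles, pm10)
print(f"r = {r:.6f}")  # prints: r = 0.959077
```

Because r does not change when a variable is rescaled, using thousands of vehicles instead of the raw counts gives the same result. From Python 3.10 onward, the standard library's `statistics.correlation` computes the same quantity.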
16. Graph showing correlation between number of vehicles and annual average PM10 levels in New Delhi
[Figure: Correlation scatter plot; x-axis: number of vehicles (in thousands); y-axis: annual average PM10 levels]
17. Regression
• Regression is a statistical method used in finance, investing and other disciplines that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).
• Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships.
18. Regression Formula
Regression equation: $y = a + bx$
Slope: $b = \dfrac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{N\sum X^{2} - \left(\sum X\right)^{2}}$
Intercept: $a = \dfrac{\sum Y - b\sum X}{N}$
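• The next slide shows that, in this case, the equation of the fitted regression line is y = 0.0448x - 50.98, where x is the number of vehicles in thousands and y is the annual average PM10 level. Here 0.0448 is the slope of the regression line: each additional thousand registered vehicles is associated with an average rise of about 0.0448 in the annual average PM10 level. The R² value (about 0.92) is large, which shows a high degree of correlation between the two variables.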
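The slope and intercept formulas above can be applied directly to the data tables. The sketch below is plain Python, assuming vehicle counts in thousands (the scale of the charts); `fit_line` and `r_squared` are hypothetical helper names of our own. It computes R² as 1 - SS_res/SS_tot, which for a straight-line fit with an intercept equals r².

```python
# Least-squares fit of y = a + b*x using the summation formulas for b and a.
# Assumption: vehicle counts are in thousands, as on the project's charts.
vehicles = [3635, 3699, 3971, 4236, 4186, 4487, 5492, 5899, 6302, 6746, 7228]
pm10 = [120, 140, 130, 135, 120, 135, 160, 220, 250, 260, 270]


def fit_line(x, y):
    """Return (a, b) for the regression line y = a + b*x (hypothetical helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def r_squared(x, y, a, b):
    """Share of the variation in y explained by the fitted line (hypothetical helper)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


a, b = fit_line(vehicles, pm10)
sign = "-" if a < 0 else "+"
print(f"y = {b:.4f}x {sign} {abs(a):.2f}")            # prints: y = 0.0448x - 50.98
print(f"R^2 = {r_squared(vehicles, pm10, a, b):.4f}")  # prints: R^2 = 0.9198
```

Keeping x in thousands means b is the change in PM10 per thousand vehicles; with raw counts, b would be 1000 times smaller while a and R² would stay the same.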
19. Regression analysis
[Figure: Scatter plot of annual average PM10 levels against number of vehicles (in thousands) with fitted regression line y = 0.0448x - 50.98, R² ≈ 0.92]
20. CONCLUSION
From the above analysis, we obtain a correlation coefficient of 0.95907747, which indicates a high degree of relationship between the increasing number of vehicles on the roads of Delhi and the pollution that persists in the atmosphere. The high correlation shows that the level of pollution rises as the number of registered motor vehicles increases. While the city's pollution is rising as a whole, pollution from vehicles makes up a major part of it. Fewer cars would therefore contribute to less pollution and hence a safer environment. Carpooling, eco-friendly gases and less use of vehicles can also contribute to the betterment of the people of Delhi.