Generated a statistical report on the air quality of Ireland (correlation and regression) using SPSS, and on the religious beliefs of people in different age groups within their respective religions (two-way ANOVA) using R.
Table of Contents
MULTIPLE REGRESSION ANALYSIS
    DATA SOURCE
    OBJECTIVE
    DATA INFORMATION
    DATA CLEAN UP
    SOFTWARE
    ANALYSIS
        DATA SUMMARY
        CORRELATION MATRIX
        MULTIPLE REGRESSION ANALYSIS
        RESIDUAL PLOT
        MODEL SUMMARY
ANOVA
    OBJECTIVE
    DATA INFORMATION
    SOFTWARE
    ANALYSIS
        DESCRIPTIVE STATISTICS
        LEVENE’S TEST
        INTERACTION EFFECT
        POST-HOC TEST
        PLOT
        RESULT
        REFERENCES
MULTIPLE REGRESSION ANALYSIS
DATA SOURCE
This analysis was carried out on air quality data for Dublin City. The data source is:
https://data.gov.ie/dataset/air-quality-monitoring-data-dublin-city
The data was provided in four separate files:
1) Dublin city council PM10 and PM2.5 2011.csv
2) Dublin city council NO and NO2 2011.csv
3) Dublin city council SO2 2011.csv
4) Dublin city council CO 2011.csv
OBJECTIVE
This data was chosen because pollution is increasing in all the metropolitan cities of the world. In
some cities, such as Beijing and Delhi, the air quality is so bad that the environment has become like
a gas chamber.
The objectives of this analysis are to:
1) Study the various components of air quality.
2) Study the impact of the other factors on PM2.5 and PM10.
3) Understand the relationships between all of them.
DATA INFORMATION
This dataset provides information about the components responsible for air pollution:
● Nitrogen dioxide (NO2)
● Nitric oxide (NO)
● Sulphur dioxide (SO2)
● Carbon monoxide (CO)
● PM2.5
● PM10
The major components of air are nitrogen and oxygen, which together make up about 99% of dry air,
along with a variable amount of water vapour. The remaining gases are present in small quantities
that vary with the quality of the air. The main ones responsible for degrading air quality are
carbon monoxide, nitrogen dioxide, ozone, sulphur dioxide and particles. Particles are also known
as particulate matter or PM, and consist of smoke, dirt, soot, dust etc. They are classified by
size: PM10 denotes particles with a diameter of 10 µm or less (which therefore includes PM2.5),
and PM2.5 denotes particles with a diameter of 2.5 µm or less.
This dataset contains the air pollutant readings for the Dublin region for the year 2011.
Data Type          Granularity       Converted
Nitrogen dioxide   Hourly readings   Daily average
Nitric oxide       Hourly readings   Daily average
Sulphur dioxide    Hourly readings   Daily average
Carbon monoxide    Hourly readings   Daily average
PM2.5              Daily average     None
PM10               Daily average     None
DATA CLEAN UP
The data were spread across separate files, so the following clean-up steps were taken.
1. Daily averages were calculated for nitrogen dioxide, nitric oxide, sulphur dioxide and carbon
monoxide by adding the 24 hourly readings of each day and dividing by 24.
2. PM2.5 and PM10 were already in daily-average format, so no changes were made.
3. The data were then consolidated into a single csv file.
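The clean-up steps above can be sketched in R as follows. This is a minimal illustration on made-up readings, not the script used for the report; the column names (date, hour, NO2, PM10) and all values are assumptions.

```r
# Hypothetical hourly NO2 readings for two days (names and values are assumptions).
hourly <- data.frame(
  date = rep(as.Date(c("2011-01-01", "2011-01-02")), each = 24),
  hour = rep(0:23, times = 2),
  NO2  = c(rep(10, 24), rep(20, 24))
)

# Step 1: daily average = sum of the 24 hourly readings divided by 24 (the mean).
daily <- aggregate(NO2 ~ date, data = hourly, FUN = mean)

# Step 2: PM values are already daily, so they are used unchanged (made-up values).
pm <- data.frame(
  date = as.Date(c("2011-01-01", "2011-01-02")),
  PM10 = c(15, 25)
)

# Step 3: consolidate into one table, one row per day.
air_demo <- merge(daily, pm, by = "date")
print(air_demo)
```

The merged table could then be written out with write.csv() to produce the single consolidated file.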
SOFTWARE
R was used for this analysis; it is a convenient tool for both analysis and graph generation.
The data were loaded into R with the read.table command as follows.
air<-read.table("/home/hadoop/air_ireland.csv", sep=",",header=T)
ANALYSIS[1]
DATA SUMMARY
The table below summarises the data in terms of maximum, minimum, median, 1st quartile and 3rd
quartile. PM2.5 and PM10 are measured in µg/m3. NO2, SO2, CO and NO are measured in µg/m3.
The above figure displays the histogram of each variable, the scatterplot of each pair, and the
correlation coefficient of each pair along with its p-value significance.
As the figure shows, the following pairs have a strong relationship.
1. NO2 and NO
2. CO and NO
3. CO and NO2
4. SO2 and NO
5. PM2.5 and PM10
All these pairs are positively correlated, with a correlation coefficient greater than 0.5. The
remaining correlations are either weak or not statistically significant according to their p-values.
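Pairwise correlations and their p-values of the kind summarised above can be obtained with base R. The sketch below uses simulated data, so the variable values and the resulting coefficients are illustrative assumptions and not the Dublin results.

```r
# Simulated daily values (assumptions): NO2 is built to correlate with NO,
# SO2 is generated independently of both.
set.seed(1)
n   <- 200
NO  <- rnorm(n, mean = 20, sd = 5)
NO2 <- 0.8 * NO + rnorm(n, sd = 2)
SO2 <- rnorm(n, mean = 3, sd = 1)
sim <- data.frame(NO, NO2, SO2)

# Correlation matrix for every pair of variables.
print(round(cor(sim), 2))

# Pearson correlation test for one pair: coefficient and p-value.
ct <- cor.test(sim$NO, sim$NO2)
print(ct$estimate)
print(ct$p.value)
```

A pair would be listed as strongly related under the rule used above when its coefficient exceeds 0.5 and its p-value is below 0.05.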
Multiple Regression Analysis
Multiple regression is used to identify the relationship between PM2.5/PM10 and NO, SO2 and CO.
Since NO2 and NO are very strongly correlated, only one of them (NO) is included. Three models
were analysed; the -1 term in each formula fits the model without an intercept.
Regression 1: lm(PM10~NO+SO2+CO-1, data=my_data)
Regression 2: lm(PM25~NO+SO2+CO-1, data=my_data)
Regression 3: lm(PM10~PM25+CO+NO+SO2-1, data=my_data)
Model                             R2 (%)   P-value   Residual error
PM25 ~ NO + SO2 + CO - 1          27.96    ***       10.24
PM10 ~ NO + SO2 + CO - 1          36.09    ***       14.35
PM10 ~ PM25 + CO + NO + SO2 - 1   85.12    ***       6.934
Statistical details of Regression 3: PM10~PM25+CO+NO+SO2-1
Residuals:
    Min      1Q  Median      3Q     Max
-49.923  -1.135   2.295   4.685  33.689
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
PM25  1.22785    0.03560  34.494  < 2e-16 ***
CO   18.93690    4.53754   4.173 3.76e-05 ***
NO   -0.02241    0.01101  -2.035   0.0426 *
SO2   1.94215    0.47681   4.073 5.70e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.934 on 361 degrees of freedom
Multiple R-squared: 0.8512, Adjusted R-squared: 0.8496
F-statistic: 516.4 on 4 and 361 DF, p-value: < 2.2e-16
As the detailed statistics show, all four coefficients are significant at the 5% level. PM2.5, CO
and SO2 are highly significant (p < 0.001), while NO is only marginally significant (p = 0.0426),
so the impact of CO, SO2 and PM2.5 is more significant than that of NO. R-squared is 0.8512,
meaning that about 85 percent of the variation in PM10 is explained by this model.
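The quantities quoted above (coefficients, R-squared, residual standard error) can be read from a fitted model as in the sketch below. The data are simulated, so every number it produces is an assumption and will not match the report. One point worth knowing: for a model fitted without an intercept (the -1 term), summary() in R reports R-squared relative to zero rather than to the mean of the response, which the last lines check by hand.

```r
# Simulated stand-in for the consolidated data (all values are assumptions).
set.seed(42)
n    <- 365
PM25 <- runif(n, 5, 40)
CO   <- runif(n, 0.1, 1)
NO   <- runif(n, 5, 100)
SO2  <- runif(n, 0.5, 5)
PM10 <- 1.2 * PM25 + 15 * CO + 2 * SO2 + rnorm(n, sd = 5)
sim  <- data.frame(PM10, PM25, CO, NO, SO2)

# Same formula shape as Regression 3: no intercept because of the -1 term.
fit <- lm(PM10 ~ PM25 + CO + NO + SO2 - 1, data = sim)
s   <- summary(fit)

print(s$coefficients)  # estimate, std. error, t value and p-value per term
print(s$r.squared)     # multiple R-squared
print(s$sigma)         # residual standard error
print(fit$df.residual) # 365 observations minus 4 coefficients

# Without an intercept, R-squared is computed against zero, not the mean:
r2_uncentred <- 1 - sum(resid(fit)^2) / sum(sim$PM10^2)
print(r2_uncentred)
```

Because of this definition, R-squared values from no-intercept models are not directly comparable with those from models that include an intercept.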
RESIDUAL PLOT
In the residual plot the residuals are randomly scattered for values below 20, but for values
above 20 a positive pattern is visible. This suggests that other factors need to be captured to
model the remaining variation.
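A residual plot of this kind can be drawn with base R graphics. The sketch below uses simulated data and a single hypothetical predictor, so it shows only the mechanics and not the pattern described above.

```r
# Simulated data (assumption): one predictor and a no-intercept fit.
set.seed(7)
x <- runif(100, 0, 50)
y <- 1.2 * x + rnorm(100, sd = 4)
fit_demo <- lm(y ~ x - 1)

# Residuals against fitted values; a random band around zero suggests
# the model has captured the systematic variation.
plot(fitted(fit_demo), resid(fit_demo),
     xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)
```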
Model Summary
The aim was to examine the relationships between the variables in order to establish which factors
affect particulate levels. In this data, however, neither PM2.5 nor PM10 shows a strong
relationship with the gaseous pollutants (NO, SO2 and CO).
There is a strong relationship between PM10 and PM2.5. PM10 particles are generated by smoke and
dust, and the model suggests that as the concentration of these larger particles rises, PM2.5
rises with it. The smoke from cars and factories consists of nitrogen oxides, carbon oxides and
black carbon particles, which react in the air to form other compounds of nitrogen and oxygen.
So as smoke increases, the quantity of PM2.5 increases sharply.
Since Dublin is much less polluted than Asian cities where PM2.5 has exceeded tolerable limits,
this effect is less visible.
ANOVA
In two-way analysis of variance, we need two categorical independent variables and one dependent
variable. Through two-way ANOVA we look at the individual and joint effect of two independent
variables on one dependent variable.
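The report ran this analysis in SPSS. As an illustration only, a two-way ANOVA with an interaction term can be sketched in R as below. The data are simulated and balanced, and the names gender, age_band and score are assumptions. Note that aov() uses sequential (Type I) sums of squares, whereas the SPSS output in this report uses Type III; the two agree for balanced designs but can differ for unbalanced data such as the survey sample.

```r
# Simulated balanced design (assumption): 2 genders x 3 age bands, 20 per cell.
set.seed(3)
n <- 120
gender   <- factor(rep(c("Male", "Female"), each = n / 2))
age_band <- factor(rep(c("<=37", "38-56", "57+"), times = n / 3),
                   levels = c("<=37", "38-56", "57+"))
score <- rnorm(n, mean = 22, sd = 6.5)
sim   <- data.frame(gender, age_band, score)

# age_band * gender expands to both main effects plus their interaction.
fit_aov <- aov(score ~ age_band * gender, data = sim)
tab <- summary(fit_aov)[[1]]
print(tab)
```

The printed table has one row for each main effect, one for the interaction and one for the residuals.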
Data Source: http://www.europeansocialsurvey.org/downloadwizard/?loggedin
OBJECTIVE
The data set records the level of belief in their religion of people of different age bands and
genders in the European Union. The objectives of the test are:
• To find whether different age bands have different levels of belief in their religion, for both
males and females
• To find gender differences in dedication toward religion.
DATA VARIABLES
The independent variable gender is recoded as males = 1 and females = 2.
The age bands are recoded as:
Band 1: <= 37 yrs;
Band 2: 38-56 yrs;
Band 3: >= 57 yrs.
The dependent variable is Dedication toward religion, which ranges from 5 to 35.
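The recoding in the report was done in SPSS. A comparable banding can be sketched in R with cut(), as below. The ages are made-up examples and the labels follow the bands used in the descriptive statistics table.

```r
# Example ages (assumptions) chosen to sit on the band boundaries.
age <- c(18, 37, 38, 56, 57, 80)

# Intervals are closed on the right by default: (-Inf,37], (37,56], (56,Inf).
age_band <- cut(age,
                breaks = c(-Inf, 37, 56, Inf),
                labels = c("<= 37", "38 - 56", "57+"))
print(data.frame(age, age_band))
```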
MEASUREMENTS
There are two categorical independent variables, gender and age band; age band has three levels.
The level of Dedication toward religion is scored in the range 5 to 35. Several tests are
performed, including Levene's test of equality (homogeneity of variance) and post hoc tests.
SOFTWARE
For this analysis SPSS has been used.[2]
Output from two-way ANOVA
Descriptive statistics
The table gives the mean, standard deviation and number of respondents (N) for each combination
of age group and gender. The standard deviations are very similar across the age groups. The mean
is 22.28 for the <= 37 age group, 22.24 for the 38-56 age group and 22.62 for the 57+ age group.
Descriptive Statistics
Dependent Variable: Dedication toward religion
Age Group 3 (Binned)   Gender   Mean    Std. Deviation   N
<= 37                  Male     20.40   6.904            73
                       Female   24.23   6.483            71
                       Total    22.28   6.947            144
38 - 56                Male     22.27   6.852            62
                       Female   22.21   6.566            86
                       Total    22.24   6.664            148
57+                    Male     22.88   6.959            69
                       Female   22.37   6.565            75
                       Total    22.62   6.738            144
Total                  Male     21.81   6.958            204
                       Female   22.88   6.574            232
                       Total    22.38   6.770            436
Levene's test of equality
The Levene's test table shows a significance value of .476, which is greater than 0.05. This
indicates that the homogeneity of variance assumption is not violated.
Levene's Test of Equality of Error Variances (a)
Dependent Variable: Dedication toward religion
F      df1   df2   Sig.
.161   5     430   .476
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + agegrp3 + gndr + agegrp3 * gndr
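For readers working in R, the mean-based form of Levene's test can be sketched without extra packages: it is a one-way ANOVA on the absolute deviations of each observation from its own group mean. The data below are simulated, the names group and score are assumptions, and the six groups stand in for the six gender by age-band cells.

```r
# Simulated scores (assumption) for six cells, 30 observations each.
set.seed(11)
sim <- data.frame(
  group = factor(rep(paste0("cell", 1:6), each = 30)),
  score = rnorm(180, mean = 22, sd = 6.5)
)

# Absolute deviation of each score from the mean of its own cell.
sim$absdev <- abs(sim$score - ave(sim$score, sim$group))

# Levene's test = one-way ANOVA on those absolute deviations.
lev <- anova(lm(absdev ~ group, data = sim))
print(lev)
```

A p-value above 0.05 in the group row would be read, as above, as no evidence against equal variances.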
Interaction effect
The interaction effect tests whether the effect of age group on dedication toward religion differs
between males and females. The interaction is significant when its significance value is less than
0.05. In the Tests of Between-Subjects Effects table the significance value for agegrp3 * gndr is
.011, so there is a significant difference in the effect of age on dedication toward religion
between males and females.
Tests of Between-Subjects Effects
Dependent Variable: Dedication toward religion
Source            Type III Sum of Squares   df    Mean Square   F          Sig.   Partial Eta Squared
Corrected Model   549.493 (a)               5     109.899       2.438      .034   .028
Intercept         216557.285                1     216557.285    4803.679   .000   .918
agegrp3           12.243                    2     6.122         .136       .873   .001
gndr              126.893                   1     126.893       2.815      .094   .007
agegrp3 * gndr    409.977                   2     204.989       4.547      .011   .021
Error             19385.064                 430   45.082
Total             238281.000                436
Corrected Total   19934.557                 435
a. R Squared = .028 (Adjusted R Squared = .016)
Main Effect
A main effect is the effect of one independent variable considered on its own. In the Tests of
Between-Subjects Effects table, the significance value for age band (agegrp3) is .873 and for
gender (gndr) is .094, both greater than 0.05. There is therefore no significant main effect of
either gender or age group: taken on its own, neither factor produces a significant difference in
dedication toward religion.
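Effect size
The Partial Eta Squared column gives the effect sizes: .001 for age group, .007 for gender and
.021 for the interaction. All are below 0.05, and by Cohen's commonly cited guidelines (.01 small,
.06 medium, .14 large) they are small effects.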
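The partial eta squared values can be reproduced from the sums of squares in the Tests of Between-Subjects Effects table, using partial eta squared = SS_effect / (SS_effect + SS_error). The short R check below uses only figures taken from that table; the variable names are just labels chosen for this sketch.

```r
# Sums of squares and degrees of freedom copied from the SPSS table.
ss_error <- 19385.064
df_error <- 430
ss <- c(agegrp3 = 12.243, gndr = 126.893, "agegrp3:gndr" = 409.977)
df <- c(agegrp3 = 2,      gndr = 1,       "agegrp3:gndr" = 2)

# Partial eta squared for each effect.
partial_eta_sq <- ss / (ss + ss_error)
print(round(partial_eta_sq, 3))  # 0.001, 0.007, 0.021

# F ratio = mean square of the effect / mean square of the error.
f_ratio <- (ss / df) / (ss_error / df_error)
print(round(f_ratio, 3))         # 0.136, 2.815, 4.547
```

Both sets of values agree with the Partial Eta Squared and F columns reported by SPSS.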
Post-hoc test
The main effect of gender is not significant, so there is no significant difference in religious
belief between males and females overall.
The Tukey HSD (honestly significant difference) test shows no significant difference between any
pair of age groups, as every significance value is greater than 0.05.
Multiple Comparisons
Dependent Variable: Dedication toward religion
Tukey HSD
(I) Age Group 3 (Binned)   (J) Age Group 3 (Binned)   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
<= 37                      38 - 56                    .05                     .786         .998   -1.80                1.90
                           57+                        -.33                    .791         .907   -2.19                1.53
38 - 56                    <= 37                      -.05                    .786         .998   -1.90                1.80
                           57+                        -.38                    .786         .878   -2.23                1.47
57+                        <= 37                      .33                     .791         .907   -1.53                2.19
                           38 - 56                    .38                     .786         .878   -1.47                2.23
Based on observed means.
The error term is Mean Square(Error) = 45.082.
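The comparisons above come from SPSS. A comparable table can be sketched in R with TukeyHSD(), applied to the age-band factor of a two-way model so that the error term comes from the full model. The data are simulated and the names are assumptions, so the numbers will not match the report.

```r
# Simulated balanced data (assumption): 3 age bands x 2 genders, 20 per cell.
set.seed(5)
sim <- data.frame(
  age_band = factor(rep(c("<=37", "38-56", "57+"), each = 40),
                    levels = c("<=37", "38-56", "57+")),
  gender   = factor(rep(c("Male", "Female"), times = 60)),
  score    = rnorm(120, mean = 22, sd = 6.5)
)

fit_tk <- aov(score ~ age_band * gender, data = sim)

# Pairwise age-band comparisons: difference, 95% interval and adjusted p-value.
tk <- TukeyHSD(fit_tk, which = "age_band")
print(tk$age_band)
```

Each row is one pair of age bands; an adjusted p-value above 0.05 means that pair does not differ significantly.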
Plots
The plot shows a large difference between males and females in the <= 37 age group: females in
this group have a higher dedication toward their religion, about 24.2, against about 20.4 for
males. In the 38-56 age group the belief of males and females is almost the same. In the 57+ age
group there is a slight difference, with males showing slightly higher dedication toward their
religion than females.
The plot also shows that the religious belief of males increases with age, and males aged 37 or
younger have the lowest belief of any group. For females the pattern is different: belief is
highest of any group, male or female, at age 37 or younger, falls sharply in the 38-56 age group,
and then rises slightly after the age of 56.
Result
A two-way ANOVA was performed on males and females in three age groups: 37 or younger, 38 to 56,
and 57 or older. The religious orientation of each person is measured on a scale from 5 to 35.
ANOVA was then applied to test whether the group means differ significantly from one another. The
main effects show no significant difference in religious orientation when only gender or only age
group is considered. When gender and age group are taken together, however, the difference in
means is significant (interaction effect, Sig. = .011). This is clearer in the plot, which shows
that the youngest age group has a much larger gap between males and females than the middle-aged
and older groups.
References:
1. Lantz, B. (2013) Machine Learning with R. Second Edition.
2. Pallant, J. (2016) SPSS Survival Manual. 6th Ed. New York, McGraw Hill Education.