Rule Optimization of Fuzzy Inference System Sugeno using Evolution Strategy f...IJECEIAES
The need for accurate load forecasts will increase in the future because of the dramatic changes occurring in electricity consumption. A Sugeno fuzzy inference system (FIS) can be used for short-term load forecasting. However, a challenge in electrical load forecasting is the trend in the data used, which makes it difficult to develop appropriate fuzzy rules for a Sugeno FIS. This paper proposes an Evolution Strategy method to determine appropriate rules for a Sugeno FIS that yield minimum forecasting error. Root Mean Square Error (RMSE) is used to evaluate the goodness of the forecasting results. Numerical experiments on several test-case problems show the effectiveness of the proposed optimized Sugeno FIS, which produces RMSE values lower than or comparable to those achieved by other well-known methods in the literature.
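The optimization loop described above can be sketched as follows. This is a simplified, hypothetical illustration and not the paper's implementation. It uses a zero-order Sugeno FIS with one input and fixed Gaussian membership functions, plus a (1+1) evolution strategy that mutates only the constant rule consequents and keeps a mutant whenever its RMSE does not increase. The function names, the mutation step `step=0.5`, the membership width and any data passed in are assumptions.

```python
import math
import random


def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set (assumed antecedent shape)."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def sugeno_predict(x, centers, sigma, consequents):
    """Zero-order Sugeno FIS: firing-strength-weighted average of constant rule outputs."""
    weights = [gaussian_mf(x, c, sigma) for c in centers]
    return sum(w * z for w, z in zip(weights, consequents)) / sum(weights)


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def evolve_consequents(xs, ys, centers, sigma, generations=2000, step=0.5, seed=0):
    """(1+1)-ES: mutate every consequent with Gaussian noise and keep the child
    if its RMSE is no worse than the parent's."""
    rng = random.Random(seed)

    def fitness(consequents):
        return rmse(ys, [sugeno_predict(x, centers, sigma, consequents) for x in xs])

    parent = [0.0] * len(centers)
    parent_error = fitness(parent)
    for _ in range(generations):
        child = [z + rng.gauss(0.0, step) for z in parent]
        child_error = fitness(child)
        if child_error <= parent_error:
            parent, parent_error = child, child_error
    return parent, parent_error
```

A fuller version might also evolve which rules are active and adapt the mutation step (for example with Rechenberg's 1/5 success rule). Both are left out here to keep the sketch short.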
This document describes Siddharth Chaudhary's MSc research project on forecasting solar electricity generation using time series models. The research aims to 1) forecast solar generation in Delhi and Jodhpur, India, 2) evaluate the performance of forecasting models, and 3) compare potential solar generation between the two cities. Four time series models (TBATS, ARIMA, simple exponential smoothing, and Holt's method) are applied to solar radiation data from each city, and their accuracy is assessed.
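Two of the four models named above, simple exponential smoothing and Holt's linear method, are short enough to sketch directly. In this minimal sketch the smoothing parameters are passed in by hand rather than estimated, and the initial level and trend come from the first observations. Both choices are assumptions, and TBATS and ARIMA are not shown.

```python
def simple_exponential_smoothing(series, alpha, horizon):
    """SES: a single smoothed level gives a flat forecast for every future step."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def holt_linear(series, alpha, beta, horizon):
    """Holt's linear method: smooth a level and a trend, then extrapolate the trend."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]
```

SES suits series without trend, while Holt's method adds the trend component. This is why comparing both on the same radiation data is informative.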
Insights into the Efficiencies of On-Shore Wind Turbines: A Data-Centric Anal...ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/insights-into-the-efficiencies-of-on-shore-wind-turbines-a-data-centric-analysis/
The literature on wind turbines as a renewable energy alternative does not include a multidimensional benchmarking study that can support investment decisions as well as design processes. This paper presents a data-centric analysis of commercial on-shore wind turbines and provides actionable insights through analytical benchmarking with Data Envelopment Analysis (DEA), visual data analysis, and statistical hypothesis testing. The paper also introduces a novel visualization approach for understanding and interpreting reference sets, i.e., the sets of efficient wind turbines that inefficient ones should take as benchmarks.
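The idea of DEA efficiency and reference sets can be illustrated in its simplest special case. This sketch assumes one input (for example price) and one output (for example rated power) under constant returns to scale, both hypothetical choices. In that case the CCR linear program reduces to each unit's output/input ratio divided by the best ratio. The paper's multi-input analysis would instead need a linear-programming solver.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR (constant returns to scale) DEA for ONE input and ONE output.

    In this special case the LP optimum is each unit's output/input ratio
    divided by the best ratio observed among all units.
    """
    ratios = [o / i for i, o in zip(inputs, outputs)]
    best = max(ratios)
    scores = [r / best for r in ratios]
    # Efficient units (score 1) form the reference set that inefficient units benchmark against.
    reference_set = [k for k, r in enumerate(ratios) if r == best]
    return scores, reference_set
```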
1. The document discusses the meaning, uses, functions, importance and limitations of statistics. It defines statistics as the collection, presentation, analysis and interpretation of numerical data.
2. Statistics has various uses across different fields such as policy planning, management, education, commerce and accounts. It helps present facts precisely and enables comparison, correlation, formulation and testing of hypotheses, and forecasting.
3. While statistics is important for planning, administration, economics and more, it also has limitations: it studies only aggregates rather than individual items, deals only with numerical data, and its results hold only on average. Statistics can also be misused if it is not handled carefully by experts.
A Hybrid Apporach of Classification Techniques for Predicting Diabetes using ...ijtsrd
Diabetes is predicted using a classification technique. The data mining tool WEKA is used to implement a Support Vector Machine (SVM) classifier. The proposed work aims to improve the performance of the models: to increase classification accuracy, the SVM is combined with feature selection and a percentage split. Experimental results showed a significant improvement over the existing SVM classifier. This approach increases classification accuracy and reduces computational time. S. Jaya Mala, "A Hybrid Apporach of Classification Techniques for Predicting Diabetes using Feature Selection", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd27991.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/27991/a-hybrid-apporach-of-classification-techniques-for-predicting-diabetes-using-feature-selection/s-jaya-mala
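The two preprocessing steps paired with the SVM above, feature selection and a percentage split, can be sketched without WEKA. This is an assumed, simplified stand-in. It ranks features with a correlation filter, which is only one of many possible selection methods, and uses a hypothetical 66% training share. The SVM itself would then be trained on the selected columns with a library implementation such as WEKA's SMO.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_top_features(rows, labels, k):
    """Rank features by |correlation| with the 0/1 label; return the indices of the top k."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda j: abs(pearson(columns[j], labels)), reverse=True)
    return sorted(ranked[:k])


def percentage_split(rows, labels, train_percent=66, seed=1):
    """Shuffle once, train on the first train_percent% of rows and test on the rest."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_percent / 100)
    train = [(rows[i], labels[i]) for i in order[:cut]]
    test = [(rows[i], labels[i]) for i in order[cut:]]
    return train, test
```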
PERFORMANCE OF DATA MINING TECHNIQUES TO PREDICT IN HEALTHCARE CASE STUDY: CH...ijdms
This document discusses applying machine learning algorithms to predict chronic kidney disease. It:
1) Applied three algorithms (C4.5 decision tree, SVM, and Bayesian Network) to a chronic kidney disease dataset containing 400 patients and 24 attributes to classify patients as having chronic kidney disease or not.
2) Found that the C4.5 decision tree algorithm had the best performance based on accuracy (63%), error rate (0.37), kappa statistic (0.97), and other evaluation metrics. SVM and Bayesian Network performance was lower.
3) Concludes C4.5 decision tree is the most efficient algorithm for predicting chronic kidney disease based on this medical dataset.
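The evaluation metrics listed above can be computed from a 2x2 confusion matrix as follows. Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the accuracy and p_e is the chance agreement. Because p_e ≥ 0, kappa can never exceed the accuracy. An accuracy of 63% and a kappa of 0.97, as reported in the summary above, therefore cannot both hold. The confusion-matrix counts used with this sketch are hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, error rate and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Chance agreement: actual-class proportion times predicted-class proportion, per class.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, 1 - accuracy, kappa
```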
Zuur et al 2010 methods in ecology and evolution a protocol for data explorat...Lisiane Zanella
This document provides a protocol for data exploration to avoid common statistical problems when analyzing ecological data. It discusses exploring data for outliers, heterogeneity, collinearity, dependence, and other issues. The protocol aims to identify potential problems before statistical analysis to reduce type I and II errors and ensure robust conclusions. Data exploration is presented as an essential first step, taking up to 50% of analysis time. Graphical tools are emphasized over tests for exploring data visually and identifying issues to address. The document provides examples and discusses handling outliers and other problems when they arise.
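One of the checks in the protocol, collinearity between covariates, is often quantified with variance inflation factors (VIFs). This minimal sketch covers only the two-covariate case, where VIF = 1 / (1 - r²) and r is the Pearson correlation between the covariates. With more covariates, each one would have to be regressed on all the others. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_covariates(x, z):
    """VIF of either covariate when the model contains exactly these two."""
    r = pearson(x, z)
    return math.inf if abs(r) >= 1.0 else 1.0 / (1.0 - r * r)
```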
Improving the performance of k nearest neighbor algorithm for the classificat...IAEME Publication
The document discusses improving the performance of the k-nearest neighbor (kNN) algorithm for classifying diabetes datasets with missing values. It first provides background on diabetes and challenges with missing data. It then describes various data preprocessing techniques used to handle missing values, including mean imputation. The document outlines the kNN classification algorithm and metrics like accuracy and error rate to evaluate performance. It applies these techniques to the Pima Indian diabetes dataset and finds that imputing missing values along with suitable preprocessing like normalization increases classification accuracy compared to ignoring missing values or imputation alone.
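The preprocessing-plus-classification pipeline described above can be sketched in plain Python. Missing values are marked here as `None` (an assumption about encoding), filled with the column mean, and scaled to [0, 1] with min-max normalization before a majority-vote kNN. In a real evaluation the means and min/max would be computed on the training split only.

```python
import math
from collections import Counter


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns become 0."""
    columns = list(zip(*rows))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(row)] for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest in Euclidean distance."""
    nearest = sorted((math.dist(query, x), y) for x, y in zip(train_x, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]
```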
IRJET- Weather Prediction for Tourism Application using ARIMAIRJET Journal
This document discusses using an ARIMA model to predict weather patterns for tourism applications. It begins with an introduction to weather forecasting and its importance for the tourism industry. It then reviews related work on weather prediction using machine learning methods. The proposed method involves collecting weather data, preprocessing it, converting it to a stationary time series, and analyzing it with an ARIMA model. The authors conclude that ARIMA can accurately predict weather patterns and help tourists plan trips based on the forecast.
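The "make it stationary, then model it" step above can be illustrated with the smallest non-trivial case, ARIMA(1,1,0). This sketch differences the series once, fits an AR(1) coefficient without a constant by least squares, and integrates the forecasts back to the original scale. The order, the no-constant choice and the fitting method are assumptions. A real study would select orders and estimate by maximum likelihood with a library such as statsmodels.

```python
def difference(series):
    """First differences: the 'I' step that removes a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x):
    """Least-squares AR(1) coefficient (no constant) for x_t = phi * x_{t-1} + e_t."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def forecast_arima_110(series, steps):
    """Forecast with ARIMA(1,1,0): predict the next differences, then cumulate them."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff
        level += last_diff
        forecasts.append(level)
    return forecasts
```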
This document discusses methods of collecting statistical data. It describes census and sample investigation methods. The census method collects data from every unit of the population, while the sample method collects data from only a few representative units. The census method is more reliable but costly, while the sample method is less expensive but less accurate. Key differences between the two methods are also outlined.
Two main branches of statistics are described: descriptive statistics and inferential statistics. Descriptive statistics focuses on collecting, summarizing, and presenting data, while inferential statistics analyzes sample data to draw conclusions about the overall population. Statistics has many applications including actuarial science, biostatistics, business analytics, demography, econometrics, environmental statistics, epidemiology, geostatistics, operations research, population ecology, psychology, quality control, and various fields of physics.
Statistics is the collection and analysis of data. There are two main branches: descriptive statistics, which organizes and summarizes data, and inferential statistics, which uses sample data to draw conclusions and make predictions about a larger population. Statistics starts with a question and uses data to provide information that helps people make decisions. It is widely used in business, health, education, research, social sciences, and natural resources.
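The split between descriptive and inferential statistics described above shows up clearly in code. The first function below only summarizes the observed sample. The second uses the sample to estimate a range for the unknown population mean. This sketch assumes a normal-approximation 95% interval; for small samples a t-based interval would be more appropriate.

```python
import math
import statistics


def describe(sample):
    """Descriptive statistics: summarize the observed sample itself."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),
    }


def mean_confidence_interval(sample, confidence=0.95):
    """Inferential statistics: normal-approximation interval for the population mean."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * se, m + z * se
```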
Top 10 Uses Of Statistics In Our Day to Day Life Stat Analytica
Do you know the uses of statistics in our daily life? If not, check out this presentation to learn more about how statistics is used in everyday life.
Comparison of Solar Radiation Intensity Forecasting Using ANFIS and Multiple ...journalBEEI
This document compares the performance of two methods for forecasting solar radiation intensity: Adaptive Neuro Fuzzy Inference System (ANFIS) and Multiple Linear Regression (MLR). It uses weather data from Basel, Switzerland to test the methods. The ANFIS method combines a fuzzy inference system with neural networks, while MLR fits a linear equation relating the input weather variables to solar radiation. The performance of both methods is evaluated using root mean square error (RMSE) and mean absolute error (MAE) across different training/testing data compositions and time periods. The results show that ANFIS consistently yields lower error values than MLR, indicating that it provides more accurate solar radiation forecasts.
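The MLR baseline and the two error measures above can be sketched in plain Python, while ANFIS is not shown. The sketch fits ordinary least squares through the normal equations, solved with Gaussian elimination, and then scores predictions with RMSE and MAE. Which weather variables serve as predictors is left open; any columns passed in are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mlr(rows, y):
    """OLS with intercept via the normal equations (X^T X) beta = X^T y."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

RMSE squares each error before averaging, so it penalizes occasional large misses more heavily than MAE does. This is why reporting both, as the study does, gives a fuller picture.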
This document provides information about statistics including its definition, origins, uses in different fields, and key statistical concepts. It defines statistics as the mathematical science pertaining to the collection, analysis, interpretation, and presentation of data. Some key points:
- Statistics originated from needs to base policy on demographic and economic data and has broadened to include collecting and analyzing data in general.
- It is widely used today in government, business, and the natural and social sciences to make accurate inferences from data and to make decisions under uncertainty.
- The document also defines and provides examples of important statistical concepts including the mean, mode, and median.
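The three measures of central tendency listed above are available in Python's standard `statistics` module. The wrapper function name below is a hypothetical convenience.

```python
import statistics


def central_tendency(values):
    """Mean (arithmetic average), median (middle of sorted data), mode (most frequent value)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }
```

For an even number of values, `statistics.median` averages the two middle values. For example, the median of [1, 2, 3, 4] is 2.5.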
This document discusses an economic feasibility study of a solar power plant located in Elazığ, Turkey, using a levelized cost analysis method. It finds that the payback period for investing in the solar power plant is 13 years, compared with an average of 6.6 years calculated by companies. The annual profit of a 1 MW solar energy plant is calculated as $89,467. The present worth and annual capital cost of the solar power plant are calculated as $1,156,763 and $1,181,875, respectively. The study also concludes that Turkey's high interest rates will have a negative impact on investment in solar power plants.
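The link between interest rates and payback can be illustrated with a discounted payback calculation, a simpler technique than the paper's levelized cost analysis. This hypothetical sketch treats the reported present worth ($1,156,763) as the upfront investment and the reported annual profit ($89,467) as a constant yearly cash flow. With no discounting it gives 13 years; at 5% it gives 22 years, and at 10% the investment is never recovered within 100 years.

```python
def discounted_payback_years(investment, annual_cash_flow, rate, max_years=100):
    """First whole year in which cumulative discounted cash flow covers the investment.

    Returns None if the investment is not recovered within max_years.
    Assumes a constant end-of-year cash flow (a simplification).
    """
    cumulative = 0.0
    for year in range(1, max_years + 1):
        cumulative += annual_cash_flow / (1 + rate) ** year
        if cumulative >= investment:
            return year
    return None
```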
The document analyzes energy consumption in cucumber greenhouse production in Iran using data envelopment analysis (DEA). Data were collected from 20 greenhouses, and energy inputs (such as diesel fuel, fertilizer, and labor) and outputs (cucumber yield) were calculated. Total energy input was 163,994 MJ/ha, of which diesel fuel accounted for the largest share (45.15%); total energy output was 62,496 MJ/ha. Technical, pure technical, and scale efficiencies were then calculated using DEA to evaluate energy efficiency and identify areas for improvement. The study found DEA to be useful for benchmarking energy use and determining how to reduce waste.
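The reported totals support a quick worked calculation. Dividing output energy by input energy gives an energy ratio of about 0.381. The diesel share corresponds to about 74,043 MJ/ha. The sketch also includes the standard DEA decomposition in which scale efficiency equals technical efficiency under constant returns divided by pure technical efficiency under variable returns. Any efficiency scores passed to it are hypothetical, since the per-greenhouse DEA results are not given here.

```python
def energy_indicators(total_input_mj, total_output_mj, diesel_share):
    """Energy ratio (output/input) and the absolute diesel energy implied by its share."""
    return {
        "energy_ratio": total_output_mj / total_input_mj,
        "diesel_mj": total_input_mj * diesel_share,
    }


def scale_efficiency(technical_efficiency_crs, pure_technical_efficiency_vrs):
    """SE = TE (CCR, constant returns) / PTE (BCC, variable returns)."""
    return technical_efficiency_crs / pure_technical_efficiency_vrs
```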
Comparative analysis of multiple classification models to improve PM10 predic...IJECEIAES
With the increasing demand for highly accurate particulate matter prediction, various attempts have been made to improve prediction accuracy by applying machine learning algorithms. However, the characteristics of particulate matter and the uneven occurrence rates of different concentration levels make it difficult to train prediction models, resulting in poor predictions. To address this problem, this paper proposes multiple classification models that predict particulate matter concentrations by dividing them into AQI-based classes. The classification models were designed using logistic regression, decision tree, SVM, and ensemble methods. A comparison of the four models' performance using confusion matrices showed F-scores of 0.82 or higher for all models except logistic regression.
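The two ingredients above, AQI-based classes and the F-score, can be sketched as follows. The class cut points of 30, 80 and 150 µg/m³ are illustrative assumptions, not necessarily the paper's breakpoints. Macro-averaging the per-class F1 is also an assumption, since the summary does not say how the F-score was averaged.

```python
from bisect import bisect_left


def to_class(concentration, upper_bounds=(30, 80, 150)):
    """Map a PM10 concentration to a class index; each upper bound is inclusive."""
    return bisect_left(upper_bounds, concentration)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores computed from confusion-matrix counts."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)
```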
This paper reviews recent literature on exploratory factor analysis (EFA) and assesses its current use in nursing research. The review finds that while EFA is commonly used, researchers often rely on outdated heuristics rather than evidence-based recommendations when making key decisions. An assessment of 54 EFA solutions in nursing journals found that researchers commonly used participants-to-items ratios to determine sample sizes, used PCA instead of EFA for extraction, relied on eigenvalues > 1 and scree tests to determine the number of factors, and used Varimax rotation. The paper recommends that researchers draw on simulation studies to determine sample sizes and make informed choices aligned with the goals and models of EFA and PCA.
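To make the criticized "eigenvalues > 1" heuristic concrete, this sketch computes the eigenvalues of a correlation matrix with the cyclic Jacobi rotation method. It then counts how many exceed 1, which is the Kaiser criterion. The example matrix and function names are hypothetical. The sketch shows what the heuristic computes, not a recommendation to use it.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                apq = a[p][q]
                if abs(apq) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * apq)
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                a[p][p] -= t * apq
                a[q][q] += t * apq
                a[p][q] = a[q][p] = 0.0
                for r in range(n):
                    if r != p and r != q:
                        arp, arq = a[r][p], a[r][q]
                        a[r][p] = a[p][r] = c * arp - s * arq
                        a[r][q] = a[q][r] = s * arp + c * arq
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_count(correlation_matrix):
    """Number of components the 'eigenvalue > 1' (Kaiser) heuristic would retain."""
    return sum(1 for ev in jacobi_eigenvalues(correlation_matrix) if ev > 1)
```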
This document provides a lecture note on statistics for physical sciences and engineering. It begins with an introduction to statistics and its importance in various fields such as physical sciences, engineering, and research. It then discusses descriptive and inferential statistics. The document also covers topics such as data collection methods, presentation of data through tables and diagrams, and some basic statistical definitions. Examples are provided to illustrate how to construct frequency tables from raw data. In summary, the document presents an overview of key statistical concepts and methods relevant for physical sciences and engineering.
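Constructing a frequency table from raw data, as described above, can be sketched as follows. Classes are assumed to be half-open intervals [lower, lower + width). Values outside the chosen range are simply ignored in this simplified version. Each row holds the class limits, frequency, relative frequency and cumulative frequency.

```python
def frequency_table(data, start, width, n_classes):
    """Grouped frequency table: (lower, upper, frequency, relative, cumulative) per class."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    table, cumulative = [], 0
    for k, f in enumerate(counts):
        cumulative += f
        lower = start + k * width
        table.append((lower, lower + width, f, f / len(data), cumulative))
    return table
```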
This meta-analysis examined the relationship between body mass index (BMI) and incident asthma. It identified 2,006 relevant studies, of which 12 prospective cohort studies were included. Inclusion criteria required adult subjects, asthma as the primary outcome, BMI measurement, a minimum 1-year follow-up with a follow-up rate of at least 70%, and BMI data categorized by standard ranges. Random-effects models were used to generate summary odds ratios. Results showed that overweight individuals had 38% higher odds of developing asthma than those of normal weight, and obese individuals had 92% higher odds. When stratified by sex, the association was stronger for women. The analysis provided evidence that higher BMI is a risk factor for incident asthma.
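"38% higher odds" corresponds to a summary odds ratio of 1.38, and "92% higher odds" to 1.92. The random-effects pooling step can be sketched with the DerSimonian-Laird estimator. This is a common choice, but the summary does not say which estimator the authors used. The sketch pools study log odds ratios, weighting each by 1 / (within-study variance + tau²). Any study inputs passed to it are hypothetical.

```python
import math


def dersimonian_laird(log_ors, variances):
    """Random-effects pooled OR with 95% CI and tau^2 (DerSimonian-Laird); needs >= 2 studies."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(log_ors) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se), tau2
```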
This presentation gives an overview of the relevance and significance of statistics as a valid tool for enhancing the quality of research. It also touches on some misuses and abuses of statistics.
D. Mayo's slides, “The Statistics Wars and Intellectual Conflicts of Interest”, for a Special Session of the (remote) Phil Stat Forum, “Statistical Significance Test Anxiety”, on 11 January 2022
An illustrated guide to the methods of meta analysirsd kol abundjani
This document provides an overview of meta-analysis methods. It begins by defining meta-analysis and its importance in health care evaluation. It then describes the basic principles of meta-analysis using an example on hospital readmission rates. Next, it discusses threats to meta-analysis validity and methods to address them. Finally, it outlines developing meta-analysis methods and directions for the future. The overall aim is to illustrate meta-analysis methods and highlight areas for further development.
Statistics are used widely in many areas of real life including weather forecasting, emergency preparedness, disease prediction, education, genetics, politics, quality testing, business, banking, insurance, government administration, astronomy, and the natural and social sciences. Some key examples provided include how weather models use statistics to predict future weather, emergency teams rely on statistics to prepare for danger, disease rates are calculated using statistics, teachers evaluate students' performance statistically, and businesses use statistics to plan production and marketing.
This document summarizes a study on promoting the use of mindful safety practices (MSPs) among employees in the Norwegian petroleum industry. The study was based on questionnaire responses from over 2900 employees across two time periods. It found that safety management practices aimed at increasing MSP use would be most effective if directed at employee work groups and their local work environments, rather than at individual employees. The study suggests focusing on promoting MSP use when employees transfer to new work environments or when existing work environments change. Concrete safety management practices like establishing work group norms supporting MSP use, providing MSP training, and discussing MSP use regularly could help increase employees' willingness to use MSPs.
This document provides a summary of a meta-analysis presented by Preethi Rai on November 12, 2013. It defines meta-analysis as a quantitative approach that systematically combines the results of previous research studies in order to arrive at conclusions about the body of research. The summary explains that meta-analysis increases the overall sample size and statistical power to better understand treatment effects. It also addresses how meta-analysis can help resolve controversies, identify areas needing more research, and generalize study results. Limitations including publication bias and inability to improve original study quality are also noted.
IRJET - Analysis of Crop Yield Prediction by using Machine Learning AlgorithmsIRJET Journal
This document analyzes crop yield prediction using machine learning algorithms like K-Nearest Neighbor and Support Vector Machine. It discusses collecting agricultural data from various regions on factors like rainfall, humidity, temperature, area, yield, soil type and location. The data is preprocessed, transformed and split into training and testing sets. Both KNN and SVM are applied to the data and SVM is found to have higher accuracy and faster execution time compared to KNN in predicting suitable crops and estimated yields. The proposed system provides farmers an efficient way to predict crops and yields for their region using modern machine learning techniques.
Efficiency of Prediction Algorithms for Mining Biological DatabasesIOSR Journals
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
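Two of the simplest learners compared above, ZeroR and OneR, can be sketched in a few lines. ZeroR always predicts the majority class. OneR builds one rule per attribute, mapping each attribute value to that value's majority class, and keeps the attribute whose rule makes the fewest training errors. This simplified sketch assumes nominal attributes; numeric attributes would first need to be discretized.

```python
from collections import Counter, defaultdict


def zero_r(train_labels):
    """ZeroR baseline: the most frequent class in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def one_r(rows, labels):
    """OneR: return (attribute index, value->class rule, training errors) for the best attribute."""
    best = None
    for j in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[j]][y] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(counts.values()) - counts[rule[v]] for v, counts in by_value.items())
        if best is None or errors < best[2]:
            best = (j, rule, errors)
    return best
```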
This document discusses frameworks and indices for assessing sustainability. It begins by introducing common types of sustainability assessment tools, focusing on indicators and indices. It then outlines several widely-used sustainability frameworks, including the Triple Bottom Line framework and pressure-state-response model. Next, it describes the process for constructing sustainability indices, including selecting indicators, standardizing data, assigning weights, and aggregating the results. It notes that indicator selection and weighting are often inconsistent due to a lack of standardized requirements. Finally, it argues that sustainability frameworks can effectively guide indicator selection for both standalone indicators and composite indices.
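The index-construction steps listed above (standardize, weight, aggregate) can be sketched as follows. Min-max scaling and a weighted arithmetic mean are assumed here as one common combination among several. Indicators where lower raw values are better, such as emissions, are flipped after scaling. The indicators and weights in any call are hypothetical.

```python
def composite_index(indicator_columns, weights, higher_is_better):
    """Min-max standardize each indicator, then aggregate with a weighted arithmetic mean."""
    total_weight = sum(weights)
    n_units = len(indicator_columns[0])
    scores = [0.0] * n_units
    for column, weight, positive in zip(indicator_columns, weights, higher_is_better):
        lo, hi = min(column), max(column)
        for unit, value in enumerate(column):
            scaled = (value - lo) / (hi - lo) if hi > lo else 0.0
            if not positive:
                scaled = 1.0 - scaled  # lower raw value counts as more sustainable
            scores[unit] += weight * scaled / total_weight
    return scores
```

The weights directly shift the ranking of units. This is why the inconsistent weighting noted above matters for comparability across indices.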
IRJET- Weather Prediction for Tourism Application using ARIMAIRJET Journal
This document discusses using an ARIMA model to predict weather patterns for tourism applications. It begins with an introduction to weather forecasting and its importance for the tourism industry. It then reviews related work on weather prediction using machine learning methods. The proposed method involves collecting weather data, preprocessing it, converting it to a stationary time series, analyzing it using an ARIMA model, and concluding that ARIMA can accurately predict weather patterns to help tourists plan trips based on the forecast.
This document discusses methods of collecting statistical data. It describes census and sample investigation methods. The census method collects data from every unit of the population, while the sample method collects data from only a few representative units. The census method is more reliable but costly, while the sample method is less expensive but less accurate. Key differences between the two methods are also outlined.
Two main branches of statistics are described: descriptive statistics and inferential statistics. Descriptive statistics focuses on collecting, summarizing, and presenting data, while inferential statistics analyzes sample data to draw conclusions about the overall population. Statistics has many applications including actuarial science, biostatistics, business analytics, demography, econometrics, environmental statistics, epidemiology, geostatistics, operations research, population ecology, psychology, quality control, and various fields of physics.
Statistics is the collection and analysis of data. There are two main branches: descriptive statistics, which organizes and summarizes data, and inferential statistics, which uses descriptive statistics to make predictions. Statistics starts with a question and uses data to provide information to help make decisions. It is widely used in business, health, education, research, social sciences, and natural resources.
Top 10 Uses Of Statistics In Our Day to Day Life Stat Analytica
Don't you know the uses of statistics is our daily life? If yes then check out this presentation you will learn a lot more about the use of statistics in our daily life.
Comparison of Solar Radiation Intensity Forecasting Using ANFIS and Multiple ...journalBEEI
This document compares the performance of two methods for forecasting solar radiation intensity: Adaptive Neuro Fuzzy Inference System (ANFIS) and Multiple Linear Regression (MLR). It uses weather data from Basel, Switzerland to test the methods. The ANFIS method uses a fuzzy inference system combined with neural networks, while MLR uses a mathematical approach. The performance of both methods is evaluated using root mean square error (RMSE) and mean absolute error (MAE) across different training/testing data compositions and time periods. The results show that ANFIS consistently provides lower error values than MLR, indicating it provides more accurate solar radiation forecasts.
This document provides information about statistics including its definition, origins, uses in different fields, and key statistical concepts. It defines statistics as the mathematical science pertaining to the collection, analysis, interpretation, and presentation of data. Some key points:
- Statistics originated from needs to base policy on demographic and economic data and has broadened to include collecting and analyzing data in general.
- It is widely used today in government, business, and natural and social sciences to make accurate inferences from data and decisions in uncertainty.
- The document also defines and provides examples of important statistical concepts including the mean, mode, and median.
This document discusses an economic feasibility study of a solar power plant located in Elazığ, Turkey using a levelized cost analysis method. It finds that the payback period for investing in the solar power plant is calculated as 13 years, compared to an average of 6.6 years calculated by companies. The annual profit of a 1 MW solar energy plant is calculated as $89,467 USD. The present worth and annual capital cost of the solar power plant are calculated as $1,156,763 USD and $1,181,875 USD respectively. When considering Turkey's high interest rates, such rates will negatively impact investments in solar power plants.
The document analyzes the energy consumption for cucumber greenhouse production in Iran using data envelopment analysis. Data was collected from 20 greenhouses and energy inputs (like diesel, fertilizer, labor) and outputs (cucumber yield) were calculated. Total energy input was 163,994 MJ/ha with diesel fuel as the highest at 45.15%. Output was 62,496 MJ/ha. Technical, pure technical and scale efficiencies were then calculated using DEA to evaluate energy efficiency and identify areas for improvement. The study found DEA to be useful for benchmarking energy use and determining how to reduce waste.
Comparative analysis of multiple classification models to improve PM10 predic...IJECEIAES
With the increasing requirement of high accuracy for particulate matter prediction, various attempts have been made to improve prediction accuracy by applying machine learning algorithms. However, the characteristics of particulate matter and the problem of the occurrence rate by concentration make it difficult to train prediction models, resulting in poor prediction. In order to solve this problem, in this paper, we proposed multiple classification models for predicting particulate matter concentrations required for prediction by dividing them into AQI-based classes. We designed multiple classification models using logistic regression, decision tree, SVM and ensemble among the various machine learning algorithms. The comparison results of the performance of the four classification models through error matrices confirmed the f-score of 0.82 or higher for all the models other than the logistic regression model.
This paper reviews recent literature on exploratory factor analysis (EFA) and assesses its current use in nursing research. The review finds that, although EFA is commonly used, researchers often rely on outdated heuristics rather than evidence-based recommendations when making key decisions. An assessment of 54 EFA solutions published in nursing journals found that researchers commonly used participants-to-items ratios to determine sample size, used principal component analysis (PCA) for extraction instead of EFA, relied on the eigenvalue >1 rule and scree tests to decide how many factors to retain, and applied Varimax rotation. The paper recommends that researchers draw on simulation studies to determine sample sizes and make informed choices aligned with the goals and models of EFA and PCA.
This document provides lecture notes on statistics for the physical sciences and engineering. It begins with an introduction to statistics and its importance in fields such as the physical sciences, engineering, and research. It then discusses descriptive and inferential statistics. The document also covers data collection methods, presentation of data in tables and diagrams, and some basic statistical definitions. Examples illustrate how to construct frequency tables from raw data. In summary, the document presents an overview of key statistical concepts and methods relevant to the physical sciences and engineering.
This meta-analysis examined the relationship between body mass index (BMI) and incident asthma. Of 2,006 potentially relevant studies identified, 12 prospective cohort studies were included. Inclusion criteria required adult subjects, asthma as the primary outcome, BMI measurement, a minimum follow-up of 1 year with a follow-up rate of at least 70%, and BMI data categorized by standard ranges. Random-effects models were used to generate summary odds ratios. Overweight individuals had 38% higher odds of developing asthma than normal-weight individuals, and obese individuals had 92% higher odds. When stratified by sex, the association was stronger in women. The analysis provides evidence that higher BMI is a risk factor for incident asthma.
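Random-effects pooling of odds ratios is often done with the DerSimonian-Laird estimator. The summary does not name the estimator, so treat this as one plausible choice and not the authors' method. The sketch pools log odds ratios using inverse-variance weights that are inflated by the between-study variance tau². The study inputs in the usage lines are hypothetical. An odds ratio of 1.38 corresponds to the "38% higher odds" wording above.

```python
"""DerSimonian-Laird random-effects pooling of log odds ratios (hypothetical inputs)."""
import math


def dersimonian_laird(log_ors, variances, z=1.96):
    """Return (pooled OR, lower CI, upper CI, tau^2) from per-study log ORs and their variances."""
    k = len(log_ors)
    w = [1 / v for v in variances]                     # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]        # random-effects weights
    sum_ws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum_ws
    se = math.sqrt(1 / sum_ws)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se), tau2


if __name__ == "__main__":
    ors = [1.2, 1.5, 1.4, 1.3]            # hypothetical study odds ratios
    variances = [0.02, 0.04, 0.03, 0.05]  # hypothetical variances of the log ORs
    print(dersimonian_laird([math.log(o) for o in ors], variances))
```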
This presentation gives an overview of the relevance and significance of statistics as a valid tool for enhancing the quality of research. It also touches on some misuses and abuses of statistics.
D. Mayo's slides, "The Statistics Wars and Intellectual Conflicts of Interest", for the Special Session of the (remote) Phil Stat Forum, "Statistical Significance Test Anxiety", on 11 January 2022
An illustrated guide to the methods of meta analysirsd kol abundjani
This document provides an overview of meta-analysis methods. It begins by defining meta-analysis and its importance in health care evaluation. It then describes the basic principles of meta-analysis using an example on hospital readmission rates. Next, it discusses threats to meta-analysis validity and methods to address them. Finally, it outlines developing meta-analysis methods and directions for the future. The overall aim is to illustrate meta-analysis methods and highlight areas for further development.
Statistics are used widely in many areas of real life including weather forecasting, emergency preparedness, disease prediction, education, genetics, politics, quality testing, business, banking, insurance, government administration, astronomy, and the natural and social sciences. Some key examples provided include how weather models use statistics to predict future weather, emergency teams rely on statistics to prepare for danger, disease rates are calculated using statistics, teachers evaluate students' performance statistically, and businesses use statistics to plan production and marketing.
This document summarizes a study on promoting the use of mindful safety practices (MSPs) among employees in the Norwegian petroleum industry. The study was based on questionnaire responses from over 2900 employees across two time periods. It found that safety management practices aimed at increasing MSP use would be most effective if directed at employee work groups and their local work environments, rather than at individual employees. The study suggests focusing on promoting MSP use when employees transfer to new work environments or when existing work environments change. Concrete safety management practices like establishing work group norms supporting MSP use, providing MSP training, and discussing MSP use regularly could help increase employees' willingness to use MSPs.
This document provides a summary of a meta-analysis presented by Preethi Rai on November 12, 2013. It defines meta-analysis as a quantitative approach that systematically combines the results of previous research studies in order to arrive at conclusions about the body of research. The summary explains that meta-analysis increases the overall sample size and statistical power to better understand treatment effects. It also addresses how meta-analysis can help resolve controversies, identify areas needing more research, and generalize study results. Limitations including publication bias and inability to improve original study quality are also noted.
IRJET - Analysis of Crop Yield Prediction by using Machine Learning Algorithms (IRJET Journal)
This document analyzes crop yield prediction using machine learning algorithms like K-Nearest Neighbor and Support Vector Machine. It discusses collecting agricultural data from various regions on factors like rainfall, humidity, temperature, area, yield, soil type and location. The data is preprocessed, transformed and split into training and testing sets. Both KNN and SVM are applied to the data and SVM is found to have higher accuracy and faster execution time compared to KNN in predicting suitable crops and estimated yields. The proposed system provides farmers an efficient way to predict crops and yields for their region using modern machine learning techniques.
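A k-nearest-neighbour classifier, one of the two algorithms compared, predicts the crop for a new region by majority vote among the k most similar training records. The minimal sketch below assumes two hypothetical features that are already scaled to [0, 1] (for example, rainfall and temperature) and uses placeholder crop labels. Real agricultural data would need the preprocessing steps described above. The summary's SVM comparison is not reproduced here.

```python
"""Minimal k-nearest-neighbour classifier (Euclidean distance, majority vote)."""
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, pre-scaled features: (rainfall, temperature).
    train_x = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25), (0.2, 0.9), (0.1, 0.8), (0.15, 0.85)]
    train_y = ["crop_A", "crop_A", "crop_A", "crop_B", "crop_B", "crop_B"]
    print(knn_predict(train_x, train_y, (0.8, 0.2)))  # crop_A
```

Because KNN compares raw distances, features on very different scales (such as millimetres of rain and degrees Celsius) should be normalised first. Otherwise, one feature dominates the vote.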
Efficiency of Prediction Algorithms for Mining Biological Databases (IOSR Journals)
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
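ZeroR is the baseline classifier. It ignores every attribute and always predicts the majority class of the training data, which is why it typically serves as the lowest-accuracy baseline. The sketch computes it alongside the measures named above, using hypothetical labels. It also shows that ZeroR's sensitivity, specificity, and predictive values are degenerate (0, 1, or undefined) depending on which class is treated as positive. The summary does not say how the study obtained its reported values.

```python
"""ZeroR baseline plus common binary evaluation measures (hypothetical labels)."""
from collections import Counter


def zero_r(train_labels):
    """Return a classifier that always predicts the majority training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _instance: majority


def binary_metrics(actual, predicted, positive):
    """Accuracy, sensitivity, specificity, PPV and NPV; NaN where a ratio is undefined."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    labels = ["no-event"] * 7 + ["event"] * 3   # hypothetical outcomes
    model = zero_r(labels)
    preds = [model(None) for _ in labels]
    print(binary_metrics(labels, preds, positive="event"))
```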
This document discusses frameworks and indices for assessing sustainability. It begins by introducing common types of sustainability assessment tools, focusing on indicators and indices. It then outlines several widely-used sustainability frameworks, including the Triple Bottom Line framework and pressure-state-response model. Next, it describes the process for constructing sustainability indices, including selecting indicators, standardizing data, assigning weights, and aggregating the results. It notes that indicator selection and weighting are often inconsistent due to a lack of standardized requirements. Finally, it argues that sustainability frameworks can effectively guide indicator selection for both standalone indicators and composite indices.
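The standardize, weight, and aggregate steps described above can be sketched with min-max normalisation and a weighted arithmetic mean. This is one common choice among several; z-scores and geometric aggregation are alternatives. The indicator names, values, weights, and direction flags below are hypothetical.

```python
"""Composite index: min-max normalisation, weighting, weighted-mean aggregation."""


def min_max(values, higher_is_better=True):
    """Rescale to [0, 1]; invert when lower raw values are more sustainable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # a constant indicator cannot discriminate between units
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def composite_index(indicators, weights, higher_is_better):
    """indicators: name -> list of raw values, one per unit (country, region, firm...)."""
    total_weight = sum(weights.values())
    normalised = {name: min_max(vals, higher_is_better[name]) for name, vals in indicators.items()}
    n_units = len(next(iter(indicators.values())))
    return [
        sum(weights[name] * normalised[name][i] for name in indicators) / total_weight
        for i in range(n_units)
    ]


if __name__ == "__main__":
    indicators = {"renewable_share_pct": [10, 30, 50], "co2_t_per_capita": [8, 4, 6]}
    weights = {"renewable_share_pct": 0.5, "co2_t_per_capita": 0.5}
    direction = {"renewable_share_pct": True, "co2_t_per_capita": False}
    print(composite_index(indicators, weights, direction))  # [0.0, 0.75, 0.75]
```

The equal scores of the last two units show a known property of additive aggregation: a weak indicator can be fully offset by a strong one.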
IRJET- Air Quality Forecast Monitoring and its Impact on Brain Health based ... (IRJET Journal)
This document discusses the development of an air quality forecast monitoring system based on big data and the Internet of Things to monitor brain health quality. The system collects air quality data from sensors using IoT devices, classifies the data using Bayesian algorithms, develops a prediction model, and monitors brain health quality in real-time using distributed computing on big data. Experimental results show the gas quality prediction model is feasible for real-time predictive monitoring of air quality and its impact on brain health. Future work will improve the classification methods, optimize the system interface and data storage, and expand sensor coverage.
The document discusses 7 different papers related to solar radiation data. The papers cover topics like procedures for collecting and processing solar radiation data in Southern Portugal, reviewing applications and methodologies for obtaining solar radiation data, best practices for quality control of solar irradiance data, quality control procedures for solar radiation measurements, developing a web-based GIS system for solar radiation assessment, reviewing models for estimating solar radiation, and using graphical visualizations to interpret climate data.
Clustering and Classification in Support of Climatology to mine Weather Data ... (MangaiK4)
Abstract - Knowledge of a region's climate data is essential for business, society, agriculture, pollution, and energy applications. Climate is not fixed; fluctuations can be seen from year to year. Data mining applications help meteorological scientists produce accurate weather forecasts and make decisions, and they also provide better performance and reliability than other methods. Data mining techniques applied to weather data are efficient compared with the mathematical models used. Various data mining techniques are applied to climate data in support of weather forecasting, climate scientists, agriculture, vegetation, water resources, and tourism. The aim of this paper is to provide a review of various data mining techniques applied to weather data sets in support of weather prediction and climate analysis.
This document provides an overview of a seminar presentation on green accounting. It discusses the introduction, objectives, scope, and methodology of the research. The research aims to understand the concept and uses of green accounting in Bangladesh through a quantitative study analyzing secondary data sources from 2016-2020. It will use descriptive analysis and statistical tools like the h-index, CiteScore, SJR, and SNIP to analyze literature from databases such as Scopus, Emerald, EBSCO, and Science Direct. The findings will discuss frequently researched keywords, authors, and countries related to green accounting.
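Of the bibliometric tools listed, the h-index is the simplest to compute. It is the largest h such that h publications each have at least h citations. The minimal sketch below uses hypothetical citation counts. CiteScore, SJR, and SNIP are computed by their providers from database-wide data and are not reproduced here.

```python
"""h-index from a list of per-publication citation counts."""


def h_index(citations):
    """Largest h such that at least h items have h or more citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # 4 (hypothetical citation counts)
```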
IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ... (IRJET Journal)
The document discusses using association rule mining and k-means clustering to identify risk factors for diabetes from electronic medical records. It reviews existing techniques for summarizing large sets of association rules generated from medical data and proposes using k-means clustering as an improved method. The k-means algorithm clusters patient data into groups based on similarity and identifies representative risk factor patterns within each cluster, providing a concise summary for clinicians to assess diabetes risk.
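Association rules are ranked by support and confidence. Support is how often an itemset occurs. Confidence is how often the consequent holds when the antecedent does. Lift compares that confidence with the consequent's baseline frequency. The sketch assumes that each patient record has already been turned into a set of discretised attribute items. The item names and records are hypothetical. Rule generation (for example, with Apriori) and the k-means summarisation step are not shown.

```python
"""Support, confidence and lift for an association rule A -> B over set-valued records."""


def support(transactions, itemset):
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    base = support(transactions, antecedent)
    return support(transactions, set(antecedent) | set(consequent)) / base if base else 0.0


def lift(transactions, antecedent, consequent):
    baseline = support(transactions, consequent)
    return confidence(transactions, antecedent, consequent) / baseline if baseline else 0.0


if __name__ == "__main__":
    records = [  # hypothetical discretised patient records
        {"bmi_high", "age_over_50", "diabetes"},
        {"bmi_high", "diabetes"},
        {"age_over_50"},
        {"bmi_high", "age_over_50"},
    ]
    rule = ({"bmi_high"}, {"diabetes"})
    print(support(records, rule[0] | rule[1]), confidence(records, *rule), lift(records, *rule))
```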
Supervised Multi Attribute Gene Manipulation For Cancer (paperpublications3)
Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems.
They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se... (IJARIIT)
Two approaches to building models that predict the onset of diabetes mellitus in juvenile subjects were examined. A set of tests performed immediately before diagnosis was used to build classifiers that predict whether the subject would be diagnosed with juvenile diabetes. A modified training set, consisting of differences between test results taken at different times, was also used to build such classifiers. Supervised approaches (decision trees) and unsupervised approaches were compared for both types of classifiers. In this study, the system selects the test most likely to confirm a diagnosis, based on the pre-test probability computed from the patient's information, including symptoms and the results of previous tests. If the patient's post-test probability of disease is higher than the treatment threshold, a diagnostic decision is made; if it is sufficiently low, the disease is ruled out. Otherwise, the patient needs more tests to help make a decision, and the system recommends the next optimal test and repeats the same process. The thesis determines which approach performs better on a diabetes dataset in the Weka framework, and it also uses feature selection techniques to reduce the number of features and the complexity of the process.
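The test-and-decide loop described above can be expressed with Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. For a positive result, LR+ = sensitivity / (1 − specificity). The sketch below is a minimal illustration that relies on assumptions. The test characteristics and both thresholds are hypothetical. The lower "rule-out" threshold is the usual counterpart of the treatment threshold in the threshold model; it is not a value from the thesis.

```python
"""Bayesian post-test probability and a two-threshold decision rule (hypothetical values)."""


def positive_likelihood_ratio(sensitivity, specificity):
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_p, likelihood_ratio):
    """Update a probability with a likelihood ratio via odds."""
    pre_odds = pre_test_p / (1 - pre_test_p)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def next_step(p, rule_out_threshold, treatment_threshold):
    if p >= treatment_threshold:
        return "diagnose"
    if p <= rule_out_threshold:
        return "rule out"
    return "order another test"


if __name__ == "__main__":
    lr = positive_likelihood_ratio(0.9, 0.8)        # hypothetical test: LR+ = 4.5
    p = post_test_probability(0.2, lr)              # about 0.53
    print(round(p, 3), next_step(p, 0.1, 0.7))      # 0.529 order another test
```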
This document discusses limitations and applications of statistics. It begins with the limitations of statistics, such as the fact that it deals only with quantitative data and with groups or aggregates, and the possibility of errors in statistical analysis. It then covers many fields to which statistics can be applied, such as actuarial science, biostatistics, econometrics, environmental statistics, epidemiology, and others. It concludes with sample multiple-choice questions on the limitations and applications of statistics.
Running head RESEARCH METHODOLOGY, DESIGN AND METHODS 1RESEARC.docx (jeanettehully)
This document summarizes a research study conducted on Sun Coast Health. The study aimed to evaluate various areas of concern for Sun Coast, including the relationship between particulate matter size and employee health, the effectiveness of safety training, predicting noise levels at job sites, comparing new and previous employee training programs, analyzing changes in lead exposure, and differences in return on investment for different services. The study used quantitative research methods and collected data from over 100 job sites and 1500 contracts. Various statistical analyses were used to analyze the data and test hypotheses related to each research question.
This document provides an overview of operational research (OR) and its application in health management. It defines OR as the scientific study of operations to improve decision making. The document outlines the main features of OR, including taking a total systems approach and using tools from various disciplines. It discusses several quantitative techniques used in OR, such as linear programming, simulation, and inventory control. The document explains how these techniques can help optimize resource allocation and improve efficiency in health systems.
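The classic economic order quantity (EOQ) model is one example of the inventory-control techniques mentioned. It chooses the order size Q that minimises annual ordering cost plus holding cost, which gives Q* = sqrt(2DS / H). The sketch assumes constant demand and instantaneous replenishment. The hospital-supply figures are hypothetical.

```python
"""Economic order quantity (EOQ) under constant demand (hypothetical figures)."""
import math


def eoq(annual_demand, order_cost, holding_cost_per_unit_year):
    return math.sqrt(2 * annual_demand * order_cost / holding_cost_per_unit_year)


def annual_cost(q, annual_demand, order_cost, holding_cost_per_unit_year):
    """Ordering cost (D/Q * S) plus average holding cost (Q/2 * H)."""
    return annual_demand / q * order_cost + q / 2 * holding_cost_per_unit_year


if __name__ == "__main__":
    d, s, h = 1000, 50.0, 2.0   # boxes of gloves per year, cost per order, holding cost per box-year
    q_star = eoq(d, s, h)
    print(round(q_star, 1), round(annual_cost(q_star, d, s, h), 2))  # 223.6 447.21
```

At Q*, the ordering and holding costs are equal. This is a quick way to sanity-check the result.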
Journal Club - Best Practices for Scientific Computing (Bram Zandbelt)
This document discusses the importance of best practices in scientific computing. It notes that scientists rely heavily on software for research, with many writing their own code. However, most scientists are self-taught in software skills and may be unaware of best practices that could help them write more reliable and maintainable code. The document advocates treating software like a scientific instrument and following practices such as version control, testing, and automation. Adopting these practices could help reduce errors and make software easier to reuse.
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUES (cscpconf)
The health sector has witnessed a great evolution following the development of new computer technologies. This has led the sector to produce more medical data, which has given rise to multiple fields of research. Many efforts have been made both to cope with the explosion of medical data and to obtain useful knowledge from it. This has prompted researchers to apply technical innovations such as big data analytics, predictive analytics, machine learning, and learning algorithms to extract useful knowledge and support decision making. With the promise of predictive analytics in big data and the use of machine learning algorithms, predicting the future is no longer a difficult task, especially in medicine, where predicting diseases and anticipating the cure have become possible. In this paper, we present an overview of the evolution of big data in healthcare systems and apply a learning algorithm to a set of medical data. The objective is to predict chronic kidney disease using the Decision Tree (C4.5) algorithm.
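C4.5 chooses each split by gain ratio. This is the information gain of an attribute divided by its split information, which penalises attributes with many distinct values. The sketch computes this criterion for categorical attributes only. C4.5's handling of numeric thresholds, missing values, and pruning is omitted. The toy records are hypothetical and only loosely named after chronic-kidney-disease attributes.

```python
"""C4.5 split criterion (gain ratio) for categorical attributes."""
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    labels = [r[target] for r in rows]
    n = len(rows)
    partitions = {}
    for r in rows:                                   # group class labels by attribute value
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    rows = [  # hypothetical records
        {"hypertension": "yes", "anaemia": "no", "class": "ckd"},
        {"hypertension": "yes", "anaemia": "yes", "class": "ckd"},
        {"hypertension": "no", "anaemia": "no", "class": "notckd"},
        {"hypertension": "no", "anaemia": "yes", "class": "notckd"},
    ]
    for attr in ("hypertension", "anaemia"):
        print(attr, gain_ratio(rows, attr, "class"))
```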
This document describes a study that develops a fuzzy inference system (FIS) to assess the sustainability of biomass production for energy purposes. The FIS uses four input parameters - energy output, energy balance ratio, fertilizer usage, and pesticide usage - with defined membership functions. Eighty-one IF-THEN rules were created relating the input parameters to a single output parameter, a fuzzy sustainability index (FSI). The FSI indicates the sustainability level as very low, low, medium, high or very high. The FIS provides a means to evaluate biomass sustainability that can handle uncertain input data, unlike other assessment methods. Graphs show the relationship between input parameters and the fuzzy output based on the rules.
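The rule count is consistent with three linguistic terms per input, because four inputs with three terms each give 3^4 = 81 combinations. The summary does not state the number of terms. The sketch below uses triangular membership functions and fires every rule with the min operator. As a simplification, it combines rule outputs by a zero-order Sugeno weighted average instead of the centroid defuzzification a Mamdani system would use. The term boundaries, the normalised [0, 1] inputs, and the rule consequents are all hypothetical.

```python
"""Fuzzy sustainability-index sketch: triangular memberships, min-AND, weighted-average output."""
from itertools import product

# Hypothetical terms on inputs normalised to [0, 1] (higher = more sustainable).
TERMS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}
TERM_SCORE = {"low": 0.0, "medium": 0.5, "high": 1.0}


def triangular(x, a, b, c):
    """Membership rising linearly from a to a peak at b, then falling to c."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def rule_output(combo):
    """Hypothetical consequent: mean score of the terms in the rule's antecedent."""
    return sum(TERM_SCORE[t] for t in combo) / len(combo)


def fuzzy_index(inputs):
    num = den = 0.0
    for combo in product(TERMS, repeat=len(inputs)):   # 3**4 = 81 rules for four inputs
        strength = min(triangular(x, *TERMS[t]) for x, t in zip(inputs, combo))
        num += strength * rule_output(combo)
        den += strength
    return num / den if den else 0.0


if __name__ == "__main__":
    print(len(list(product(TERMS, repeat=4))))   # 81
    print(fuzzy_index([0.5, 0.5, 0.5, 0.5]))     # 0.5
```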
This document discusses data mining algorithms for clustering healthcare data streams. It provides an overview of the K-means and D-stream algorithms, and proposes a framework for comparing them on healthcare datasets. The framework involves feature extraction from physiological signals, calculating risk components, and applying the K-means and D-stream algorithms to cluster the data. The results would show the effectiveness and limitations of each algorithm for clustering streaming healthcare data.
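K-means, the first of the two algorithms compared, alternates between two steps. It assigns each point to its nearest centroid, then moves each centroid to the mean of its points. A minimal batch sketch in pure Python follows. It assumes two hypothetical features that have already been extracted for each patient window. D-Stream's grid-based, density-driven clustering of streaming data works differently and is not shown.

```python
"""Batch k-means (Lloyd's algorithm) on small feature vectors."""
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # initial centroids: k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                          # assignment step
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [                         # update step (keep old centroid if cluster empty)
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:            # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical normalised features, e.g. (heart-rate score, SpO2 score).
    pts = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
    centroids, clusters = kmeans(pts, 2)
    print(centroids)
```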
This document proposes a seven-phase framework for mapping global warming research based on relationships between nature and human society. The phases are: 1) socioeconomic activity and greenhouse gas emissions, 2) carbon cycle and carbon concentration, 3) climate change and global warming, 4) impacts on ecosystems and human society, 5) adaptation, 6) mitigation, and 7) social systems. The framework was developed to better understand current scientific knowledge on global warming issues and identify gaps. The Intergovernmental Panel on Climate Change Fourth Assessment Report findings were applied to the framework to analyze the quantity and reliability of research results in each phase. The mapping aims to provide a comprehensive view of global warming research and inform future research priorities.
Forecasting of electric consumption in a semiconductor plant using time serie... (Alexander Decker)
This document summarizes a study that used time series methods to forecast electricity consumption in a semiconductor plant. The study analyzed 36 months of historical electricity consumption data from 2010-2012 to select the best forecasting model. Single exponential smoothing was found to have the lowest Mean Absolute Percentage Error (MAPE) of 5.60% and was determined to be the best forecasting method. The selected model will be used to forecast future electricity consumption for the plant.
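Single exponential smoothing updates a level L_t = αy_t + (1 − α)L_{t−1} and uses the current level as the forecast for the next period. MAPE averages |actual − forecast| / |actual| and expresses it as a percentage. The sketch assumes that the first observation initialises the level, and it scores only the later one-step-ahead forecasts. The monthly figures and α are hypothetical, not the plant's data.

```python
"""Single exponential smoothing with one-step-ahead forecasts and MAPE."""


def ses_forecasts(series, alpha):
    """Return (one-step-ahead forecasts for series[1:], forecast for the next period)."""
    level = series[0]                  # initialise the level with the first observation
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)        # forecast made before y is observed
        level = alpha * y + (1 - alpha) * level
    return forecasts, level


def mape(actual, forecast):
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    usage = [100, 110, 105, 115]                 # hypothetical monthly consumption (MWh)
    fc, next_month = ses_forecasts(usage, alpha=0.5)
    print(fc, next_month, round(mape(usage[1:], fc), 2))  # [100, 105.0, 105.0] 110.0 5.93
```

In practice, α is usually chosen by minimising the error measure (here MAPE) over the historical data before the model is used to forecast.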
INFORMS is collaborating with JSTOR to digitize and provide access to past issues of the journal Management Science. This document presents a study that compares estimates of hospital production characteristics from two different estimation models: translog cost function estimation and data envelopment analysis (DEA). The study uses data from 114 North Carolina hospitals to estimate a production technology with four inputs and three outputs using both models. Key areas of comparison between the two models include estimates of returns to scale, marginal rates of output transformation, and technical efficiency. The results provide insights into the relative strengths and weaknesses of the two approaches.
Data Science for Building Energy Management a reviewMigue.docx (randyburney60861)
Data Science for Building Energy Management: a review
Miguel Molina-Solana (a, b), María Ros (a, *), M. Dolores Ruiz (a), Juan Gómez-Romero (a), M.J. Martin-Bautista (a)
(a) Department of Computer Science and Artificial Intelligence, Universidad de Granada
(b) Data Science Institute, Imperial College London
Abstract
The energy consumption of residential and commercial buildings has risen steadily in recent years, an increase largely due to their HVAC systems. Expected energy loads, transportation, and storage as well as user behavior influence the quantity and quality of the energy consumed daily in buildings. However, technology is now available that can accurately monitor, collect, and store the huge amount of data involved in this process. Furthermore, this technology is capable of analyzing and exploiting such data in meaningful ways. Not surprisingly, the use of data science techniques to increase energy efficiency is currently attracting a great deal of attention and interest. This paper reviews how Data Science has been applied to address the most difficult problems faced by practitioners in the field of Energy Management, especially in the building sector. The work also discusses the challenges and opportunities that will arise with the advent of fully connected devices and new computational technologies.
1. Introduction
There is a general consensus in the world today that human activities are having a negative impact on the environment and have accelerated both global warming and climate change. These environmental threats have been intensified by the emissions produced by the energy required for the lighting and HVAC (heating, ventilation and air-conditioning) systems in buildings. According to the International Energy Agency (IEA), residential and commercial buildings are responsible for up to 32% of total final energy consumption, and in most IEA countries they account for approximately 40% of primary energy consumption. Similar statistics are given by the World Business Council for Sustainable Development (WBCSD) within the framework of its Energy Efficiency in Buildings (EEB) project. A comprehensive review [1] of the state of the art in building energy use (with a primary focus on energy demand) is also available.
These data indicate that inefficient energy management in aging buildings combined with rising construction activity in developed countries will cause energy consumption to soar in the near future and heighten the negative impacts associated with this consumption. Moreover, variable energy costs call for the implementation of more intelligent strategies to adapt and reduce energy consumption as well as to find alternative and sustainable energy sources. The relevance of these issues is clearly reflected in the research priorities of the European Union, as stated in its Horizon2020 Societal Challenge “Secure, Clean and Efficient Energy”. This work program targets a significant reduction in energy consu.
This document compares public-private partnership (PPP) contracts and cost-plus contracts for infrastructure projects. It summarizes the key risks allocated to the concessionaire and granting authority for a PPP road project case study. For the PPP contract, risks like land acquisition delays, operation and maintenance failures, and traffic revenue shortfalls are allocated to the concessionaire, while risks like changes of scope, competing roads, and political risks are allocated to the granting authority. For a cost-plus contract case study, risks related to natural disasters, design defects, and late payments are borne by the granting authority, while delays in payment are the concessionaire's responsibility. The document analyzes costs, revenues and risks in detail to compare the
This document proposes a new classification and recognition algorithm for high-resolution remote sensing images of Chinese ancient villages. The algorithm is based on ensemble learning and uses multi-scale multi-feature segmentation to extract spectral and texture features from images. These features are then used as inputs to multiple SVM classifiers trained with AdaBoost. The classifiers are combined using majority voting to produce the final classification. Experiments showed the proposed algorithm performed better than traditional methods at classifying elements in remote sensing images of ancient villages.
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP (RAHUL)
This dissertation explores the particular circumstances of Mirzapur, a region located in the
heart of India. Mirzapur, with its varied terrain and abundant biodiversity, offers an ideal
setting for investigating changes in vegetation cover dynamics. Our study uses advanced
technologies such as GIS (Geographic Information Systems) and remote sensing to analyze
the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and concern. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.
Land serves as the foundation for all human activities and provides the necessary materials for
these activities. Land is the most crucial natural resource, and its use by humans results in
different 'land uses', which are determined by both human activities and the physical
characteristics of the land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, altering their structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information on land use and
land cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, and policy purposes
and for diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels.
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
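The NDVI named in the dissertation title is computed per pixel from near-infrared and red reflectance as NDVI = (NIR − Red) / (NIR + Red). Values range from −1 to 1, and dense green vegetation sits towards the high end. The excerpt does not say which sensor or bands were used. For example, on Landsat 8 OLI, red is band 4 and NIR is band 5. The sketch works on small nested lists of hypothetical reflectance values. It also differences two dates, which is a simple way to flag vegetation change.

```python
"""Per-pixel NDVI and NDVI change between two dates (hypothetical reflectance grids)."""
import math


def ndvi(nir, red):
    total = nir + red
    return (nir - red) / total if total else math.nan   # NaN marks no-data pixels


def ndvi_grid(nir_band, red_band):
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


def ndvi_change(earlier, later):
    """Positive values indicate greening, negative values vegetation loss."""
    return [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(earlier, later)]


if __name__ == "__main__":
    # Hypothetical 2x2 reflectance grids for two acquisition dates.
    nir_date1 = [[0.50, 0.30], [0.20, 0.45]]
    red_date1 = [[0.10, 0.10], [0.20, 0.05]]
    nir_date2 = [[0.40, 0.30], [0.25, 0.20]]
    red_date2 = [[0.10, 0.15], [0.15, 0.10]]
    before, after = ndvi_grid(nir_date1, red_date1), ndvi_grid(nir_date2, red_date2)
    print(ndvi_change(before, after))
```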
Strategies for Effective Upskilling is a presentation by Chinwendu Peace at a Your Skill Boost Masterclass organised by the Excellence Foundation for South Sudan on 8 and 9 June 2024, from 1 PM to 3 PM each day.
Main Java[All of the Base Concepts}.docx (adhitya5119)
This is part 1 of my Java learning journey. It covers custom methods, classes, constructors, packages, multithreading, try-catch blocks, finally blocks, and more.
Walmart Business+ and Spark Good for Nonprofits.pdf (TechSoup)
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that provides discounts and also streamlines nonprofits' order and expense tracking, saving time and money.
The webinar may also give some examples of how nonprofits can best leverage Walmart Business+.
The event will cover the following:
Walmart Business+ (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a “Spend Analytics” feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180-day membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
A review of the growth of the Israel Genealogy Research Association Database Collection over the last 12 months. Our collection has now passed the 3 million mark and is still growing. See which archives have contributed the most, the different types of records we have, and the years for which records have been added. You can also see what we have in store for the future.
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion (TechSoup)
Let’s explore the intersection of technology and equity in the final session of our DEI series. Discover how AI tools, like ChatGPT, can be used to support and enhance your nonprofit's DEI initiatives. Participants will gain insights into practical AI applications and get tips for leveraging technology to advance their DEI goals.
How to Manage Your Lost Opportunities in Odoo 17 CRM (Celine George)
Odoo 17 CRM allows us to track why we lose sales opportunities with "Lost Reasons." This helps analyze our sales process and identify areas for improvement. Here's how to configure lost reasons in Odoo 17 CRM.
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
It describes the bony anatomy, including the femoral head, acetabulum, and labrum. It also discusses the capsule and ligaments. The muscles that act on the hip joint and its range of motion are outlined. Factors affecting hip joint stability and weight transmission through the joint are summarized.
How to Fix the Import Error in the Odoo 17 (Celine George)
An import error occurs when a program fails to import a module or library, disrupting its execution. In languages like Python, this issue arises when the specified module cannot be found or accessed, hindering the program's functionality. Resolving import errors is crucial for maintaining smooth software operation and uninterrupted development processes.
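In Python, a missing module raises ModuleNotFoundError, which is a subclass of ImportError. The minimal, framework-agnostic sketch below detects this and reports which module was missing. It is not Odoo-specific, and the module names are only examples.

```python
"""Detect a missing module instead of crashing at import time."""
import importlib


def try_import(module_name):
    """Return the imported module, or None (with a hint) if it cannot be found."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name is the module Python could not locate (may be a parent package).
        print(f"Missing module {exc.name!r}: install it or check sys.path.")
        return None


if __name__ == "__main__":
    print(try_import("json") is not None)                  # True
    print(try_import("surely_not_installed_pkg") is None)  # True (after the hint line)
```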
International Journal of Engineering and Technical Research (IJETR)
ISSN: 2321-0869 (O) 2454-4698 (P), Volume-5, Issue-3, July 2016
133 www.erpublication.org
Abstract— Global warming is an important issue all over the world and has several effects on the environment. Several factors are responsible for global warming, and one of the main ones is the release of carbon dioxide. This paper focuses on the factors (variables) that affect the environment in the aftermath of global warming. To this end, classification and prediction techniques are used to classify the factors of global warming and then to predict their levels in the atmosphere in future years, and thereby their effects on the environment.
Index Terms— Classification algorithms, Data mining, Global Warming, Prediction algorithms.
I. INTRODUCTION
Data mining has attracted a lot of attention in the research industry and in society as a whole in recent years. This is due to the enormous amount of data now available and the need to turn such data into useful information and knowledge. The objective of this paper is to analyze such data to address environmental research issues.
Global warming is an issue that has come up repeatedly in recent years with the increase in temperature and carbon dioxide levels. Scientists believe that its main causes are deforestation, pollution, and carbon emissions from transportation and factories. Global warming and climate change are terms for the observed century-scale rise in the average temperature of the Earth's climate system and its related effects.
Factors of global warming
Greenhouse gases
Variations in earth's orbit
Deforestation
Burning fossil fuels
Prediction has been a primary technique for emulating the pattern of global warming. There are several factors of global warming, but only the most influential ones are considered in this paper. Data sets on these factors have been formulated so that the impact of each one can be aggregated to predict the future effects of global warming. Algorithms such as regression (linear, multiple linear, and non-linear regression), classification, and density estimation have been used for prediction. Using these algorithms, comparisons will be made to summarize the aftermath effect of these factors on the environment.
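As an illustration of the regression-based prediction mentioned above, the sketch below fits an ordinary least-squares trend line to a yearly series and extrapolates it to a future year. It is a minimal sketch, not the authors' implementation. The function names `fit_linear_trend` and `predict` and the sample values are hypothetical placeholders, not data from this study.

```python
from __future__ import annotations

from typing import Sequence


def fit_linear_trend(years: Sequence[float], values: Sequence[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the ordinary least-squares line values ~ years."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, year: float) -> float:
    """Evaluate the fitted trend line at a (possibly future) year."""
    return intercept + slope * year


if __name__ == "__main__":
    # Hypothetical yearly values for one factor (placeholder numbers, not study data).
    years = [2001, 2002, 2003, 2004, 2005]
    values = [10.0, 12.0, 14.0, 16.0, 18.0]
    a, b = fit_linear_trend(years, values)
    print(predict(a, b, 2012))  # prints 32.0
```

A single-variable trend is the simplest case. Multiple linear regression replaces the year with several explanatory factors.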
Ms. Nisha Bairagee, M.Tech (CSE) Scholar, JSS Academy of Technical
Education, Noida, Uttar Pradesh, India, 9810552779.
Mrs. Nitima Malsa, Assistant Professor, Department of Computer
Science and Engineering, JSS Academy of Technical Education, Noida,
Uttar Pradesh, India.
Dr. Jyoti Gautam, Head of the Department of Computer Science and
Engineering, JSS Academy of Technical Education, Noida, Uttar Pradesh,
India.
II. LITERATURE REVIEW
Data mining, also called Knowledge Discovery in Databases (KDD), is the field of discovering novel and potentially useful information from large amounts of data. The idea behind this paper draws on educational data mining, a field that is still in its infancy. On the topic of global warming, we studied several papers, from which we drew the following results:
P. Kaur, M. Singh, and G. S. Josan applied the CHAID prediction model to analyze the interrelation between the variables used to predict slow learners in school education. Their CHAID prediction model of student performance was constructed with seven class predictor variables. [13]
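CHAID (Chi-squared Automatic Interaction Detection) chooses each split by testing how strongly every categorical predictor is associated with the class, using a chi-square test. The sketch below shows only that selection step, under stated simplifications. Full CHAID also merges similar categories and compares Bonferroni-adjusted p-values. Comparing raw statistics, as done here, is only equivalent when all predictors have the same degrees of freedom. The function names and the example columns are hypothetical and are not taken from Kaur et al.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable, Sequence


def chi_square_statistic(feature: Sequence[Hashable], target: Sequence[Hashable]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a feature x target table."""
    if len(feature) != len(target) or not feature:
        raise ValueError("feature and target must be non-empty and equally long")
    n = len(feature)
    observed = Counter(zip(feature, target))  # missing cells count as 0
    row_totals = Counter(feature)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def best_split_predictor(columns: dict[str, Sequence[Hashable]], target: Sequence[Hashable]) -> str:
    """Pick the predictor with the largest chi-square statistic (simplified CHAID step)."""
    return max(columns, key=lambda name: chi_square_statistic(columns[name], target)[0])


if __name__ == "__main__":
    # Hypothetical toy data: 'attendance' is fully associated with the class, 'region' is not.
    target = ["slow", "slow", "ok", "ok"]
    cols = {"attendance": ["low", "low", "high", "high"], "region": ["a", "b", "a", "b"]}
    print(best_split_predictor(cols, target))  # prints attendance
```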
K. Kaku estimated the baseline of GHG emissions in the poultry and swine industries of the ASEAN 8 countries. The study also estimated the reductions achievable under a GHG reduction scenario that adopts a waste management system instead of the conventional system. Because the current benchmark price of GHG fluctuates, the study showed that a stable economic benefit could not be expected. It also outlined the economic benefits that the broiler and swine industries of the ASEAN 8 countries, as developing countries, could expect. [12]
T-S Kwon, C. M. Lee, and S-S Kim described the prediction of beetle abundance. They examined how a simple change in temperature affects the abundance of beetles and applied quantitative prediction of abundance on the basis of temperature change, using statistical analysis on the data set. [18]
T-S Kwon, C. M. Lee, J. Park, S-S Kim, J. H. Chun, and J. H. Sung described the prediction of ant abundance. This study considered only a simple change in temperature and did not consider competition between species. The temperature range estimated with existing statistical methods differed from the result obtained in this study. [19]
T-S Kwon, C. M. Lee, J. Park, S-S Kim, J. H. Chun, and J. H. Sung described the prediction of spider abundance. They applied quantitative prediction of abundance on the basis of temperature change and grouped multiple spider species into three categories: increase, no change, and decrease. [17]
P. C. Austin and E. W. Steyerberg provided a method to determine the number of independent variables that can be included in a linear regression model. Their focus was on accurate estimation of regression coefficients, standard errors, and confidence intervals. In contrast to the larger numbers of events per variable commonly recommended for logistic and Cox regression, they found that linear regression models require only two subjects per variable (SPV) for adequate estimation of these quantities. [14]
H. Wang, X. Lu, P. Xu, and D. Yuan introduced the concept of CDHs/HDHs (cooling/heating degree hours). They proposed weekly prediction models of total building power consumption based on multiple linear regression, an algorithm that is relatively simple and easy to understand. The prediction models were validated as highly accurate and generally applicable. They offer reliable guidance to building facility managers and relevant competent authorities for decision making and policy implementation. [4]
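The idea behind the degree-hour models can be sketched in two steps. First, aggregate hourly temperatures into cooling degree hours (CDH, the hourly excess above a cooling base temperature) and heating degree hours (HDH, the hourly deficit below a heating base temperature). Second, regress weekly consumption on them with multiple linear regression. This is a hedged illustration, not the model from [4]. The base temperatures (26 °C and 18 °C), the function names, and the sample numbers are all assumptions.

```python
from __future__ import annotations

from typing import Sequence


def degree_hours(hourly_temps: Sequence[float], cool_base: float = 26.0,
                 heat_base: float = 18.0) -> tuple[float, float]:
    """Return (CDH, HDH): summed hourly excess above cool_base and deficit below heat_base."""
    cdh = sum(max(0.0, t - cool_base) for t in hourly_temps)
    hdh = sum(max(0.0, heat_base - t) for t in hourly_temps)
    return cdh, hdh


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(features: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Least-squares coefficients [b0, b1, ...] for y ~ b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0, *f] for f in features]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    print(degree_hours([30.0, 20.0, 15.0]))  # prints (4.0, 3.0)
    # Hypothetical weekly (CDH, HDH) pairs and consumption built from 100 + 2*CDH + 3*HDH.
    weekly = [(0, 0), (10, 0), (0, 5), (4, 6)]
    power = [100, 120, 115, 126]
    print([round(c, 6) for c in fit_mlr(weekly, power)])  # recovers about [100.0, 2.0, 3.0]
```

The normal equations are adequate for a handful of predictors. A numerically sturdier solver such as QR decomposition would be preferable for larger or nearly collinear feature sets.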
Prediction on Global Warming
Ms. Nisha Bairagee, Mrs. Nitima Malsa, Dr. Jyoti Gautam
A. M. Freije, T. Hussain, and E. A. Salman provided information and raised awareness about three aspects of global warming: causes, impacts, and solutions. The study therefore recommended integrating environmental concepts into the university curriculum for all students, irrespective of their academic specialization, in order to increase environmental awareness. [1]
Reviewing all these papers, one conclusion can be drawn: the issue of global warming needs to be taken seriously, and methods or techniques have to be developed to understand its pattern better. This paper discusses the factors that most affect the environment in a hazardous way. Predicting these patterns from the data set with prediction algorithms should show the world that global warming is an alarming issue that must be addressed. Most prediction techniques take into account only the rise in temperature, but this paper goes further. The data set has to be categorized so that separate results are obtained for separate factors of global warming. This indicates what to reduce and what to take care of: one has to know what should stop being used and what should not. This paper focuses on that agenda, which will give results and a chance to save nature and the environment from extinction.
III. METHODOLOGY
A. Proposed methodology
A survey-cum-experimental methodology is used. Through an extensive literature search and discussions with experts on the effects of global warming, a number of factors considered to influence those effects were identified. These influencing factors are treated as input variables. For this work, recent real-world data were collected online from the World Bank. The data were then filtered manually and transformed into a standard format. Next, the features and parameters to be used were selected. The selected parameters were then analyzed and implemented in the tool, after which the results were produced and analyzed. The stepwise methodology is represented in the flowchart shown in Fig 1.
Fig 1. Flowchart of proposed work
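The paper does not name its tool or the exact filtering, transformation, and selection rules. The sketch below is therefore one plausible instance of the same pipeline, under explicit assumptions:
- the filtering step drops records with missing values;
- the "standard format" is min-max scaling to [0, 1];
- feature selection keeps the columns most correlated (absolute Pearson r) with the target.

Function and column names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def filter_complete(rows: list[dict[str, Optional[float]]]) -> list[dict[str, float]]:
    """Filtering step (assumed): drop records that have any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def min_max_scale(values: Sequence[float]) -> list[float]:
    """Transformation step (assumed): rescale a column to [0, 1]; constant columns map to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_features(rows: list[dict[str, Optional[float]]], target: str, k: int) -> list[str]:
    """Selection step (assumed): keep the k columns most correlated (|r|) with the target."""
    clean = filter_complete(rows)
    if not clean:
        raise ValueError("no complete records left after filtering")
    columns = {name: min_max_scale([r[name] for r in clean]) for name in clean[0]}
    y = columns.pop(target)
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], y)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical records; None marks a missing value that the filter removes.
    rows = [
        {"road": 1.0, "rail": 5.0, "co2": 2.0},
        {"road": 2.0, "rail": 3.0, "co2": 4.0},
        {"road": 3.0, "rail": None, "co2": 6.0},
        {"road": 4.0, "rail": 4.0, "co2": 8.0},
        {"road": 5.0, "rail": 4.0, "co2": 10.0},
    ]
    print(select_features(rows, "co2", 1))  # prints ['road']
```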
IV. EXPERIMENTATION
A. Database
A numerical database is used in this experimental setup. The data were collected from various websites and converted into a relational database schema.
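The paper does not give its schema. One plausible normalized layout mirrors the Factors / Years / Variables structure of Tab 1 and is sketched below with Python's built-in `sqlite3` module. The table and column names, and the 2001-2011 range check, are assumptions for illustration.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE factor (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE            -- e.g. 'Greenhouse gases (CO2)'
);
CREATE TABLE variable (
    id        INTEGER PRIMARY KEY,
    factor_id INTEGER NOT NULL REFERENCES factor(id),
    name      TEXT NOT NULL,             -- e.g. 'Domestic transport'
    UNIQUE (factor_id, name)
);
CREATE TABLE observation (
    variable_id INTEGER NOT NULL REFERENCES variable(id),
    year        INTEGER NOT NULL CHECK (year BETWEEN 2001 AND 2011),
    value       REAL,                    -- NULL when the source has no figure
    PRIMARY KEY (variable_id, year)
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical factor/variable/observation schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_observation(conn: sqlite3.Connection, factor: str, variable: str,
                    year: int, value: float | None) -> None:
    """Insert one yearly value, creating the factor and variable rows on first use."""
    conn.execute("INSERT OR IGNORE INTO factor(name) VALUES (?)", (factor,))
    (factor_id,) = conn.execute("SELECT id FROM factor WHERE name = ?", (factor,)).fetchone()
    conn.execute("INSERT OR IGNORE INTO variable(factor_id, name) VALUES (?, ?)",
                 (factor_id, variable))
    (variable_id,) = conn.execute(
        "SELECT id FROM variable WHERE factor_id = ? AND name = ?", (factor_id, variable)
    ).fetchone()
    conn.execute("INSERT INTO observation VALUES (?, ?, ?)", (variable_id, year, value))


if __name__ == "__main__":
    conn = build_database()
    # Placeholder values only; the study's actual figures are not reproduced here.
    add_observation(conn, "Greenhouse gases (CO2)", "Domestic transport", 2001, 1.0)
    add_observation(conn, "Deforestation", "Population", 2001, 2.0)
    for name, count in conn.execute(
        "SELECT f.name, COUNT(*) FROM observation o "
        "JOIN variable v ON v.id = o.variable_id "
        "JOIN factor f ON f.id = v.factor_id GROUP BY f.name ORDER BY f.name"
    ):
        print(name, count)
```

Keeping factors, variables, and yearly observations in separate tables lets new variables or years be added without altering the schema.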
Tab 1. Dataset on factors of global warming
Factors | Years | Variables
Greenhouse gases (CO2) | 2001-2011 | Domestic transport, End user level, Industries, Household waste, Burning fossil fuels; Road, Rail, Taxi, Chemical
Deforestation | 2001-2011 | Not plantation, Weather, Population; Gross forest loss, UN forest loss
B. Algorithms
In the survey, many algorithms are used for prediction, helping to identify the factors that most influence the environment. An algorithm in data mining is a set of heuristics and calculations that creates a model from data. The mining model that an algorithm creates from data can take various forms, including classification, regression, prediction, density estimation, and association rules.
Classification algorithms predict one or more discrete variables based on the other attributes in the dataset.
Regression algorithms predict one or more continuous numeric variables, such as profit or loss, based on the other attributes in the dataset.
Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.
Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is creating association rules, which can be used in market basket analysis.
Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a series of clicks on a website or a series of log events preceding machine maintenance.
One of the above-mentioned algorithms will be used for prediction.
V. CONCLUSION AND FUTURE WORK
In this paper, classification techniques are used for prediction on the global warming dataset, to predict and analyze the factors affecting the environment and to identify the most hazardous among them. This research indicates what should be reduced and what needs attention: one has to know what should stop being used and what should not. The paper focuses on that agenda, which will give results and a chance to save nature and the wildlife environment from extinction. It discusses the factors that most affect the environment in a hazardous way. Predicting the patterns in the data set with classification algorithms should show the world that global warming is an alarming issue that has to be taken seriously. This in turn provides a basis for deciding on special aid. In future work, data mining techniques can be integrated with DBMS and machine learning techniques and applied to different datasets to assess accuracy and obtain the desired predictions. Some new factors can also be applied to improve lives, learning, and retention capabilities among people. Global warming therefore remains a promising area for further research. These techniques can also be applied in other areas, such as medicine, sports, education, and the share market, owing to the availability of huge databases.
REFERENCES
[1] A. M. Freije, T. Hussain, E. A. Salman, "Global warming awareness among the University of Bahrain science students," Elsevier, Journal of the Association of Arab Universities for Basic and Applied Sciences, pp. 89-95, 2016.
[2] Bimal K. Bose, "Global Warming Energy, Environmental Pollution, and the Impact of Power Electronics," IEEE, vol. 6, pp. 6-17, 2010.
[3] Chu B, Duncan S, Papachristodoulou A, Hepburn C, "Analysis and control design of sustainable policies for greenhouse gas emissions," Elsevier, pp. 35-43, 2012.
[4] H. Wang, X. Lu, P. Xu, D. Yuan, "Short-term Prediction of Power Consumption for Large-scale Public Buildings based on Regression Algorithm," Elsevier, Procedia Engineering, vol. 121, pp. 1318-1325, 2015.
[5] http://data.giss.nasa.gov/gistemp/.
[6] http://thinkprogress.org/climate/2016/03/02/3755715/satellites-hottest-february-global-warming/.
[7] http://data.worldbank.org/climate-change/.
[8] https://climatedataguide.ucar.edu/climate-data/global-temperature-data-sets-overview-comparison-table.
[9] https://www.google.co.in/search?q=latest+dataset+on+global+warming&ie=utf-8&oe=utf&&gws_rd=cr&ei=awQBV_3NKoiTuATWzbXgDw#q=dataset+on+deforestation.
[10] Jian-Bin H, Shao-Wu W, Yong L, Zong-Ci Z, Xin-Yu W, "The Science of Global Warming," IEEE, Advances in Climate Change Research, vol. 3, pp. 174-178, 2012.
[11] J. Mankoff, R. Kravets, E. Blevis, "Some computer science issues in creating a sustainable world," IEEE, http://earthzine.org/, 2008.
[12] K. Kaku, "Global Warming and Climate Change of Asian Countries Including Japanese Domestic Greenhouse Gas (GHG) Reduction in the Field of Poultry and Swine Industries," Elsevier, Procedia Engineering, vol. 8, pp. 511-514, 2011.
[13] P. Kaur, M. Singh, G. Singh Josan, "Classification and prediction based data mining algorithms to predict slow learners in education sector," Elsevier, Procedia Computer Science, vol. 57, pp. 500-508, 2015.
[14] P. C. Austin, E. W. Steyerberg, "The number of subjects per variable required in linear regression analyses," Elsevier, Journal of Clinical Epidemiology, vol. 68, pp. 627-636, 2015.
[15] S. G. Wiedemann, M.-J. Yan, B. K. Henry, C. M. Murphy, "Resource use and greenhouse gas emissions from three wool production regions in Australia," Elsevier, Journal of Cleaner Production, pp. 1-12, 2016.
[16] S. Rahman, A. D. Castro, "Environmental impacts of electricity generation: A global perspective," IEEE Trans. Energy Conversion, vol. 10, pp. 307-313, June 1995.
[17] T-S Kwon, C M Lee, T Tae, W Kim, S-S Kim, J H Sung, "Prediction of abundance of forest spiders according to climate warming in South Korea," Elsevier, Journal of Asia-Pacific Biodiversity, vol. 7, pp. 2287-884, 2014.
[18] T-S Kwon, C M Lee, S-S Kim, "Prediction of abundance of beetles according to climate warming in South Korea," Elsevier, Journal of Asia-Pacific Biodiversity, vol. 8, pp. 2287-884, 2015.
[19] T-S Kwon, C M Lee, J Park, S-S Kim, J H Chun, J H Sung, "Prediction of abundance of ants due to climate warming in South Korea," Elsevier, Journal of Asia-Pacific Biodiversity, vol. 7, pp. 2287-884, 2014.
[20] Wang S, Wen X, Luo Y, Tang G, Zha Z, Huang J, "Does the Global Warming Pause in the Last Decade: 1999-2008?," IEEE, Advances in Climate Change Research, vol. 1, pp. 49-54, 2010.
[21] Wikipedia Encyclopedia [Online]. Available: http://en.wikipedia.org/wiki/Global_warming.
[22] Yaduvanshi A, Ranade A, "Effect of Global Temperature Changes on Rainfall Fluctuations Over River Basins across Eastern Indo-Gangetic Plains," Elsevier, vol. 4, pp. 721-729, 2015.
[23] Y. C. Ma, X. W. Kong, B. Yang, X. L. Zhang, X. Y. Yan, J. C. Yang, Z. Q. Xiong, "Net global warming potential and greenhouse gas intensity of annual rice–wheat rotations with integrated soil–crop system management," Elsevier, Agriculture, Ecosystems and Environment, vol. 164, pp. 209-219, 2013.