This document discusses multiple linear regression techniques. It begins by explaining that multiple linear regression is used to predict a dependent variable from a set of independent variables. It then provides details on assumptions that must be satisfied, how to identify and handle outliers, and the steps involved in performing multiple linear regression analysis. Examples are also provided to illustrate key concepts.
The document discusses various statistical hypothesis tests that can be used to analyze hydrological data, including the t-test and ANOVA. It provides examples of how to set up null and alternative hypotheses, calculate relevant statistics like t-statistics and F-statistics, and make decisions about whether to reject the null hypothesis based on comparing these statistics to critical values. One example analyzes groundwater depth data from three catchments using ANOVA to test if depths differ between catchments.
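As a rough illustration of the ANOVA decision described in the summary above, the sketch below computes a one-way F-statistic by hand. The function name `one_way_anova_f` and the depth values are hypothetical, not taken from the summarized document; the resulting F would still need to be compared against a critical value from an F table.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of samples."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation between group means and variation within each group.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    ms_between = ss_between / (k - 1)            # df = k - 1
    ms_within = ss_within / (n_total - k)        # df = N - k
    return ms_between / ms_within


# Hypothetical groundwater depths (m) for three catchments.
depths = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat = one_way_anova_f(depths)
print(f_stat)
```

The null hypothesis of equal mean depths is rejected only if `f_stat` exceeds the critical F for (k - 1, N - k) degrees of freedom at the chosen significance level.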
This document discusses correlation and statistical methods for examining the relationship between two variables. It defines correlation and describes how correlation can indicate the direction, strength, and significance of a relationship. Different types of correlation are described, including simple, multiple, partial, and total correlation. Methods for calculating and interpreting the correlation coefficient are provided along with examples of exploring relationships between hydrological variables.
This document discusses stochastic methods in hydrology, specifically Markov transition matrices and cumulative distribution functions. It describes how to calculate daily monsoon rainfall using a Markov chain model with four rainfall classes. The initial condition and transition probabilities are given. It also discusses stationary time series, linear stochastic models including moving averages, autoregressive models and autoregressive moving average models. Double moving averages are presented to remove trends and improve forecasts.
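A minimal sketch of the Markov chain idea in the summary above, assuming four rainfall classes. The transition matrix `P` and the initial condition are hypothetical placeholders, since the summary does not give the actual values.

```python
# Hypothetical transition matrix for four rainfall classes.
# Row i holds the probabilities of moving from class i to each class;
# every row sums to 1.
P = [
    [0.6, 0.2, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.2, 0.3, 0.3, 0.2],
    [0.1, 0.2, 0.3, 0.4],
]


def step(dist, matrix):
    """Advance a class-probability vector by one day."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, matrix, days):
    """Apply the transition matrix repeatedly for the given number of days."""
    for _ in range(days):
        dist = step(dist, matrix)
    return dist


# Hypothetical initial condition: day 0 is known to be in class 0.
initial = [1.0, 0.0, 0.0, 0.0]
after_one_day = propagate(initial, P, 1)
after_five_days = propagate(initial, P, 5)
print(after_one_day, after_five_days)
```

Starting from a known class, one step simply returns that class's row of the matrix; later steps blend the rows according to the current probabilities.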
This document discusses statistical hydrology and summarizing data. It describes defining problems, collecting relevant data through sampling techniques, and assessing data quality before analysis. Statistical hydrology involves collecting and analyzing variable, limited water resources data to make decisions and scientific discoveries. Descriptive statistics are used to summarize datasets while inferential statistics enable inferences about unknown aspects.
This document discusses statistical methods for simple linear regression including tests of significance for the slope and intercept. It introduces alternative regression methods such as the Kendall-Theil robust line that can be used when the assumptions of ordinary least squares regression are not met, such as when the residuals are not normally distributed. An example demonstrates how to calculate the Kendall-Theil robust line and test its significance.
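The Kendall-Theil robust line mentioned above can be sketched as follows, assuming the common formulation in which the slope is the median of all pairwise slopes and the intercept is chosen so the line passes through the medians of x and y. The function name and the data are hypothetical, and the significance test is not shown.

```python
from itertools import combinations
from statistics import median


def kendall_theil_line(x, y):
    """Return (slope, intercept) of the Kendall-Theil robust line."""
    # Slopes between every pair of points with distinct x values.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    # Intercept that makes the line pass through (median x, median y).
    intercept = median(y) - slope * median(x)
    return slope, intercept


# Hypothetical data: y = 2x except for one gross outlier at the end.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]
slope, intercept = kendall_theil_line(x, y)
print(slope, intercept)
```

Because the slope is a median, the single outlier barely moves the fitted line, which is the property that makes the method useful when least-squares assumptions fail.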
This document discusses probability distributions and their applications in statistical hydrology. It begins by explaining discrete and continuous random variables and their probability functions. It then covers several specific probability distributions including binomial, Poisson, normal, lognormal, gamma, exponential and Gumbel distributions. Examples are provided to illustrate how these distributions can be used to calculate probabilities of hydrologic events like floods or rainfall.
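To make the flood-probability use case above concrete, here is a small sketch using the Gumbel distribution for annual maxima. The location `mu` and scale `beta` values are hypothetical; in practice they would be estimated from a flow record.

```python
import math


def gumbel_cdf(x, mu, beta):
    """Probability that the annual maximum does not exceed x."""
    return math.exp(-math.exp(-(x - mu) / beta))


def return_period(x, mu, beta):
    """Average number of years between exceedances of x."""
    exceedance = 1.0 - gumbel_cdf(x, mu, beta)
    return 1.0 / exceedance


# Hypothetical parameters for annual peak discharge (m^3/s).
mu, beta = 100.0, 20.0
p_non_exceed = gumbel_cdf(150.0, mu, beta)
t_years = return_period(150.0, mu, beta)
print(p_non_exceed, t_years)
```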
This document discusses regression analysis and its application in hydrology. It begins by defining regression as a statistical technique used to determine the functional relationship between two variables. Simple linear regression finds the best fit linear equation to describe the relationship between a dependent and independent variable. Regression can be used to predict outcomes, describe relationships, and control for variables. The document provides examples of applying regression to predict erosion based on wave height data. It explains how to calculate the regression equation and error term.
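The regression equation and error term described above can be sketched with ordinary least squares. The function name and the wave height and erosion numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Error term: observed value minus fitted value at each point.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals


# Hypothetical wave heights (m) and erosion amounts (m).
wave_height = [1.0, 2.0, 3.0, 4.0]
erosion = [3.0, 5.0, 7.0, 9.0]
slope, intercept, residuals = fit_line(wave_height, erosion)
print(slope, intercept, residuals)
```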
The document discusses concepts related to statistical analysis of hydrological data, including measures of skewness, kurtosis, outliers, and the common characteristics of water resources data. Skewness measures asymmetry in a distribution, while kurtosis measures peakedness. Outliers are identified using methods like Chauvenet's criterion, Grubbs' test, and Dixon's Q test. Water resources data commonly has a lower bound of zero, outliers, non-normal distributions, positive skewness, seasonal patterns, and positive autocorrelation between consecutive observations.
This document provides a list of the top 100 greatest Hollywood actors of all time as compiled from a poll of 50 film lovers. It notes that while many actors have tried to make it in Hollywood over the past 100 years, only a select few have left a lasting impression on audiences and continued to be admired by younger generations. The list includes actors from almost all decades who were popular during their careers and spanned entertainment-focused, method acting styles, and those who also directed or wrote screenplays.
This document discusses arrays in C language. It defines an array as a data structure that stores a collection of similar data types. It describes how to declare, create and initialize single and multi-dimensional arrays in C. It also explains that arrays have a fixed size once declared, and elements can be accessed via indexes. Multidimensional arrays can be thought of as arrays of arrays.
The document presents the mission and vision of the Sistema Universitario UNIMINUTO, which seeks to offer quality higher education with flexible access in order to train competent professionals and leaders of social change, and to help build a just and peaceful country. It also briefly describes the institution's student regulations and states that missing more than 15% of classes can result in failing the course.
Cisco has provided a set of four demo tickets for practicing TSHOOT (642-832).
Feel free to go through the solution and send any further queries by email.
Link for the testing portal:
https://www.cisco.com/web/learning/le3/le2/le37/le10/tshoot_demo.html
This document provides an overview of topics to be covered in a 3-week professional engineering exam review session on hydrology and hydraulics. It will cover key aspects of hydrology including the hydrologic cycle, precipitation, runoff analysis using the Curve Number method, and peak discharge calculations. Hydraulics topics will include flow through common structures like weirs, orifices, and pipes. Example problems will be worked through for each major topic to illustrate concepts and calculations. Attendees are advised to review references and practice additional example problems.
Fagan Inspection is a rigorous quality control technique invented in the 1970s for inspecting software documents to find defects. It involves a team reviewing a document using a structured process with defined roles. The process includes planning, an overview meeting, individual preparation, the inspection meeting, defect analysis, rework, and follow up. When performed properly it is very effective at finding defects early. However, inspections are rarely used due to professional and organizational ignorance, difficulties in implementation, and intangible perceived benefits. Making inspections work requires addressing these challenges.
The document outlines guidelines for employees operating company vehicles, including requirements to maintain a valid driver's license, notify the company of citations, follow safe driving practices, ensure passengers wear seatbelts, properly maintain the vehicle, obtain spousal approval for personal use, be responsible for parking/traffic violations, report accidents within 12 hours, pay deductibles for at-fault accidents, avoid modifications, and not use vehicles for family vacations or towing. Employees also must not operate vehicles under the influence of alcohol or drugs. Failure to follow the guidelines can result in lost driving privileges or termination.
1. The document discusses how security is changing with new technologies like cloud computing, DevOps, and agile development. Traditional security practices are no longer effective.
2. It advocates migrating security left in the development process so it is designed into applications from the beginning. This allows for a faster security feedback loop.
3. Security needs to be automated and tested using tools and data platforms. Monitoring and inspecting everything is important for the new dynamic environments. Security decisions and controls are also changing to adapt to these new realities.
Arun Kumar is applying for a position in customer service, business development, or operations/administration. He has over 19 years of experience in various leadership roles managing teams, business operations, customer relationships, and administration across multiple industries. Currently he works as a Business Banker at Mashreq Bank in Dubai where he manages a portfolio and team. Previously he held senior roles like Head of Business Operations and Senior Divisional Manager at other companies where he was responsible for profitability, branch expansion, and people management. He believes his skills in management, customer service, marketing, and administration can benefit the company.
This document discusses trend analysis of time series data. It defines time series as measurements of a variable taken at regular intervals over time. Time series can show trends, seasonal variations, cyclical variations, and irregular variations. Trend analysis determines if there is a significant increasing or decreasing trend in the data over time. Linear regression and non-parametric Mann-Kendall tests are common methods used to test for trends and estimate their magnitude. The selection of an appropriate trend analysis method depends on characteristics of the water resources data such as distributions, outliers, and missing values.
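As a sketch of the non-parametric Mann-Kendall test named above, the code below computes the S statistic and its normal approximation. It assumes no tied values and no missing data, and the series is hypothetical.

```python
import math


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test, assuming no ties."""
    n = len(series)
    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis of no trend (no-ties form).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected standard normal score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Hypothetical annual values with a steady upward trend.
s_stat, z_score = mann_kendall([1, 2, 3, 4, 5])
print(s_stat, z_score)
```

A two-sided test at the 5% level would reject the no-trend hypothesis when the absolute value of Z exceeds about 1.96; series with ties need an adjusted variance, which this sketch omits.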
Over the years, many software defect prediction models have been developed to address issues in software project development. Software reliability is a significant aspect of software quality; it evaluates and predicts the quality of the software based on defect prediction. Many software companies are trying to improve software quality while also reducing the cost of software development. The Rayleigh model is one of the significant models for analyzing software defects based on the generated data. Analysis of means (ANOM) is a statistical technique that provides quality assurance depending on the situation. In this paper, improved software defect prediction models (ISDPM) are used to predict defects that occur during five phases: analysis, planning, design, testing, and maintenance. To improve the performance of the proposed methodology, order statistics are adopted for better prediction. The experiments are conducted on two synthetic projects that are used to analyze the defects.
Quantitative Risk Assessment - Road Development Perspective (SUBIR KUMAR PODDER)
This document outlines an approach for quantitative risk assessment in road transport infrastructure projects using stochastic analysis with triangular distributions. It discusses determining the combined influence of parameters like project cost and traffic on economic indicators. Traditionally, risks from cost and traffic changing from base cases are analyzed separately using triangular distributions defined by minimum, most likely and maximum limits. The document proposes a method to analyze the combined influence of both parameters varying simultaneously using bivariate distributions and conditional probabilities.
Forecasting Municipal Solid Waste Generation Using a Multiple Linear Regressi... (IRJET Journal)
- The document describes developing a multiple linear regression model to forecast municipal solid waste generation based on factors like population, population density, education levels, access to services, and income levels.
- The model was developed using data from various municipalities in Italy. Exploratory data analysis was conducted to determine linear relationships between waste generation and predictors.
- The linear regression model achieved a high R-squared value of 91.81%, indicating a close fit to the data. Various error metrics like MAE, MSE and RMSE were calculated to evaluate model performance.
- The regression model provides a simple yet accurate means of predicting municipal solid waste that requires minimal data and can be generalized to other locations.
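The error metrics named in the list above (MAE, MSE, RMSE) and R-squared can be computed as in this sketch. The observed and predicted values are hypothetical and unrelated to the 91.81% figure reported in the summary.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)

    # R-squared: 1 minus residual variation over total variation.
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical observed and predicted waste generation values.
metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(metrics)
```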
This document compares several supervised machine learning classification algorithms on a Titanic dataset: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, and Naive Bayes. It finds that Random Forest achieves the highest accuracy. Evaluation metrics like precision, recall, F1-score, and accuracy are used to evaluate and compare model performance on test data.
This document compares several supervised machine learning classification algorithms on a Titanic dataset: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, and Naive Bayes. It finds that Random Forest achieves the highest accuracy. It preprocesses the dataset, trains models on a training set, and evaluates them using metrics like precision, recall, and F1-score calculated from the confusion matrix on a test set.
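For reference, the metrics derived from the confusion matrix in the summary above can be sketched as below. The counts are hypothetical and do not come from the Titanic experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)      # share of predicted positives that are correct
    recall = tp / (tp + fn)         # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a binary survived / did-not-survive classifier.
scores = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(scores)
```

This sketch assumes at least one predicted positive and one actual positive; otherwise precision or recall is undefined and the divisions would fail.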
This document describes the development of a numerical semiconductor device simulator (SDS) using MATLAB. The SDS aims to supplement semiconductor device physics and numerical analysis course materials by allowing users to model basic semiconductor devices and analyze parameters like carrier densities, potential, and electric fields. The document provides an overview of the necessary semiconductor device physics, including equations of carrier transport and Poisson's equation that govern device behavior. Models for carrier statistics, mobility, and recombination/generation are also described. The numerical techniques used in the SDS are discussed along with plans to validate simulation results against theory and other tools. An appendix provides details on the program functions developed to support the SDS.
The document discusses different types of mathematical models, including deterministic and probabilistic models. It provides examples of each. It also discusses building, verifying, and refining mathematical models. Additionally, it covers optimization models, their components including objective functions and constraints. Finally, it discusses specific types of optimization models like linear programming, network flow programming, and integer programming.
Blood Transfusion success rate prediction using Artificial Intelligence (IRJET Journal)
This document discusses using machine learning models to predict whether patients will require an intraoperative blood transfusion during mitral valve surgery. Specifically, it examines using the XGBoost and gradient boost techniques to predict transfusion success rates. It finds that XGBoost achieves an accuracy of about 93% for predicting transfusions, compared to 90% for gradient boost, making XGBoost the better performing model. The document concludes that machine learning can successfully predict transfusion needs with an accuracy of 93% using XGBoost.
The document discusses regression analysis and random forest machine learning algorithms. It explains that regression analysis is used to predict continuous variables like sales based on related predictor variables like advertisement spending. Regression finds the correlation between variables to enable predictions. Random forest is an ensemble technique that creates multiple decision trees on subsets of data and takes a majority vote of the trees' predictions to improve accuracy. It provides a two-phase working process for random forest involving creating trees on random data samples and then making predictions based on the trees' votes.
A Comparative Analysis of Slicing for Structured Programs (Editor IJCATR)
Program slicing is a method for automatically decomposing programs by analyzing their data flow and control flow. Slicing reduces a program to a minimal form, called a "slice", that still produces the behavior of interest. A program slice singles out all statements that may have affected the value of a given variable at a specific program point. Slicing is useful in program debugging, program maintenance, and other applications that involve understanding program behavior. In this paper we discuss static and dynamic slicing and compare them using a number of examples.
This document provides information about a Computational Fluid Dynamics course taught by Prof. Dr. RAO Yu at Shanghai Jiao Tong University. The course will cover fundamental CFD theories, techniques, and applications. Students will work in groups on projects and submit a final report making up 40-50% of their grade. The textbook is Computational Fluid Dynamics: A Practical Approach and lectures will introduce governing equations, numerical methods, discretization, and turbulence modeling. CFD can provide detailed flow field simulations to complement experimental and analytical approaches in engineering design and research.
Dependency analysis is a technique to detect dependencies between tasks that prevent these tasks from running in parallel. It is an important aspect of parallel programming tools. Dependency analysis techniques are used to determine how much of the code is parallelizable. Literature shows that number of data dependence test has been proposed for parallelizing loops in case of arrays with linear subscripts, however less work has been done for arrays with nonlinear subscripts. GCD test, Banerjee method, Omega test, I-test dependence decision algorithms are used for one-dimensional arrays under constant or variable bounds. However, these approaches perform well only for nested loop with linear array subscripts. The Quadratic programming (QP) test, polynomial variable interval (PVI) test, Range test are typical techniques for nonlinear subscripts. The paper presents survey of these different data dependence analysis tests.
Investigation Effect of Outage Line on the Transmission Line for Karbalaa-132... (IRJET Journal)
This document discusses investigating the effect of outage lines on the 132kV transmission network in Karbalaa, Iraq. It presents two cases studied using a simulation program to analyze the impact of line outages. The document introduces the network and provides data on buses, lines, and power flows. It then describes the DC power flow algorithm used in the analysis and calculation of sensitivity factors like generation shift factors and line outage distribution factors to evaluate how outages may increase power flows over limits on other lines. The results and discussion section presents the network data and baseline power flows before analyzing the two outage cases and identifying overloaded lines.
This document summarizes a study that analyzed different unsteady friction models used in water hammer analysis software. The study implemented several friction models in Python code and compared them based on input requirements, stability, computational efficiency, and accuracy when validated with experimental data. It was found that Vítkovský's unsteady friction model performed best. This model was then implemented in the WANDA commercial software. Testing showed the model provided more accurate simulations of transient pressure waves compared to the previous quasisteady friction approach.
ANALYSIS AND PREDICTION OF RAINFALL USING MACHINE LEARNING TECHNIQUES (IRJET Journal)
The document discusses using machine learning techniques like multiple linear regression and support vector regression to analyze and predict rainfall. It analyzes over 100 years of monthly rainfall data from India to train and test the models. The support vector regression model achieved the highest accuracy in predicting rainfall compared to the multiple linear regression model. Accurately predicting rainfall is important for agriculture in India since much of the economy depends on rainfall and farming. The models extract patterns from historical weather data to forecast future rainfall levels.
Support vector machines (SVMs) are supervised machine learning models that analyze data used for classification and regression analysis. SVMs find a hyperplane that separates clusters of data points and maximizes the margin between the different classes. They can be used for applications like credit card approval predictions, patient risk assessments in hospitals, and categorizing text and web pages. SVMs work by finding the optimal separating hyperplane that maximizes the margin between different classes of data points in the training set.
This document provides guidance on using regression analysis to validate hydrological data. It discusses using simple linear regression to establish relationships between variables like rainfall and runoff. Key steps covered include estimating regression coefficients to minimize the error variance, measuring the goodness of fit using the coefficient of determination, and examining residuals over time and versus other variables to evaluate changes in the rainfall-runoff relationship. The overall aim is to detect errors in discharge data by comparing observed and computed runoff derived from regression models.
In this chapter, our goal is to introduce the foundational principles of supervised learning. As we progress, we place particular emphasis on both regression and classification techniques, offering learners a more comprehensive perspective on the practical application of these methodologies in real-world scenarios. By the end of this chapter, learners will not only possess a robust understanding of the core principles but will also be armed with valuable insights into the tangible applications of supervised learning. This knowledge empowers them to skillfully navigate and leverage the full potential of this influential paradigm within the vast expanse of machine learning.
This document discusses the relevance and implications of forecasting retail deposits. Forecasting retail deposits involves analyzing macroeconomic data to build models that can accurately predict future deposit levels given economic conditions. Accurately forecasting deposits is important for banks to inform strategic planning and decisions around operations, technology, and infrastructure needs. The implications of deposit forecasting are discussed from social and philosophical perspectives, including how forecasting stems from humans' innate desire to understand and prepare for an uncertain future.
A review on techniques and modelling methodologies used for checking electrom... (nooriasukmaningtyas)
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Understanding Inductive Bias in Machine Learning (SUTEJAS)
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte... (University of Maribor)
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Literature Review Basics and Understanding Reference Management.pptx (Dr Ramhari Poudyal)
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS (IJNSA Journal)
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT (jpsjournal1)
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman's Rimland, and Hegemonic Stability theories, the study examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. It critically analyzes primary and secondary research documents to elaborate the role of
China’s geo-economic outreach in Central Asian countries and its future prospect. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence over other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... (IJECEIAES)
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Advanced control scheme of doubly fed induction generator for wind turbine us... (IJECEIAES)
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL (gerogepatton)
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... (IJECEIAES)
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
1. MAL1303: STATISTICAL HYDROLOGY
Multiple Regression
Dr. Shamsuddin Shahid
Associate Professor
Department of Hydraulics and Hydrology
Faculty of Civil Engineering
Room No.: M46-332;
Phone: 07-5531624; Mobile: 0182051586
Email: sshahid@utm.my
11/23/2015 Shamsuddin Shahid, FKA, UTM
2. Simple Linear Regression
Simple Linear Regression (SLR) is a statistical
technique that is used to determine the
functional relationship between two variables.
Regression gives an equation that best describes
the relationship between two variables.
3. Multiple Linear Regression (MLR)
Multiple linear regression is a statistical technique in which a
dependent variable is predicted from a set of predictors.
It is used to identify the relationship between a dependent variable and a
combination of independent variables.
The relationship is valid when a few assumptions are fulfilled.
Failing to satisfy the assumptions does not mean that the
relationship is incorrect; it means that the relationship may
not be strong enough.
4. Linear Multiple Regression: Assumptions
• The variables should be measured on an interval/ratio scale.
• The dependent variable, Y, must be normally distributed (no
skewness or outliers).
• The predictors, X’s, do not need to be normally distributed, but
if they are, it makes for a stronger interpretation.
• There should be a linear relationship between Y and every X.
• There should be no outliers among the X’s predicting Y.
• The variance of Y is the same at all values of X
(homoscedasticity).
5. Linear Multiple Regression: Outliers
• Outliers can distort the results of multiple regression, just as
they do in simple linear regression. When an outlier is included in the
analysis, it pulls the regression line towards itself. This can result in a
solution that is more accurate for the outlier, but less accurate for all
of the other cases in the data set.
• It is necessary to check for outliers in the dependent variable and in
the independent variables.
• Removing an outlier may improve the distribution of a variable.
• Transforming a variable may reduce the likelihood that the value for a
case will be characterized as an outlier.
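The slides do not say how outliers are to be detected. As one possible illustration, the sketch below flags values outside the 1.5 × IQR fences, a common rule of thumb that is an assumption here rather than the author's method. The function name `iqr_outliers` and the sample values are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the [Q1 - k*IQR, Q3 + k*IQR] fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample with one suspiciously large observation
sample = [1.2, 1.4, 1.3, 1.5, 1.1, 9.0]
flagged = iqr_outliers(sample)
print(flagged)
```

The same check would be run on the dependent variable and on each independent variable, since the slide asks for both to be examined.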
6. Linear Multiple Regression: Steps
1. Decide on the dependent and independent variables.
2. Test for normality, linearity and homoscedasticity.
3. If necessary, remove the outliers.
4. If the data do not satisfy the criteria for normality, a transformation
is required. Decide which transformations should be used.
5. Substitute the transformed variables and run the regression, entering all
independent variables.
6. Do the multiple regression analysis with the variables specified in the
problem.
7. Test the significance of the regression equation.
7. Simple Linear Regression
In Simple Linear Regression (SLR), the functional relationship
between two variables X and Y is determined.
The regression equation is the equation of a straight line that best
describes the relationship between the two variables.
When the equation is used to calculate Y from an observed X, it
gives an error ε in the prediction. Therefore, Y equals the
predicted value plus the error: $Y = \beta_0 + \beta_1 X + \varepsilon$.
8. Multiple Linear Regression (MLR): Basics
A multiple linear regression model is called “linear” because it is
linear in the coefficients {β}. However, transforms of the
regressor variables are permitted in an MLR model, as in SLR.
In Multiple Linear Regression (MLR), the functional relationship of a
dependent variable Y with more than one independent variable is
determined.
9. Multiple Linear Regression (MLR): Basics
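$$
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}}_{\text{data } (4 \times 1)}
=
\underbrace{\begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \\ x_{14} & x_{24} \end{bmatrix}}_{\text{design matrix } (4 \times 2)}
\underbrace{\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}}_{\text{parameters } (2 \times 1)}
$$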
10. Multiple Linear Regression (MLR): Basics
11. Multiple Linear Regression: Basics
Create the design matrix X.
Calculate the parameters: $b = (X^T X)^{-1} X^T y$
where $X^T$ is the transpose of matrix X and
$(X^T X)^{-1}$ is the inverse of matrix $X^T X$.
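The matrix solution can be sketched without any external library. The example below is a minimal illustration, assuming the least-squares form $b = (X^T X)^{-1} X^T y$ and a leading column of ones for the intercept. The helper names (`transpose`, `matmul`, `inverse`, `fit_mlr`) and the data are hypothetical and are not taken from the slides.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_mlr(x_rows, y):
    """Return b = (X^T X)^-1 X^T y; a column of ones is prepended for the intercept."""
    X = [[1.0] + list(r) for r in x_rows]
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    Xty = matmul(Xt, [[v] for v in y])
    return [row[0] for row in matmul(XtX_inv, Xty)]

# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [1 + 2 * a + 3 * b for a, b in x_rows]
b = fit_mlr(x_rows, y)
print([round(v, 6) for v in b])
```

Because the data are generated without noise, the fitted coefficients should recover the intercept and the two slopes used to build `y`. With real observations the result is the least-squares estimate.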
12. The Goodness of Fit of the Regression Model
One measure of how well a statistical model explains the observed
data is the coefficient of determination, that is, the square of the
Pearson correlation coefficient, $r^2$, between y and x.
When x is replaced by the predicted value $\hat{y}$,
it gives the squared correlation between the actual and predicted values, $R^2$.
It can also be measured by $R^2 = 1 - SSE/SST$, where SSE is the sum of
squared errors and SST is the total sum of squares.
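To make the link between correlation and goodness of fit concrete, the sketch below fits a straight line to a small hypothetical data set and computes the coefficient of determination three ways: as the squared correlation between x and y, as the squared correlation between y and its prediction, and as one minus the ratio of error to total sum of squares. The data and the helper `pearson_r` are illustrative assumptions.

```python
import math

def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical observations
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

# Least-squares straight line through the data
mx, my = sum(x) / len(x), sum(y) / len(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
y_hat = [intercept + slope * a for a in x]

r2_xy = pearson_r(x, y) ** 2        # squared correlation between x and y
r2_pred = pearson_r(y, y_hat) ** 2  # squared correlation between y and its prediction
sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))
sst = sum((b - my) ** 2 for b in y)
r2_ss = 1 - sse / sst               # sums-of-squares form

print(round(r2_xy, 4), round(r2_pred, 4), round(r2_ss, 4))
```

For a least-squares fit with an intercept the three quantities coincide in simple linear regression. In multiple regression only the last two remain comparable, because there is no single x to correlate with y.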
13. The Goodness of Fit of the Regression Model
The distinctions between r and R are:
• r is a measure of association between two random variables,
whereas R is a measure between a random variable y and its
prediction from a regression model.
• r lies in the interval $-1 \le r \le 1$, while the multiple correlation R
cannot be negative; that is, it lies in the interval $0 \le R \le 1$.
• R is always well defined, regardless of whether the independent
variable is assumed to be random or fixed. In contrast, calculating
the correlation between a random variable, Y, and a fixed predictor
variable, X, that is, a variable that is not considered random, makes
no sense.
14. Multiple Linear Regression: Example
It is well known that groundwater recharge is directly related to
Rainfall and Soil Moisture Holding Capacity (SMHC). Instrumental
data on groundwater recharge, Rainfall and SMHC have been collected
at six sites. Find an empirical equation that relates groundwater
recharge to Rainfall and SMHC.
15. Multiple Linear Regression: Example
16. Multiple Linear Regression: Solution
Create the design matrix X.
Get the solution by: $b = (X^T X)^{-1} X^T y$
17. Multiple Linear Regression: Solution
Excel commands:
Matrix Inversion: MINVERSE(array)
Matrix Multiplication: MMULT(array1, array2)
Matrix Transpose: Copy the matrix -> Paste Special with the Transpose
option ticked.
18. Multiple Linear Regression: Example
19. Multiple Linear Regression: Example
20. Multiple Linear Regression: Example
Recharge = 1.38 + 0.12Rainfall – 0.01SMHC
21. Multiple Linear Regression: Example
Recharge = 1.38 + 0.12Rainfall – 0.01SMHC
22. Multiple Linear Regression (MLR): Assumptions
Basic assumptions about the errors:
1. The mean of the errors is zero.
2. The errors are normally distributed.
3. The variances of the errors for all observations are
constant.
4. The errors are independent of each other (uncorrelated).
Gross violations of these basic assumptions will yield a
poor or biased model. However, if the variances of the
errors are unequal and can be estimated, weighted
regression schemes can sometimes be used to obtain a
better model.
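Two of the error assumptions above, zero mean and no correlation between successive errors, can be screened with simple arithmetic on the residuals. The sketch below is only an informal check under assumed inputs, not a formal test. The function `residual_checks` and the residual values are hypothetical.

```python
def residual_checks(residuals):
    """Return (mean, lag-1 autocorrelation) of a sequence of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [e - mean for e in residuals]
    denom = sum(e * e for e in centred)
    # Correlation between each residual and the one before it
    lag1 = sum(a * b for a, b in zip(centred[1:], centred[:-1])) / denom
    return mean, lag1

# Hypothetical residuals from a fitted model, in observation order
residuals = [0.2, -0.1, 0.3, -0.4, 0.1, -0.1]
mean, lag1 = residual_checks(residuals)
print(round(mean, 6), round(lag1, 4))
```

A mean far from zero or a lag-1 value far from zero would suggest that assumptions 1 or 4 deserve a closer look. Normality and constant variance need separate checks, such as a histogram of the residuals and a plot of residuals against predicted values.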
23. Multiple Linear Regression: Confidence Interval
Recharge = 1.38 + 0.12Rainfall – 0.01SMHC
The parameter values have a range. We can find the range of a
parameter at a certain level of confidence by using the following
formula:
$b_j - t_{\alpha/2,\,n-p}\sqrt{s^2 C_{jj}} \le \beta_j \le b_j + t_{\alpha/2,\,n-p}\sqrt{s^2 C_{jj}}$
where $s^2$ is the variance of the residuals and
$C_{jj}$ is the corresponding diagonal value of the matrix $(X^T X)^{-1}$.
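24. Multiple Linear Regression: Confidence Interval
Recharge = 1.38 + 0.12Rainfall – 0.01SMHC
n = 6, p = 3
At α = 0.05,
t(0.025, 3) = 4.18 (standard tables list 3.18 for an upper-tail probability of 0.025 with 3 degrees of freedom; 4.18 corresponds to an upper-tail probability of 0.0125)
s2 = 0.084
-0.35 ≤ β0 ≤ 3.11
-0.10 ≤ β1 ≤ 0.35
-0.16 ≤ β2 ≤ 0.14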
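The interval for a single coefficient can be sketched as the estimate plus or minus a critical t value times the coefficient's standard error. The underlying data are not reproduced in this transcript, so every number in the example below is hypothetical. The function name `confidence_interval` and its parameters are assumptions made for illustration.

```python
import math

def confidence_interval(b_j, t_crit, s2, c_jj):
    """Return (lower, upper) for b_j +/- t_crit * sqrt(s2 * c_jj).

    s2   : variance of the residuals
    c_jj : diagonal element of (X^T X)^-1 belonging to coefficient j
    """
    half_width = t_crit * math.sqrt(s2 * c_jj)
    return b_j - half_width, b_j + half_width

# Hypothetical inputs: estimate 2.0, tabulated t for 3 degrees of freedom,
# residual variance 0.25 and diagonal element 4.0
low, high = confidence_interval(b_j=2.0, t_crit=3.182, s2=0.25, c_jj=4.0)
print(round(low, 3), round(high, 3))
```

An interval that contains zero, as all three intervals in the recharge example do, means the data cannot rule out a zero coefficient at the chosen confidence level.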
25. Multiple Linear Regression: Efficient Estimator
• An estimator with lower variance is more efficient, in the
sense that it is likely to be closer to the true value across
samples.
• The “best” estimator is the one with the minimum variance among all
estimators.
Recharge = 1.38 + 0.12Rainfall – 0.01SMHC
-0.35 ≤ β0 ≤ 3.11
-0.10 ≤ β1 ≤ 0.35
-0.16 ≤ β2 ≤ 0.14
26. Multiple Linear Regression: Strength
SST = SSE + SSR
Sum of Squares Total (SST) = total variability in the observed responses
Sum of Squares Error (SSE) = total error of the model, or variability that is not
explained by the model
Sum of Squares Regression (SSR) = systematic variability that is explained by the
regression model
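The decomposition of variability can be checked numerically. The sketch below assumes the usual definitions, SST as the squared deviations of y from its mean, SSE as the squared deviations of y from the fitted values, and SSR as the squared deviations of the fitted values from the mean. It verifies that SST equals SSE plus SSR for a least-squares line through hypothetical data. The function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a straight line."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def sums_of_squares(y, y_hat):
    """Return (SST, SSE, SSR) for observations y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)             # total variability
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)          # explained
    return sst, sse, ssr

# Hypothetical observations
x = [10.0, 12.0, 15.0, 17.0, 20.0, 22.0]
y = [2.4, 2.9, 3.1, 3.6, 3.8, 4.3]
intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * a for a in x]
sst, sse, ssr = sums_of_squares(y, y_hat)
print(round(sst, 4), round(sse, 4), round(ssr, 4))
```

The identity holds for least-squares fits that include an intercept. For fitted values from another source the three sums can still be computed, but they need not add up.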
27. Multiple Linear Regression: Strength
Mean variation in observations, MST = SST / (n − 1)
Mean error, MSE = SSE / (n − p)
Mean regression, MSR = SSR / k, where k = p − 1 is the number of regressors (k = 1 in simple linear regression)
Higher values of $R^2$ indicate a better fit of the model to the sample
observations.
Disadvantage of $R^2$: Adding any regressor variable to an MLR
model, even an irrelevant regressor, yields a smaller SSE and a
greater $R^2$. For this reason, $R^2$ by itself is not a good measure of
the quality of fit.
28. Multiple Linear Regression: Strength
To overcome this deficiency in $R^2$, an adjusted value can be used.
The adjusted coefficient of multiple determination ($R^2_{adj}$) is defined
as
$R^2_{adj} = 1 - \dfrac{SSE/(n-p)}{SST/(n-1)}$
Because the number of model coefficients (p) is used in
computing $R^2_{adj}$, the value will not necessarily increase with the
addition of any regressor. Hence, $R^2_{adj}$ is a more reliable indicator
of model quality.
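As a quick numerical check, the sketch below applies the standard definitions of the coefficient of determination and its adjusted form to the sums of squares reported for the recharge example in this deck (SST = 1.27, SSE = 0.42, n = 6, p = 3). The function names are hypothetical.

```python
def r_squared(sse, sst):
    """Coefficient of determination from the error and total sums of squares."""
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R^2: each sum of squares is divided by its degrees of freedom."""
    return 1 - (sse / (n - p)) / (sst / (n - 1))

# Values reported for the recharge example
r2 = r_squared(sse=0.42, sst=1.27)
r2_adj = adjusted_r_squared(sse=0.42, sst=1.27, n=6, p=3)
print(round(r2, 2), round(r2_adj, 2))  # 0.67 0.45
```

The gap between the two values shows how strongly the adjustment penalises a three-parameter model fitted to only six observations.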
29. Multiple Linear Regression: Strength (Example)
SST = SSE + SSR
SST = 1.27; SSR = 0.85; SSE = 0.42
Mean variation in observations, MST = SST / (n − 1) = 0.26
Mean error, MSE = SSE / (n − p) = 0.14
Mean regression, MSR = SSR / k = 0.85 / 2 = 0.425
$R^2$ = 0.67
$R^2_{adj}$ = 0.45
30. Multiple Linear Regression: F-statistics
The F-test is used to assess the overall ability of a model.
When testing for the significance of the goodness of fit, our null hypothesis is
that the coefficients of the explanatory variables are jointly equal to 0.
If our F-statistic is below the critical value, we fail to reject the null hypothesis and
therefore say that the goodness of fit is not significant.
31. Multiple Linear Regression: F-statistics
The F-test is useful for testing a number of hypotheses and is often
used to test the single, global and joint significance of a group of
variables.
A joint test is often referred to as ‘testing a restriction’.
This restriction is that the coefficients of a group of explanatory variables are jointly
equal to 0.
32. Multiple Linear Regression: F-statistics
The global F-test is used to assess the overall ability of a model to
explain at least some of the observed variability in the sample
responses. The global F-test is performed in the following steps:
Null hypothesis: β1 = β2 = … = βk = 0
The global F-statistic is calculated as
F0 = MSR/MSE
If F(calculated) > F(critical)(α, k, n−p),
where k = number of regressors, n = number of data points and p = number of
parameters to be estimated,
reject the null hypothesis and conclude that at least one βj ≠ 0 and at
least one model regressor explains some of the response variation.
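The steps of the global F-test can be sketched as a small function. This version assumes the conventional degrees of freedom, k = p − 1 for the regression and n − p for the error, which match the critical value F(α, k, n−p) used in the test. The critical value is passed in from a table rather than computed. The function name `global_f_test` is hypothetical, and the inputs are the sums of squares reported for the recharge example.

```python
def global_f_test(ssr, sse, n, p, f_critical):
    """Return (F0, reject_null) for the global F-test.

    k = p - 1 regressors (p counts the intercept as well).
    """
    k = p - 1
    msr = ssr / k        # mean square due to regression
    mse = sse / (n - p)  # mean square error
    f0 = msr / mse
    return f0, f0 > f_critical

# Recharge example: SSR = 0.85, SSE = 0.42, n = 6, p = 3,
# tabulated F(0.05, 2, 3) = 9.55
f0, reject = global_f_test(ssr=0.85, sse=0.42, n=6, p=3, f_critical=9.55)
print(round(f0, 2), reject)  # 3.04 False
```

Dividing SSR by 1 instead of k would give 6.07, which is still below 9.55, so the decision for this example is the same either way.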
33. Recharge = 1.38 + 0.12 Rainfall – 0.01 SMHC
Multiple Linear Regression: Example
SST = 1.27; MST = 0.26
SSR = 0.85; MSR = 0.85
SSE = 0.42; MSE = 0.14
SST = SSE + SSR
F0 = MSR/MSE = 6.07
F(critical)(α, k, n−p) = F(critical)(0.05, 2, 3) = 9.55
F(calculated) < F(critical)(α, k, n−p)
The null hypothesis cannot be rejected.
Decision: There is no evidence that any model regressor explains the
response variation.
34. Multiple Linear Regression: Example
35. Multiple Linear Regression: Example
36. Multiple Linear Regression: Example
Discharge = 21.97 – 0.19ET + 1.55BF + 0.94R -1.05GWR
37. Discharge = 21.97 – 0.19ET + 1.55BF + 0.94R -1.05GWR
Multiple Linear Regression: Example
Null hypothesis:
β1 = β2 = β3 = β4 = 0
= 0.9865
F0 = MSR/MSE = 7.68
F(critical)(α, k, n−p) = F(critical)(0.05, 4, 7) = 4.12
F(calculated) > F(critical)(α, k, n−p)
The null hypothesis is rejected.
Decision: At least one βj ≠ 0, and at least one model regressor
explains some of the response variation.
38. Multiple Linear Regression: Example
Discharge = 33.50 – 0.28ET + 1.53BF + 0.28R
39. Discharge = 33.50 – 0.28ET + 1.53BF + 0.28R
Multiple Linear Regression: Example
Null hypothesis:
β1 = β2 = β3 = 0
F0 = MSR/MSE = 6.3
F(critical)(α, k, n−p) = F(critical)(0.05, 3, 8) = 4.07
F(calculated) > F(critical)(α, k, n−p)
The null hypothesis is rejected.
Decision: At least one βj ≠ 0. The model remains significant without
groundwater recharge (GWR), which suggests that GWR has no significant
impact on Discharge.
40. Multiple Linear Regression: Example
Discharge = ? + ? ET + ? BF + ? GWR
41. To carry out this test, you need to run two separate regressions:
one with all the explanatory variables included (the unrestricted equation),
the other with the variables whose joint significance is being
tested removed (the restricted equation).
Then collect the RSS (residual sum of squares) from both equations.
Put the values into the formula for the F-statistic.
Find the critical value and compare it with the test statistic. The null
hypothesis is that the coefficients of the tested variables are jointly equal to 0.
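The first two steps (fit both regressions, collect both RSS values) can be sketched in plain Python. This is an illustrative sketch, not the procedure used to produce the slides: the helper `ols_rss` is a hypothetical name, it solves the normal equations by Gauss-Jordan elimination, and the eight observations are made-up values chosen only to make the example runnable.

```python
def ols_rss(X, y):
    """Fit y = X b by least squares (normal equations) and return the RSS."""
    n, k = len(X), len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    resid = [y[r] - sum(b * x for b, x in zip(beta, X[r])) for r in range(n)]
    return sum(e * e for e in resid)

# Hypothetical data (not from the slides): columns y, x1, x2, x3.
data = [
    (5.0, 2.1, 2.6, 4.1), (6.1, 1.8, 2.9, 3.2),
    (4.7, 2.2, 3.0, 4.3), (5.8, 2.5, 3.6, 3.6),
    (5.3, 1.9, 2.4, 3.9), (6.4, 2.0, 3.4, 3.0),
    (4.9, 2.4, 2.7, 4.5), (5.6, 2.3, 3.2, 3.7),
]
y = [row[0] for row in data]
X_u = [[1.0, x1, x2, x3] for _, x1, x2, x3 in data]  # unrestricted: all variables
X_r = [[1.0, x1] for _, x1, _, _ in data]            # restricted: x2 and x3 removed

rss_u = ols_rss(X_u, y)
rss_r = ols_rss(X_r, y)
print(f"RSS_u = {rss_u:.3f}, RSS_R = {rss_r:.3f}")
```

Because the restricted model is nested in the unrestricted one, its RSS can never be smaller; the F-statistic measures whether the increase is larger than chance would explain.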
Multiple Linear Regression: Joint Significance
42. The test for joint significance has its own formula, which takes
the following form:
$$F = \frac{(RSS_R - RSS_u)/m}{RSS_u/(n - k)}$$
where
$RSS_u$ = unrestricted RSS
$RSS_R$ = restricted RSS
$m$ = number of restrictions
$k$ = parameters in the unrestricted equation
$n$ = number of observations
Multiple Linear Regression: Joint Significance
43. Multiple Linear Regression: Joint Significance
Obs. No.   Y     X1    X2    X3
1          5.1   2.3   2.5   4.2
2          6.2   1.9   2.8   3.3
3          4.8   2.0   3.1   4.0
...        ...   ...   ...   ...
60         5.9   2.4   3.8   4.6
$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3$
44. Suppose we have a model with three explanatory variables and wish to
test the joint significance of two of them (x2 and x3). We need
to run the following unrestricted and restricted models:
$y_t = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3$ (unrestricted)
$y_t = \alpha_0 + \alpha_1 x_1$ (restricted)
Multiple Linear Regression: Joint Significance
45. Given the following model, we wish to test the joint significance of x2
and x3. Having estimated both models, we collect their respective RSSs (n = 60).
$y_t = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3$ (unrestricted), $RSS_u = 0.75$
$y_t = \beta_0 + \beta_1 x_1$ (restricted), $RSS_R = 1.5$
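The test statistic for this example can be worked through in a short Python sketch. It reads the slide's values as RSS_R = 1.5 and RSS_u = 0.75 with n = 60, and assumes m = 2 restrictions (x2 and x3) and k = 4 parameters in the unrestricted equation (intercept plus three slopes). The function name is illustrative.

```python
def joint_f_statistic(rss_r, rss_u, m, n, k):
    """F = ((RSS_R - RSS_u) / m) / (RSS_u / (n - k))."""
    return ((rss_r - rss_u) / m) / (rss_u / (n - k))

F = joint_f_statistic(rss_r=1.5, rss_u=0.75, m=2, n=60, k=4)
f_critical = 3.15  # critical value quoted in the deck's conclusion for this example
print(f"F = {F:.1f}, reject H0: {F > f_critical}")
```

The result is approximately 28, which agrees with the value the deck compares against the critical value of 3.15.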
Multiple Linear Regression: Joint Significance
47. As the F statistic is greater than the critical value (28>3.15), we
reject the null hypothesis and conclude that the variables x2 and x3
are jointly significant and should remain in the model.
Null hypothesis, $H_0: \alpha_2 = \alpha_3 = 0$
Alternative hypothesis, $H_A: \alpha_2 \neq 0$ and/or $\alpha_3 \neq 0$
Multiple Linear Regression: Joint Significance
48. Choosing the Best MLR Model
• One of the major issues in multiple regression is the appropriate
approach to variable selection.
• To build an appropriate regression model, we need to
successively add or delete variables from the model.
• The benefit of adding additional variables to a multiple
regression model is to account for or explain more of the
variance of the response variable. The cost of adding additional
variables is that the degrees of freedom decrease, making it
more difficult to find significance in hypothesis tests and
increasing the width of confidence intervals.
A good model will explain as much of the variance of y as
possible with a small number of explanatory variables.
49. The choice of whether to add a variable is based on a "cost-benefit
analysis", and variables enter the model only if they make a
significant improvement in the model.
There are at least two types of approaches for evaluating whether
a new variable sufficiently improves the model. The first approach
uses partial F-tests, and when automated is often called a
"stepwise" procedure.
The second approach uses some overall measure of model
quality. The latter has many advantages.
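As an illustration of the second approach, the sketch below compares candidate models using adjusted R². The slides do not name a specific measure here, so the choice of adjusted R² is an assumption, and the SST and SSE values are hypothetical numbers picked to show the cost-benefit trade-off. The sample size n = 12 matches the degrees of freedom used in the discharge examples.

```python
def adjusted_r2(sse, sst, n, p):
    """Adjusted R^2 = 1 - (SSE / (n - p)) / (SST / (n - 1))."""
    return 1 - (sse / (n - p)) / (sst / (n - 1))

# Hypothetical fits of three candidate models to the same n = 12 observations.
sst, n = 10.0, 12
candidates = {
    "ET + BF": (2.00, 3),            # (SSE, parameters including intercept)
    "ET + BF + R": (1.60, 4),
    "ET + BF + R + GWR": (1.55, 5),
}

scores = {name: adjusted_r2(sse, sst, n, p)
          for name, (sse, p) in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:20s} adjusted R2 = {score:.3f}")
print("Preferred model:", best)
```

In this made-up case, adding GWR lowers SSE slightly but also costs a degree of freedom, so adjusted R² falls and the three-variable model is preferred.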
Choosing the Best MLR Model
Discharge = 21.97 – 0.19ET + 1.55BF + 0.94R -1.05GWR
50. Choosing the Best MLR Model