This document discusses correlation and statistical methods for examining the relationship between two variables. It defines correlation and describes how correlation can indicate the direction, strength, and significance of a relationship. Different types of correlation are described, including simple, multiple, partial, and total correlation. Methods for calculating and interpreting the correlation coefficient are provided along with examples of exploring relationships between hydrological variables.
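As a minimal sketch of how a simple correlation coefficient can be calculated, the function below computes Pearson's r from paired observations. The name `pearson_r` and the rainfall and runoff values are illustrative assumptions, not figures from the summarized document.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (e.g. rainfall and runoff).
rainfall = [10.0, 20.0, 30.0, 40.0]
runoff = [3.0, 7.0, 8.0, 14.0]
r = pearson_r(rainfall, runoff)
```

The sign of `r` gives the direction of the relationship and its magnitude gives the strength. Significance needs a separate test that is not shown here.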
This document discusses stochastic methods in hydrology, specifically Markov transition matrices and cumulative distribution functions. It describes how to calculate daily monsoon rainfall using a Markov chain model with four rainfall classes. The initial condition and transition probabilities are given. It also discusses stationary time series, linear stochastic models including moving averages, autoregressive models and autoregressive moving average models. Double moving averages are presented to remove trends and improve forecasts.
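The Markov chain idea can be sketched as follows, assuming four rainfall classes indexed 0 to 3. The transition matrix `P` is a hypothetical placeholder, since the values given in the summarized document are not reproduced here. The next class is drawn by comparing a uniform random number against the cumulative distribution of the current row.

```python
import random

# Hypothetical transition probabilities between four rainfall classes.
# Row i holds the probabilities of moving from class i to classes 0..3.
P = [
    [0.6, 0.2, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.2, 0.3, 0.3, 0.2],
    [0.1, 0.2, 0.3, 0.4],
]


def step(dist, P):
    """Propagate a probability distribution over classes by one day."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def next_class(current, P, u):
    """Pick the next class from a uniform number u in [0, 1) via the row's CDF."""
    cumulative = 0.0
    for j, p in enumerate(P[current]):
        cumulative += p
        if u < cumulative:
            return j
    return len(P[current]) - 1  # guard against floating-point round-off


def simulate(n_days, start, P, seed=0):
    """Generate a sequence of daily rainfall classes of length n_days."""
    rng = random.Random(seed)
    state, sequence = start, []
    for _ in range(n_days):
        state = next_class(state, P, rng.random())
        sequence.append(state)
    return sequence
```

Converting a class sequence into rainfall depths would need a distribution for each class, which this sketch leaves out.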
This document discusses probability distributions and their applications in statistical hydrology. It begins by explaining discrete and continuous random variables and their probability functions. It then covers several specific probability distributions including binomial, Poisson, normal, lognormal, gamma, exponential and Gumbel distributions. Examples are provided to illustrate how these distributions can be used to calculate probabilities of hydrologic events like floods or rainfall.
This document discusses multiple linear regression techniques. It begins by explaining that multiple linear regression is used to predict a dependent variable from a set of independent variables. It then provides details on assumptions that must be satisfied, how to identify and handle outliers, and the steps involved in performing multiple linear regression analysis. Examples are also provided to illustrate key concepts.
This document discusses regression analysis and its application in hydrology. It begins by defining regression as a statistical technique used to determine the functional relationship between two variables. Simple linear regression finds the best fit linear equation to describe the relationship between a dependent and independent variable. Regression can be used to predict outcomes, describe relationships, and control for variables. The document provides examples of applying regression to predict erosion based on wave height data. It explains how to calculate the regression equation and error term.
This document discusses trend analysis of time series data. It defines time series as measurements of a variable taken at regular intervals over time. Time series can show trends, seasonal variations, cyclical variations, and irregular variations. Trend analysis determines if there is a significant increasing or decreasing trend in the data over time. Linear regression and non-parametric Mann-Kendall tests are common methods used to test for trends and estimate their magnitude. The selection of an appropriate trend analysis method depends on characteristics of the water resources data such as distributions, outliers, and missing values.
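A minimal sketch of the Mann-Kendall test statistic is given below. It assumes a series without tied values, so the variance formula omits the tie correction, and it uses the normal approximation with a continuity correction. The function name `mann_kendall` is an illustrative choice.

```python
import math


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test, ignoring tie corrections."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S counts later-minus-earlier differences: +1 if positive, -1 if negative.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z
```

A positive `Z` points to an increasing trend and a negative `Z` to a decreasing one. Its absolute value is compared with a standard normal critical value, for example 1.96 for a two-sided test at the 5% level.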
This document discusses statistical hydrology and summarizing data. It describes defining problems, collecting relevant data through sampling techniques, and assessing data quality before analysis. Statistical hydrology involves collecting and analyzing variable, limited water resources data to make decisions and scientific discoveries. Descriptive statistics are used to summarize datasets while inferential statistics enable inferences about unknown aspects.
The document discusses various statistical hypothesis tests that can be used to analyze hydrological data, including the t-test and ANOVA. It provides examples of how to set up null and alternative hypotheses, calculate relevant statistics like t-statistics and F-statistics, and make decisions about whether to reject the null hypothesis based on comparing these statistics to critical values. One example analyzes groundwater depth data from three catchments using ANOVA to test if depths differ between catchments.
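The F-statistic of a one-way ANOVA can be sketched as below. The function name `one_way_anova_f` is an assumption for illustration, and no groundwater data from the summarized document are used. The result would be compared with a critical F value for the returned degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    # Note: raises ZeroDivisionError if every group has zero internal variance.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```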
This document discusses statistical methods for simple linear regression including tests of significance for the slope and intercept. It introduces alternative regression methods such as the Kendall-Theil robust line that can be used when the assumptions of ordinary least squares regression are not met, such as when the residuals are not normally distributed. An example demonstrates how to calculate the Kendall-Theil robust line and test its significance.
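As a sketch of the Kendall-Theil robust line, the slope can be taken as the median of all pairwise slopes. The intercept below uses one common convention, median(y) minus slope times median(x), which may differ from the one used in the summarized document. The function name and data are illustrative.

```python
from itertools import combinations
from statistics import median


def kendall_theil_line(x, y):
    """Return (slope, intercept) of the Kendall-Theil robust line."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Slopes between every pair of points with distinct x values.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("all x values are identical")
    slope = median(slopes)
    intercept = median(y) - slope * median(x)
    return slope, intercept


# The last y value is an outlier; the median slope is unaffected by it.
slope, intercept = kendall_theil_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 100])
```

Because the estimate is a median, a single extreme observation does not pull the line the way it would in ordinary least squares.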
The document discusses concepts related to statistical analysis of hydrological data, including measures of skewness, kurtosis, outliers, and the common characteristics of water resources data. Skewness measures asymmetry in a distribution, while kurtosis measures peakedness. Outliers are identified using methods like Chauvenet's criterion, Grubbs' test, and Dixon's Q test. Water resources data commonly has a lower bound of zero, outliers, non-normal distributions, positive skewness, seasonal patterns, and positive autocorrelation between consecutive observations.
Accelerating the production of safety summary and clinical safety reports - a... (Steffan Stringer)
This document discusses automating the production of safety summaries and clinical study reports using LaTeX, R, and source control. It proposes a workflow where clinical data is transformed into CDISC SDTM/ADaM formats using R, and reports are generated as reproducible documents combining LaTeX documentation with R code and output. This approach aims to reduce errors, accelerate delivery times, and allow easier collaboration between physicians, data managers, programmers and writers. The key benefits are presented as producing clinical reports as reproducible code and establishing a fully integrated process for analysis and reporting.
This document provides guidance on using regression analysis to validate hydrological data. It discusses using simple linear regression to establish relationships between variables like rainfall and runoff. Key steps covered include estimating regression coefficients to minimize the error variance, measuring the goodness of fit using the coefficient of determination, and examining residuals over time and versus other variables to evaluate changes in the rainfall-runoff relationship. The overall aim is to detect errors in discharge data by comparing observed and computed runoff derived from regression models.
This document provides an overview of key components and activities involved in air quality management systems. It describes common air quality management activities like goal setting, control strategies, modeling, assessment, legislation/regulation, compliance, and monitoring. The document also lists several quality management tools that can be used for air quality management, including check sheets, control charts, Pareto charts, scatter plots, Ishikawa diagrams, histograms, and their purposes. Links to additional air quality management resources are also provided.
This document provides information about quality management tools and techniques. It discusses a quality management institute database that was created to track clinical data for a sepsis initiative. It describes how the database tracked various metrics and underwent iterative improvements based on data analysis. Over 8,000 cases of sepsis were eventually entered into the database. Common quality management tools are also defined, including check sheets, control charts, Pareto charts, scatter plots, Ishikawa diagrams, and histograms.
The document discusses quality assurance management tools and strategies. It provides descriptions and examples of 5 commonly used quality management tools: check sheets, control charts, Pareto charts, scatter plots, and Ishikawa diagrams. Each tool is explained in 1-2 paragraphs detailing what it is used for and how it works. Examples are given for control charts, Pareto charts, and scatter plots. The tools can help identify issues, determine causes of problems, and monitor quality over time.
Advanced control scheme of doubly fed induction generator for wind turbine us... (IJECEIAES)
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used in wind power conversion systems. First, a DFIG model is constructed. A control law is then formulated to govern the flow of energy between the stator of the DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). The controllers are compared in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations. The simulations were conducted in MATLAB/Simulink. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions (Victor Morales)
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte... (University of Maribor)
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Understanding Inductive Bias in Machine Learning (SUTEJAS)
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
ACEP Magazine edition 4th launched on 05.06.2024 (Rahul)
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEM (HODECEDSIET)
Time Division Multiplexing (TDM) is a method of transmitting multiple signals over a single communication channel by dividing transmission time on the channel into many short segments, called time slots. These time slots are then allocated to different data streams, allowing multiple signals to share the same transmission medium efficiently. TDM is widely used in telecommunications and data communication systems.
### How TDM Works
1. **Time Slots Allocation**: The core principle of TDM is to assign distinct time slots to each signal. During each time slot, the respective signal is transmitted, and then the process repeats cyclically. For example, if there are four signals to be transmitted, the TDM cycle will divide time into four slots, each assigned to one signal.
2. **Synchronization**: Synchronization is crucial in TDM systems to ensure that the signals are correctly aligned with their respective time slots. Both the transmitter and receiver must be synchronized to avoid any overlap or loss of data. This synchronization is typically maintained by a clock signal that ensures time slots are accurately aligned.
3. **Frame Structure**: TDM data is organized into frames, where each frame consists of a set of time slots. Each frame is repeated at regular intervals, ensuring continuous transmission of data streams. The frame structure helps in managing the data streams and maintaining the synchronization between the transmitter and receiver.
4. **Multiplexer and Demultiplexer**: At the transmitting end, a multiplexer combines multiple input signals into a single composite signal by assigning each signal to a specific time slot. At the receiving end, a demultiplexer separates the composite signal back into individual signals based on their respective time slots.
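The four steps above can be sketched in a few lines. This is a simplified model, assuming every input stream is an equal-length list of samples and that one frame holds one slot per stream. The names `multiplex` and `demultiplex` are illustrative.

```python
def multiplex(streams):
    """Interleave equal-length streams into one composite sequence.

    Each frame contains one slot per stream, in a fixed order.
    """
    if len({len(s) for s in streams}) > 1:
        raise ValueError("all streams must have the same length")
    return [sample for frame in zip(*streams) for sample in frame]


def demultiplex(composite, n_streams):
    """Recover the individual streams from their fixed slot positions."""
    if len(composite) % n_streams != 0:
        raise ValueError("composite length must be a whole number of frames")
    return [composite[i::n_streams] for i in range(n_streams)]


# Four signals, two frames.
streams = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"], ["d1", "d2"]]
composite = multiplex(streams)
# composite == ["a1", "b1", "c1", "d1", "a2", "b2", "c2", "d2"]
recovered = demultiplex(composite, len(streams))
```

In this sketch, synchronization amounts to both ends agreeing on the number of streams and on the slot order. A real system also needs a shared clock and framing bits, which are not modelled here.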
### Types of TDM
1. **Synchronous TDM**: In synchronous TDM, time slots are pre-assigned to each signal, regardless of whether the signal has data to transmit or not. This can lead to inefficiencies if some time slots remain empty due to the absence of data.
2. **Asynchronous TDM (or Statistical TDM)**: Asynchronous TDM addresses the inefficiencies of synchronous TDM by allocating time slots dynamically based on the presence of data. Time slots are assigned only when there is data to transmit, which optimizes the use of the communication channel.
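The difference between the two types can be illustrated with a small sketch. It assumes that `None` marks a frame in which a channel has nothing to send, and that each statistical slot carries its channel index as an address. The function names are illustrative.

```python
def synchronous_slots(streams):
    """Every channel gets a slot in every frame, even when idle (None)."""
    return [sample for frame in zip(*streams) for sample in frame]


def statistical_slots(streams):
    """Only channels with data get a slot; each slot is (channel, sample)."""
    return [
        (channel, sample)
        for frame in zip(*streams)
        for channel, sample in enumerate(frame)
        if sample is not None
    ]


def statistical_demultiplex(slots, n_channels):
    """Route each addressed slot back to its channel."""
    out = [[] for _ in range(n_channels)]
    for channel, sample in slots:
        out[channel].append(sample)
    return out


streams = [
    ["a1", None, "a3"],
    [None, None, "b3"],
    ["c1", "c2", None],
]
sync = synchronous_slots(streams)   # 9 slots, 4 of them empty
stat = statistical_slots(streams)   # 5 slots, none empty
```

The statistical version uses fewer slots but has to carry an address in each one, and the receiver can no longer infer the channel from the slot position alone.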
### Applications of TDM
- **Telecommunications**: TDM is extensively used in telecommunication systems, such as in T1 and E1 lines, where multiple telephone calls are transmitted over a single line by assigning each call to a specific time slot.
- **Digital Audio and Video Broadcasting**: TDM is used in broadcasting systems to transmit multiple audio or video streams over a single channel, ensuring efficient use of bandwidth.
- **Computer Networks**: TDM is used in network protocols and systems to manage the transmission of data from multiple sources over a single network medium.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM all
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS (IJNSA Journal)
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Embedded machine learning-based road conditions and driving behavior monitoring (IJECEIAES)
Car accident rates have increased in recent years, resulting in loss of life, property damage, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. It is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Data collection covered three key road events (normal street and normal drive, speed bumps, and circular yellow speed bumps) and three aggressive driving actions (sudden start, sudden stop, and sudden entry). The gathered data is processed and analyzed using a machine learning system designed for devices with limited power and memory. The developed system achieved 91.9% accuracy, 93.6% precision, and 92% recall. Inference on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz takes 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making the system suitable for resource-constrained embedded systems.
Batteries: Introduction - types of batteries - discharging and charging of battery - characteristics of battery - battery rating - various tests on battery - primary battery: silver button cell - secondary battery: Ni-Cd battery - modern battery: lithium ion battery - maintenance of batteries - choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
1. MAL1303: Statistical Hydrology
Correlation
Dr. Shamsuddin Shahid
Department of Hydraulics and Hydrology
Faculty of Civil Engineering, Universiti Teknologi Malaysia
Room No. M46-332; E-mail: sshahid@utm.my
Mobile: 0182051586
11/23/2015 Shamsuddin Shahid, FKA, UTM
2. Research Questions: Are two variables related?
Example questions in hydrology:
– “Is there any relation between rainfall and river
discharge?”
– “Is there any relation between low river flow and river
water quality?”
– “Is there any relation between elevation and rainfall?”
– “Is there any relation between rainfall intensity and
landslides?”
Test the relationship: Correlation
3. Correlation
Definition: Correlation is a statistical method that is used to
examine the extent to which two variables have a simple linear
relationship.
Questions:
What does it mean to say that two variables are associated with
one another?
How can we mathematically formalize the concept of
association?
Answer:
Correlation
4. Correlation
Correlation describes three aspects of the relationship between two variables:
– Direction
– Strength
– Significance
The sign of the coefficient indicates direction.
The size of the coefficient indicates strength.
Comparison with critical values gives significance.
5. Scatter Plots
• Plot each pair of observations (X, Y)
• x = predictor variable (independent)
• y = criterion variable (dependent)
• Check for:
– outliers
– linearity
6. How do you study the relationship between two variables?
Groundwater temperature data are collected at different depths below the earth
surface.
A plain list of these data is difficult to interpret.
The relationship between the two variables can be visualized using a scatter
diagram, where each depth-temperature pair is represented as a point in a
plane.
7. Types of Correlation: Type I
Correlation is either positive or negative.
Positive Correlation: The correlation is said to be positive if the values of
the two variables change in the same direction.
Negative Correlation: The correlation is said to be negative when the values
of the two variables change in opposite directions.
8. Positive & Negative Association
At each depth, two measurements are collected: Temperature and Nitrogen Concentration.
We obtain two scatter plots:
(i) Depth vs. Groundwater Temperature;
(ii) Depth vs. Nitrogen Concentration in Groundwater.
In the first graph, temperature increases with depth as a general tendency.
This corresponds to a positive association.
In the second graph, Nitrogen concentration decreases with depth. This
corresponds to a negative association.
9. Types of Correlation: Type II
Correlation is classified as simple, multiple, partial, or total.
10. Types of Correlation: Type II
• Simple correlation: only two variables are studied.
• Multiple correlation: three or more variables are studied.
• Partial correlation: the analysis recognizes more than two
variables but considers only two of them, keeping the others
constant.
• Total correlation: is based on all the relevant variables, which
is normally not feasible.
11. Types of Correlation: Type III
Correlation is either linear or non-linear.
12. Types of Correlation Type III
• Linear correlation: Correlation is said to be linear when the amount of
change in one variable tends to bear a constant ratio to the amount of
change in the other. The graph of the variables having a linear relationship
will form a straight line.
• Non Linear correlation: The correlation would be non linear if the amount of
change in one variable does not bear a constant ratio to the amount of
change in the other variable.
13. Correlation Coefficient
The correlation coefficient gives a measure of the linear association
of two variables. It defines the degree of relationship.
The correlation coefficient is usually denoted by r and takes values
between -1 and 1.
If r is positive, it lies between 0 and 1; if r is negative, it lies between 0 and -1.
14. Correlation Coefficient
Nitrogen concentration data are collected at two different locations, giving
two scatter plots. Both show a negative correlation between depth and Nitrogen
concentration. The correlation coefficient r is more negative for the first
plot than for the second.
If the points in the scatter plot lie very close to a straight line, the
correlation is close to 1 in absolute value. A near-zero correlation corresponds
to a diagram where the data are widely scattered around the line.
16. Correlation Coefficient - Summary
A positive coefficient means that the data are clustered around lines with a
positive slope. That is, as one variable increases, the other one also
increases.
A negative coefficient means that the data are clustered around lines with a
negative slope. That is, as one variable increases, the other one decreases.
The closer r is to 1 the stronger the positive linear association between the
variables.
The closer r is to -1 the stronger the negative linear association between the
variables.
When r is equal to or near 1 or -1, there is a strong linear association between
the variables.
When r is equal to or near 0, there is no linear association between the variables.
17. Pearson Correlation
Pearson correlation is used to describe the relationship between
two variables that are both interval or ratio variables.
Pearson correlation measures how consistently each Y value is
paired with each X value in a linear fashion.
18. Covariance
• Covariance is a measure of how much two variables change together.
• It is the variance shared by the two variables.
• Covariance reflects the direction of the relationship:
Positive covariance indicates a positive (+) relationship
Negative covariance indicates a negative (-) relationship
19. Computational Formula
Sum of Squares (SS) measures the amount of variation or variability of
a single variable.
Sum of Products (SP) provides a parallel procedure for measuring the
amount of covariation or covariability between two variables.
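The formulas themselves did not survive in this transcript. As a reminder, the standard definitional and computational forms of SS and SP are given below; the symbols (n for the number of pairs, bars for sample means) are the usual textbook notation and are assumed here rather than taken from the slide.

```latex
SS_X = \sum (X - \bar{X})^2 = \sum X^2 - \frac{\left(\sum X\right)^2}{n}
\qquad
SS_Y = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}

SP = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}
```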
20. Calculation of Pearson’s Correlation Coefficient
Pearson’s correlation coefficient is a ratio comparing the
covariability of X and Y with variability of X and Y separately.
SP measures the covariability of X and Y
The variability of X and Y is measured by calculating the SS for X
and Y scores separately
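The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the slide's own worked example: the function name `pearson_r` and the depth and nitrate values are hypothetical, chosen only to show the calculation r = SP / sqrt(SS_X · SS_Y).

```python
import math

def pearson_r(x, y):
    """Pearson's r computed as SP / sqrt(SS_x * SS_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products: covariability of X and Y
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squares: variability of X and Y separately
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical depth (ft) and nitrate (mg/l) values, for illustration only
depth = [10, 20, 30, 40, 50]
nitrate = [9.1, 7.8, 6.9, 5.2, 4.6]
r = pearson_r(depth, nitrate)
print(f"r = {r:.3f}")
```

Because nitrate falls steadily with depth in these made-up values, the result is strongly negative.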
21. Calculation of Pearson’s Correlation Coefficient
Let X represent Depth in feet and Y represent Nitrate Concentration in
mg/l. The association between Groundwater Depth and Nitrate
Concentration can be found as below:
22. Hypothesis Testing
H0: there is no correlation between depth and nitrate concentration, i.e., the
population correlation is 0.
H1: there is a real non-zero correlation in the population.
The population correlation is traditionally represented by ρ (rho); therefore,
with this symbol we can write:
H0: ρ = 0
H1: ρ ≠ 0
For Pearson’s correlation, the degrees of freedom are df = n - 2, where n is the
sample size. We lose 2 degrees of freedom because we need to estimate two
means, one for each variance estimate.
If the absolute value of the calculated r equals or exceeds the critical value
(given in the table), then the obtained r is significant.
23. Hypothesis Testing
In the present case, r = 0.875
df = n - 2
= 5 - 2
= 3
The critical value for α = 0.05, df = 3 is 0.878.
Since 0.875 < 0.878, we fail to reject H0: ρ = 0.
The correlation is not statistically significant at the 5% level.
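The same decision can be cross-checked through the t statistic for a correlation, t = r·sqrt(df) / sqrt(1 − r²). The sketch below assumes a two-tailed test and hard-codes the standard t-table value 3.182 for α = 0.05 and df = 3; that constant is an assumption brought in from a t table, not something stated on the slide.

```python
import math

r = 0.875          # sample correlation from the example
n = 5              # sample size
df = n - 2         # degrees of freedom

# t statistic for testing H0: rho = 0
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed critical t for alpha = 0.05, df = 3 (standard t-table value, assumed)
t_crit = 3.182

# Equivalent critical r; this reproduces the tabulated 0.878
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

significant = abs(r) >= r_crit
print(f"t = {t_stat:.3f}, critical r = {r_crit:.3f}, significant: {significant}")
```

Both routes give the same answer: the observed value falls just short of the critical value, so the null hypothesis is not rejected.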
24. Significance of Correlation
df (N-2)    Critical value (p = .05)
5           .67
10          .50
15          .41
20          .36
25          .32
30          .30
50          .23
200         .11
500         .07
1000        .05
(These values correspond to a one-tailed test; two-tailed critical values are larger.)
25. Correlation: r and r²
As a matter of routine it is the squared correlations
that should be interpreted. This is because the
correlation coefficient is misleading in suggesting
the existence of more covariation than exists, and
this problem gets worse as the correlation
approaches zero.
Note that as the correlation r decreases by tenths,
r² decreases by much more. A correlation of .50
shows only 25 percent of the variance in common;
a correlation of .20 shows 4 percent in common;
and a correlation of .10 shows 1 percent in common
(or 99 percent not in common).
Thus, squaring should be a healthy corrective to the
tendency to consider low correlations, such as .20
and .30, as indicating a meaningful or practical
covariation.
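The arithmetic behind these percentages is simply r squared, expressed as a percentage. A small sketch follows; the helper name `shared_variance_percent` is hypothetical.

```python
def shared_variance_percent(r):
    """Percent of variance two variables have in common, i.e. 100 * r^2."""
    return round(r ** 2 * 100, 1)

for r in (0.50, 0.30, 0.20, 0.10):
    shared = shared_variance_percent(r)
    print(f"r = {r:.2f}: {shared}% in common, {round(100 - shared, 1)}% not in common")
```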
26. Assumptions
• Scale of measurement is interval
• Linear relationships
• Homoscedasticity
• Similar normal underlying distributions
• No outliers
28. Advantages and Disadvantages of Pearson’s Coefficient
Advantages
• It summarizes in one value both the degree and the
direction of correlation.
Limitations
• It always assumes a linear relationship.
• Interpreting the value of r is difficult.
• The value of the correlation coefficient is affected by
extreme values.
29. Parametric and Non-parametric Correlation
Parametric correlation:
used when the distribution of the data is normal.
Example: Pearson Correlation
Non-parametric correlation:
used when the distribution of the data is not normal.
Example: Spearman’s Rank Correlation, Kendall’s τ (tau) Correlation
30. The Spearman Correlation
Spearman’s correlation is designed to measure the relationship between
variables measured on an ordinal scale of measurement
A perfectly positive relationship means that every time X increases Y also
increases; i.e., the smallest value of X is paired with the smallest value of
Y and so on
The original scores are first converted to ranks, then the Spearman
correlation coefficient is used to measure the relationship for the ranks.
The degree of relationship for the ranks provides a measure of the
degree of consistency for the original scores.
Calculation of Spearman’s Correlation Coefficient
Be sure you have ordinal data for X and Y scores
The smallest value gets the rank 1 and the second smallest 2 and so on
Rank X and Y separately
Use the same formula on the ranked data as you used for Pearson’s r
31. Rank Correlation
• Spearman Rank-Correlation Coefficient, rs:
$r_s = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$
where: n = number of items being ranked
xi = rank of item i with respect to one variable
yi = rank of item i with respect to a second variable
di = xi - yi
32. Test for Significant Rank Correlation
• We may want to use sample results to make an inference
about the population rank correlation ρs.
• To do so, we must test the hypotheses:
H0: ρs = 0
Ha: ρs ≠ 0
33. Spearman Rank Correlation
Monthly Rainfall (mm), Sample 1: {79, 71, 108, 54, 67, 90}
Monthly Discharge (cusec), Sample 2: {122, 100, 121, 43, 54, 80}
Null Hypothesis: There exists no association (or correlation) between
the samples.
If rs > critical value, there is a significant correlation.
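The worked table for this example is not in the transcript, so the following is a minimal sketch of how rs could be computed for the two samples above. It assumes the usual rank-difference formula rs = 1 − 6Σd² / (n(n² − 1)) and that there are no tied values; the function names `rank` and `spearman_rs` are hypothetical.

```python
def rank(values):
    """Rank values from smallest (1) to largest (n); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's rs from the rank differences d_i = rank(x_i) - rank(y_i)."""
    n = len(x)
    rx, ry = rank(x), rank(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rainfall = [79, 71, 108, 54, 67, 90]      # monthly rainfall (mm)
discharge = [122, 100, 121, 43, 54, 80]   # monthly discharge (cusec)

rs = spearman_rs(rainfall, discharge)
print(f"rs = {rs:.3f}")  # sum of d^2 is 10, so rs = 1 - 60/210 = 0.714
```

The value obtained would then be compared with the critical value for n = 6 from a Spearman table.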
35. Merits of Spearman’s Rank Correlation
• This method is simpler to understand and easier to apply
than Karl Pearson’s correlation method.
• This method is useful where we can give only the ranks and
not the actual data (qualitative terms).
• This method should be used where the initial data are in the form
of ranks.
36. Limitations of Spearman’s Correlation
• Cannot be used for finding out correlation in a grouped
frequency distribution.
• This method should not be applied where N exceeds 30, because ranking becomes tedious.
37. Kendall-tau Rank Correlation
Kendall's rank correlation provides a distribution-free test of
independence and a measure of the strength of dependence
between two variables.
Spearman's rank correlation is satisfactory for testing a null
hypothesis of independence between two variables, but Kendall's
rank correlation is much more powerful.
38. Steps for Kendall-tau Rank Correlation
1. Arrange the data in increasing order of magnitude of the first
variable and label the objects with the resulting rank: 1 for the
smallest up to N for the largest.
2. Rearrange the data in order of increasing magnitude of the
second variable and record the rearranged order of the variable-1
ranks.
3. For each data point, scan down the list ordered by variable 2,
counting the number of variable-1 ranks below it that are larger.
4. Repeat step 3, this time counting the number of ranks that
are smaller.
5. Subtract “smaller” from “larger” for each data point and sum the
differences to obtain the total (S).
39. Steps for Kendall-tau Rank Correlation (continued)
6. Kendall’s τ is given by:
$\tau = \dfrac{2S}{N(N-1)}$
7. Compute the z-statistic as:
$z = \tau \sqrt{\dfrac{9N(N-1)}{2(2N+5)}}$
8. The null hypothesis is rejected (at p = 0.05) if z falls outside the range
-1.96 ≤ z ≤ 1.96.
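The counting procedure in the steps above can be sketched as follows. This is an illustrative implementation, not the author's: the function name `kendall_tau` is hypothetical, ties are assumed absent, and the z statistic uses the square root of the variance term, which is the form consistent with the worked result z = 3.67 for N = 10. Since the groundwater table is not available in the text, the rainfall and discharge samples from the Spearman example are reused as input.

```python
import math

def kendall_tau(x, y):
    """Return (S, tau, z) for paired samples x and y; assumes no tied values."""
    n = len(x)
    # Order the pairs by one variable and keep the other for scanning
    # (S is the same whichever variable is used for ordering)
    y_ordered = [yi for _, yi in sorted(zip(x, y))]
    # For each value, count later values that are larger minus those smaller
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            if y_ordered[j] > y_ordered[i]:
                s += 1
            elif y_ordered[j] < y_ordered[i]:
                s -= 1
    tau = 2 * s / (n * (n - 1))
    z = tau * math.sqrt(9 * n * (n - 1) / (2 * (2 * n + 5)))
    return s, tau, z

rainfall = [79, 71, 108, 54, 67, 90]      # monthly rainfall (mm)
discharge = [122, 100, 121, 43, 54, 80]   # monthly discharge (cusec)

s, tau, z = kendall_tau(rainfall, discharge)
print(f"S = {s}, tau = {tau:.2f}, z = {z:.2f}")
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```

For such a small sample (N = 6) the normal approximation is rough, so the decision here is only indicative.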
40. Kendall-tau Rank Correlation
Problem: Ten groundwater samples are collected from different points
to see whether there is any relation between groundwater depth and
contamination. The data are given in the table. Is there any association
between depth and contamination?
Null Hypothesis: There exists no association. Contamination is
independent of groundwater depth.
41. Kendall-tau Rank Correlation
Step-1: Rank the data for each variable
separately.
Step-2: Rearrange the
second ranks according to
the rank of the first variable.
42. Kendall-tau Rank Correlation
$\tau = \dfrac{2S}{N(N-1)}$
$z = \tau \sqrt{\dfrac{9N(N-1)}{2(2N+5)}}$
43. Kendall-tau Rank Correlation
Null Hypothesis:
There exists no relation between depth and contamination.
The null hypothesis is rejected (p = 0.05) if z falls outside the range
-1.96 ≤ z ≤ 1.96.
z (calculated) = 3.67
z (calculated) > z (critical), therefore the null hypothesis is rejected.
Decision: There exists a significant correlation between depth and
groundwater contamination.
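As a rough arithmetic check of the reported statistic, assuming the square-root form of the z formula and N = 10 samples as stated in the problem:

```latex
z = \tau \sqrt{\frac{9N(N-1)}{2(2N+5)}}
  = \tau \sqrt{\frac{9 \cdot 10 \cdot 9}{2 \cdot 25}}
  = \tau \sqrt{16.2} \approx 4.02\,\tau
```

On this reading, z = 3.67 corresponds to τ ≈ 0.91, that is S = 41 out of a maximum of N(N−1)/2 = 45. These values are back-calculated from the reported z, since the data table itself is not in the text.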
44. Features of Correlation Coefficient
The correlation coefficient has the following properties:
The correlation is not affected when the two variables are
interchanged.
The correlation is not changed if the same number is added to all
the values of one of the variables.
The correlation is not changed if all the values of one of the
variables are multiplied by the same positive number. It will change
sign if the number is negative.
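These properties are easy to confirm numerically. The sketch below uses made-up values and a hypothetical helper `pearson_r`; it is only meant to illustrate the three invariances listed above.

```python
import math

def pearson_r(x, y):
    """Pearson's r from sums of squares and the sum of products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical values, for illustration only
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 3.5, 3.0, 8.0, 9.5]

r = pearson_r(x, y)
swapped = pearson_r(y, x)                       # variables interchanged
shifted = pearson_r([v + 100 for v in x], y)    # constant added to x
scaled = pearson_r([v * 2.5 for v in x], y)     # x times a positive number
flipped = pearson_r([v * -2.5 for v in x], y)   # x times a negative number

print(f"r = {r:.4f}, swapped = {swapped:.4f}, shifted = {shifted:.4f}")
print(f"scaled = {scaled:.4f}, flipped = {flipped:.4f}")
```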
45. Factors Affecting Correlation
• Restricted range
• Heterogeneous samples
• Outliers
• Scale
46. Range restriction
• Range restriction occurs when the sample contains a restricted (or
truncated) range of scores
– e.g., Groundwater Recharge and Rainfall > 5 mm
• If the range is restricted, be cautious in generalising beyond
the range for which data are available
– e.g., Groundwater recharge is lower when rainfall is lower, but below
a threshold level there is no relation
47. Range restriction
48. Heterogeneous samples
• Sub-samples may artificially increase or decrease the overall r.
• Solution: calculate r separately for the sub-samples and overall,
and look for differences.
49. Heterogeneous samples
50. Effect of Outliers
• Outliers can disproportionately increase or decrease r.
• Options
– compute r with & without outliers
– get more data for outlying values
– recode outliers as having more conservative scores
– transformation
– recode variable into lower level of measurement
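The first option, computing r with and without the outlier, can be sketched as below. The data are invented for illustration (six nearly flat observations plus one extreme point), and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Pearson's r from sums of squares and the sum of products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical observations with essentially no trend
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 1.9, 2.2, 2.0, 2.1, 1.9]

# The same data with one extreme point appended
x_out = x + [20]
y_out = y + [10.0]

r_without = pearson_r(x, y)
r_with = pearson_r(x_out, y_out)
print(f"r without outlier = {r_without:.2f}")
print(f"r with outlier    = {r_with:.2f}")
```

A single extreme point is enough to turn a weak correlation into an apparently strong one, which is why reporting both values is informative.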
51. Effect of Outliers
Outliers can disproportionately
increase or decrease r
52. Closed Data
Sometimes closed data or some discrete data show a high
correlation.
53. Log Transformed Data
If the data are transformed to a log scale, the relation between the
log-transformed data may show a high correlation.
54. Checklist
1. Graphs & Scatterplots
– Outliers?
– Linear?
– Does each variable have a reasonable range?
– Are there subsamples to consider?
2. Choose appropriate measure of Association
3. Conduct inferential test
4. Interpret/Discuss
55. Association and Causation
ASSOCIATION
• If two attributes, say A and B, are found to co-exist more often
than expected by chance, they are correlated. We can
say that there is an association between attributes A and B.
• Correlation indicates the degree of association between two
variables.
CAUSATION
If one of these attributes, say A, is the suspected cause and the
other, say B, is the outcome, then we have a reason to suspect
that A has caused B.
56. Association and Causation
• Association does not mean causation.
• If the association is consistent, then there may be
causation.
• If a relationship is causal, the findings should be
consistent with other data.
• Causation always implies correlation, but correlation
does not necessarily imply causation.
57. Reporting
• State the research hypothesis
• Describe & interpret correlation
– direction of relationship
– size/strength of relationship
– Significance of relationship
• Acknowledge limitations e.g.,
– Heterogeneity (sub-samples)
– Range restriction
– Causality?
58. Partial Correlation
River discharge depends on many factors, such as rainfall, soil
properties, evapotranspiration, groundwater storage, etc. These
independent factors are also correlated with each other.
59. Partial Correlation
60. Partial Correlation: Three (or more) Variables
• Three variables means three relationships
• Each can affect the other two
• Partial & semi-partial correlation: remove the contributions of the 3rd variable
61. Partial Correlation
• Sometimes it is desirable to know the relationship between two
variables with the effects of a third variable held constant. We
can do this by using partial correlation.
• It helps us to find the ‘pure’ correlation between two variables while
holding the others constant.
• ‘Holding constant’ in this situation is known as partialling out, and
the technique for partialling out the effects of one or more
variables from two others, in order to find the relationship
between them, is called partial correlation.
62. A partial correlation is a correlation between two variables from
which the linear relations, or effects, of one or more other variables
have been removed.
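"Removing the linear effects" can be made concrete in two equivalent ways: correlate the residuals left after regressing each variable on the third, or apply the standard first-order formula to the three pairwise correlations. The sketch below checks that both give the same number. The rainfall, storage, and discharge values are made up for illustration, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, z):
    """Part of y left over after a least-squares straight-line fit on z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [(b - my) - slope * (a - mz) for a, b in zip(z, y)]

def partial_r(x, y, z):
    """First-order partial correlation of x and y with z partialled out."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Made-up illustrative values (assumption, not from the slides).
rain = [5, 12, 8, 20, 15, 3, 18, 10]
storage = [30, 42, 33, 60, 48, 25, 50, 41]
discharge = [11, 20, 15, 35, 27, 8, 29, 19]

r_simple = pearson_r(storage, discharge)
r_formula = partial_r(storage, discharge, rain)
r_resid = pearson_r(residuals(storage, rain), residuals(discharge, rain))

print(f"simple r(storage, discharge)      = {r_simple:.3f}")
print(f"partial r given rain (formula)    = {r_formula:.3f}")
print(f"partial r given rain (residuals)  = {r_resid:.3f}")
```

The residual route generalises to several control variables by fitting a multiple regression instead of a straight line.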
Partial Correlation
63. Partial Correlation
Correlation = 0.72
64. Partial Correlation
Correlation = 0.73
65. Higher-Order Partial Correlation
The second-order partial correlation is the correlation between two
variables with the effects of two other variables removed.
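One common way to obtain higher-order partials is recursion: a partial of order k is built from three partials of order k - 1 using the same formula as the first-order case. The sketch below assumes that approach. The four series are made-up illustrative values and `partial_r` is a hypothetical helper, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, controls):
    """Partial correlation of x and y given a list of control variables.

    With no controls this is the simple correlation. Each extra control
    applies the first-order formula to the next-lower-order partials.
    """
    if not controls:
        return pearson_r(x, y)
    *rest, w = controls  # peel off the last control variable
    r_xy = partial_r(x, y, rest)
    r_xw = partial_r(x, w, rest)
    r_yw = partial_r(y, w, rest)
    return (r_xy - r_xw * r_yw) / math.sqrt((1 - r_xw ** 2) * (1 - r_yw ** 2))

# Made-up illustrative values (assumption, not from the slides).
rain = [5, 12, 8, 20, 15, 3, 18, 10]
evap = [4.1, 3.2, 3.9, 2.0, 2.8, 4.5, 2.5, 3.0]
storage = [30, 42, 33, 60, 48, 25, 50, 41]
discharge = [11, 20, 15, 35, 27, 8, 29, 19]

zero_order = partial_r(storage, discharge, [])
first_order = partial_r(storage, discharge, [rain])
second_order = partial_r(storage, discharge, [rain, evap])

print(f"zero-order   r = {zero_order:.3f}")
print(f"first-order  r = {first_order:.3f}  (rain removed)")
print(f"second-order r = {second_order:.3f}  (rain and evaporation removed)")
```

The order in which the control variables are listed does not change the result.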
66. With partial correlation, we find the correlation between X and Y
holding Z constant for both X and Y. Sometimes, however, we want
to hold Z constant for just X or just Y. In that case, we compute a
semipartial correlation.
Semipartial Correlation
Comparison between the partial and semipartial correlation:
Partial:
Semi-partial:
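For reference, the standard textbook first-order forms are given below. The notation is an assumption: X and Y are the two variables of interest, Z is the controlling variable, and in the semipartial Z is removed from Y only.

```latex
% Partial: Z removed from both X and Y
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}
                      {\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}

% Semipartial: Z removed from Y only
r_{x(y \cdot z)} = \frac{r_{xy} - r_{xz}\, r_{yz}}
                        {\sqrt{1 - r_{yz}^{2}}}
```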
67. Partial Correlation
The result doesn't make much
intuitive sense, but it does remind us
that the absolute value of the partial
is larger than that of the semipartial.
68. • The partial and semipartial correlation formulas have the
same numerator and almost the same denominator.
• The denominator of the partial contains an extra factor, no
greater than 1, that is missing from the denominator of the
semipartial.
• This means that the partial correlation is at least as
large in absolute value as the semipartial.
• The two are equal only when the controlling (partialling)
variable is uncorrelated with the variable that the semipartial
leaves unadjusted, or when both are zero.
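The comparison can be checked numerically from the three pairwise correlations. This is a sketch under stated assumptions: the correlation values are hypothetical, the function names are illustrative, and the semipartial is taken to remove Z from Y only.

```python
import math

def partial_from_r(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y with Z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_from_r(r_xy, r_xz, r_yz):
    """Semipartial correlation of X and Y with Z removed from Y only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)

# Hypothetical pairwise correlations (assumption, not from the slides).
r_xy, r_xz, r_yz = 0.6, 0.5, 0.4

p = partial_from_r(r_xy, r_xz, r_yz)
sp = semipartial_from_r(r_xy, r_xz, r_yz)
print(f"partial = {p:.3f}, semipartial = {sp:.3f}")

# Same numerator; the partial divides by the extra factor
# sqrt(1 - r_xz^2) <= 1, so |partial| >= |semipartial|.
extra_factor = math.sqrt(1 - r_xz ** 2)
print(f"semipartial / partial = {sp / p:.3f}, extra factor = {extra_factor:.3f}")

# When Z is uncorrelated with X (r_xz = 0) the extra factor is 1
# and the two coefficients coincide.
p0 = partial_from_r(r_xy, 0.0, r_yz)
sp0 = semipartial_from_r(r_xy, 0.0, r_yz)
print(f"with r_xz = 0: partial = {p0:.3f}, semipartial = {sp0:.3f}")
```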
Semipartial Correlation
69. Advantages of Correlation studies
• Show the amount (strength) of relationship present
• Can be used to make predictions about the variables
under study.
• Can be used in many places, including natural settings,
libraries, etc.
• Easier to collect correlational data
70. Disadvantages of correlation studies
• Can’t assume that a cause-effect relationship exists
• Little or no control (experimental manipulation) of the
variables is possible
• Relationships may be accidental or due to a third,
unmeasured factor common to the 2 variables that are
measured