Correlation measures the strength and direction of association between two variables. Positive correlation means both variables increase or decrease together, while negative correlation means one variable increases as the other decreases. Correlation does not imply causation. The correlation coefficient r ranges from -1 to 1, where -1 is total negative correlation, 0 is no correlation, and 1 is total positive correlation. Common types of correlation coefficients include Pearson's correlation coefficient, used with normally distributed interval or ratio data, and Spearman's rank correlation coefficient, used with ordinal or non-normally distributed data. Regression analysis can be used to predict the value of a dependent variable from the value of an independent variable when they are linearly correlated.
This presentation covered the following topics:
1. Definition of Correlation and Regression
2. Meaning of Correlation and Regression
3. Types of Correlation and Regression
4. Karl Pearson's methods of correlation
5. Bivariate Grouped data method
6. Spearman's Rank correlation Method
7. Scattered diagram method
8. Interpretation of correlation coefficient
9. Lines of Regression
10. regression Equations
11. Difference between correlation and regression
12. Related examples
It is most useful for BBA students taking the subject "Data Analysis and Modeling".
It covers the content of the chapter "Data Regression Model".
Visit for more on www.ramkumarshah.com.np/
Assessment 2 Context
In many data analyses, it is desirable to compute a coefficient of association. Coefficients of association are quantitative measures of the amount of relationship between two variables. Ultimately, most techniques can be reduced to a coefficient of association and expressed as the amount of relationship between the variables in the analysis. There are many types of coefficients of association. They express the mathematical association in different ways, usually based on assumptions about the data. The most common coefficient of association you will encounter is the Pearson product-moment correlation coefficient (symbolized as the italicized r), and it is the only coefficient of association that can safely be referred to as simply the "correlation coefficient". It is common enough so that if no other information is provided, it is reasonable to assume that is what is meant.
Correlation coefficients are numbers that give information about the strength of relationship between two variables, such as two different test scores from a sample of participants. The coefficient ranges from -1 through +1. Coefficients between 0 and +1 indicate a positive relationship between the two scores, such as high scores on one test tending to come from people with high scores on the second. The other possible relationship, which is every bit as useful, is a negative correlation between -1 and 0. A negative correlation possesses no less predictive power between the two scores. The difference is that high scores on one measure are associated with low scores on the other.
An example of the kinds of measures that might correlate negatively is absences and grades. People with more absences will be expected to have lower grades. When a correlation is said to be significant, it can be shown that the correlation is significantly different from zero in the population. A correlation of zero means no linear relationship between variables. A correlation other than zero means the variables are linearly related to some degree. As the coefficient gets further from zero (toward +1 or -1), the relationship becomes stronger.
Interpreting Correlation: Magnitude and Sign
Interpreting a Pearson's correlation coefficient (rXY) requires an understanding of two concepts:
· Magnitude.
· Sign (+/-).
The magnitude refers to the strength of the linear relationship between Variable X and Variable Y.
The rXY ranges in values from -1.00 to +1.00. To determine magnitude, ignore the sign of the correlation, and the absolute value of rXY indicates the extent to which Variable X and Variable Y are linearly related. For correlations close to 0, there is no linear relationship. As the correlation approaches either -1.00 or +1.00, the magnitude of the correlation increases. Therefore, for example, the magnitude of r = -.65 is greater than the magnitude of r = +.25 (|.65| > |.25|).
In contrast to magnitude, the sign of a non-zero correlation is either negative or positive.
These labels are not interpreted ...
13. Correlation coefficient (r test)
CORRELATION
[Q:
Define correlation. (BSMMU, MD Radiology, January 2010,
July 2009)
Short note: Correlation & regression (BSMMU, MD
Radiology, January, 2009)]
In statistics, the word correlation refers to the relationship between
two variables. If the change in one variable effects a change in the
other variable, the variables are said to be correlated.
Sometimes two continuous characters are measured in the same
person, such as weight and cholesterol, weight and height etc. At
other times, the same character is measured in two related groups
such as tallness in parents and tallness in children, study of
intelligence quotient (IQ) in brothers and in corresponding sisters
(siblings) and so on. The relationship or association between two
quantitatively measured or continuous variables is called
correlation.
Remember, correlation does not imply causation.
The relationship between two random variables is known as a
bivariate relationship. The known variable (or variables) is called
the independent variable(s). The variable we are trying to predict
is the dependent variable.
Example: A medical researcher may be interested in the bivariate
relationship between a patient’s blood pressure x and heart rate y.
Here x is the independent variable and y is the dependent variable.
Types of correlation
[Q:
Discuss different types of correlation with figures.
(BSMMU, MD Radiology, January, 2010)
Classify correlation with figures of each. (BSMMU, MD
Radiology, July, 2009)]
1. Positive correlation:
If the movements of the variables are in the same direction,
the correlation is called positive correlation.
In positive correlation, the two variables react in the same
way, increasing or decreasing together.
Example:
a. Height and weight of a group of people are positively
correlated
b. Temperatures in Celsius and Fahrenheit have a positive
correlation.
In perfect positive correlation, coefficient of correlation (r) =
+1, and in partial (imperfect) positive correlation 0 < r < 1.
2. Negative correlation:
If the movements of the variables are in the opposite
direction, the correlation is called negative correlation.
In negative correlation, as one variable increases, the other
decreases.
Example: One variable might be the number of hunters in a
region and the other variable could be the deer population.
Perhaps as the number of hunters increases, the deer
population decreases. This is an example of a negative
correlation.
In perfect negative correlation, coefficient of correlation (r)
= -1, and in partial (imperfect) negative correlation -1 < r < 0.
3. Zero correlation:
If the movements of one variable are unrelated to the
movements of the other variable, the variables are not
correlated; this is called zero correlation.
In zero correlation, coefficient of correlation (r) = 0.
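The three types above can be illustrated with a small sketch. It computes r directly from the product-moment definition and reports only its sign. The hunter and deer counts and the "unrelated" series are hypothetical numbers made up for illustration, and the helper names pearson_r and direction are not from the original text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def direction(r, tol=1e-9):
    """Label r as positive, negative or zero (within a small tolerance)."""
    if r > tol:
        return "positive"
    if r < -tol:
        return "negative"
    return "zero"

# Celsius and Fahrenheit are linked by an exact linear rule.
celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

# Hypothetical counts: more hunters, fewer deer.
hunters = [10, 20, 30, 40, 50]
deer = [95, 80, 72, 60, 41]

# Hypothetical series whose deviations cancel out, giving r = 0.
x_unrelated = [1, 2, 3, 4, 5]
y_unrelated = [2, 1, 3, 1, 2]

for label, xs, ys in [
    ("Celsius vs Fahrenheit", celsius, fahrenheit),
    ("Hunters vs deer", hunters, deer),
    ("Unrelated series", x_unrelated, y_unrelated),
]:
    r = pearson_r(xs, ys)
    print(f"{label}: r = {r:.3f} ({direction(r)} correlation)")
```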
Correlation in brief
When the value of one variable is related to the value of
another, they are said to be correlated
Coefficient of Correlation (r) measures such a relationship
The value of r ranges from -1 (perfectly correlated in the
negative direction) to +1 (perfectly correlated in the positive
direction)
When r = 0, the 2 variables are not correlated
How can you tell if there is a correlation?
By observing the graph, a person can tell if there is a correlation
by how closely the data resemble a line. If the points are scattered
about, then there may be no correlation. If the points would
closely fit a quadratic or exponential equation, etc., then they have
a nonlinear correlation.
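A short sketch of why the graph matters: points that lie exactly on a quadratic curve can still give r = 0, because r only measures how closely the data follow a straight line. The data set below is hypothetical, chosen to be symmetric about x = 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y depends perfectly on x, but through y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_quadratic = pearson_r(xs, ys)
print(f"r for y = x^2 on symmetric x values: {r_quadratic:.3f}")
```

A value of r near zero therefore rules out a linear relationship only; a scatter diagram is still needed to spot a curved one.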
How can you tell by inspection the type of correlation?
If the graph of the variables resembles a line with positive slope,
then there is a positive correlation (y increases as x increases). If
the slope of the line is negative, then there is a negative
correlation (as x increases y decreases).
Correlation coefficient
[Q:
Write short note on: Coefficient of correlation. (BSMMU, MD
Radiology, January, 2010)]
An important aspect of correlation is how strong it is.
The extent or degree of relationship between two sets of figures is
measured in terms of a quantity called the correlation coefficient. It
is denoted by the letter ‘r’.
Another name for r is the Pearson product moment correlation
coefficient in honor of Karl Pearson who developed it about 1900.
When two variable characters in the same series or individuals are
measurable in quantitative units such as height and weight;
temperature and pulse rate; age and vital capacity; circulating
proteins in grams and surface area in square meters; systolic and
diastolic blood pressure in mm of Hg, it is often necessary and
possible to know, not only whether there is any association or
relationship between them or not but also the degree or extent of
such relationship.
Correlation co-efficient (r) test
Measures the relationship between two variables when one is
dependent on the other.
Formula

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

or

$$r = \frac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}}$$

When $X = x - \bar{x}$ and $Y = y - \bar{y}$
d.f. = n - 2, where n is the number of (x, y) pairs
When; x = one variable
y = other variable
Example
Problem: Find the correlation coefficient between the
following variables.
x variable: 5, 8, 12, 15.
y variable: 20, 25, 28, 30.
Solution:
The following table shows the relationship between the above
variables.

x     y     X = x - mean x    Y = y - mean y    X2    Y2         XY
5     20    -5                -5.75             25    33.0625    28.75
8     25    -2                -0.75             4     0.5625     1.50
12    28    2                 2.25              4     5.0625     4.50
15    30    5                 4.25              25    18.0625    21.25

Mean of x = 10, mean of y = 25.75
Sum X2 = 58, Sum Y2 = 56.75, Sum XY = 56

$$r = \frac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}} = \frac{56}{\sqrt{58 \times 56.75}} = \frac{56}{\sqrt{3291.5}} = \frac{56}{57.372} = 0.976$$

d.f. = n - 2
= 4 - 2 = 2
r = 0.976 means strong correlation
p value at 2 d.f. < 0.05
null hypothesis rejected.
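The hand calculation can be checked with a minimal sketch that follows the same steps as the table: deviations from the means, their squares and products, then the ratio. The variable names are chosen here for illustration only.

```python
from math import sqrt

x = [5, 8, 12, 15]
y = [20, 25, 28, 30]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Deviations from the means (the X and Y columns of the table).
X = [xi - mean_x for xi in x]
Y = [yi - mean_y for yi in y]

sum_X2 = sum(v ** 2 for v in X)
sum_Y2 = sum(v ** 2 for v in Y)
sum_XY = sum(a * b for a, b in zip(X, Y))

r = sum_XY / sqrt(sum_X2 * sum_Y2)

print(f"mean x = {mean_x}, mean y = {mean_y}")
print(f"Sum X2 = {sum_X2}, Sum Y2 = {sum_Y2}, Sum XY = {sum_XY}")
print(f"r = {r:.3f}")
```

Running a check like this is a quick way to catch slips in the product and sum columns, which are easy to make by hand.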
Strength of Correlation:
Correlation coefficient    Degree of association
0.80 to 1.00               Strong
0.50 to 0.79               Moderate
0.20 to 0.49               Weak
0 to 0.19                  Negligible
1. Strong positive correlation: when r = 0.80 to 0.99, i.e. r >= 0.8
2. Moderate positive correlation: when r = 0.70 to 0.79
3. Limited degree of correlation: when r = 0.50 to 0.69
4. No correlation or zero correlation: when r < 0.5
5. Negative correlation: when r is negative (-1 <= r < 0)
N.B.: The extent of correlation varies between minus one and plus one,
i.e. -1 <= r <= +1.
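The first scale above (strong, moderate, weak, negligible) can be written as a small lookup. This is only a sketch: it applies the scale to the absolute value of r, and it treats the band edges as lower bounds (0.8, 0.5, 0.2), which is an assumption made here to close the small gaps between bands such as 0.79 and 0.8.

```python
def strength_of_association(r):
    """Return the degree of association for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude >= 0.8:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "negligible"

for value in (0.95, -0.65, 0.25, 0.10):
    print(f"r = {value:+.2f}: {strength_of_association(value)}")
```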
Problem for practice
During a laboratory experiment muscular contractions of a frog
muscle were measured against different doses of a given drug. The
height of the curve was considered as the response to the drug.
The observations were as below.
Serial number of experiment:   1      2      3      4      5
Dose of drug:                  0.3    0.4    0.6    0.8    0.9
Response to drug:              54.0   59.0   60.0   65.0   70.0
From the above data calculate correlation coefficient and its
significance.
[Answer: r = 0.9633, p < 0.01]
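One way to check the answer to the practice problem is sketched here. It assumes the usual t-test for Pearson's r, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and compares t with 5.841, the two-tailed 1% critical value of Student's t for 3 degrees of freedom taken from standard tables. The test and the critical value are assumptions of this sketch and are not stated in the original problem.

```python
from math import sqrt

dose = [0.3, 0.4, 0.6, 0.8, 0.9]
response = [54.0, 59.0, 60.0, 65.0, 70.0]
n = len(dose)

mean_dose = sum(dose) / n
mean_resp = sum(response) / n

sum_XY = sum((d - mean_dose) * (v - mean_resp) for d, v in zip(dose, response))
sum_X2 = sum((d - mean_dose) ** 2 for d in dose)
sum_Y2 = sum((v - mean_resp) ** 2 for v in response)

r = sum_XY / sqrt(sum_X2 * sum_Y2)

# Significance test for r (assumed: t-test with n - 2 degrees of freedom).
df = n - 2
t = r * sqrt(df) / sqrt(1 - r ** 2)

# Two-tailed 1% critical value of t for 3 d.f., from standard t tables.
T_CRITICAL_1_PERCENT_DF3 = 5.841
significant_at_1_percent = abs(t) > T_CRITICAL_1_PERCENT_DF3

print(f"r = {r:.4f}, t = {t:.3f} on {df} d.f.")
print(f"significant at the 1% level: {significant_at_1_percent}")
```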
[Q:
Calculate Pearson's correlation coefficient between the X
and Y variables given below:
X = 5, 7, 10, 12 Y = 4, 6, 9, 11
(BSMMU, MD Radiology, January, 2010)
Find out the correlation coefficient between the following
2 variables. (BSMMU, MD Radiology, January, 2009)
Variable-I (X-variable): 10, 15, 20, 25 (n=4)
Variable-II (Y-variable): 30, 35, 40, 45 (n=4)
The length & weight of 7 mice are given below.
Compute 'r' and test for its significance.
Length = 2, 5, 8, 12, 14, 19, 22.
Weight = 1, 4, 3, 4, 8, 9, 8
(BSMMU, MD Radiology, January, 2010)]
What Is Rank Correlation?
Consider a situation where the data do not contain precise
sample values, so that an exact measurement is unattainable. In
this situation, the data may be ranked (as in the GPA system, where
different ranges of marks are ranked as different grades) in order of
size, importance, etc., using the numbers 1, 2,...,n. Correlations
computed from such ranks are called rank-order correlations. Rank
correlation is used when the data are not presented as precise sample
values.
What is the coefficient of rank correlation?
Example:
Suppose the data values x and y are arranged in
order of size. The correlation coefficient can then be computed
from the numerical values, which are in the form of ranks. This
coefficient of rank correlation is denoted by rrank or briefly r and
is calculated by the equation,
$$r_{rank} = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
Where
d = differences between ranks of corresponding x and y
n = number of pairs of values (x, y) in the data
The above equation is called SPEARMAN'S FORMULA FOR
RANK CORRELATION.
Example: A group of 5 Army officers took part in both a
SWIMMING and a RUNNING competition. The following table
shows their ranks according to their achievements in the two
tests. It also shows the difference between the ranks and the
square of each difference.
OFFICER   RUNNING (x)   SWIMMING (y)   |Di|   Di^2
Selim     5             3              2      4
Habib     2             1              1      1
Ismail    4             5              1      1
Tauhid    1             2              1      1
Mesbah    3             4              1      1
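From the above table we have sum of Di^2 = 4 + 1 + 1 + 1 + 1 = 8 and
n = 5, so by Spearman's formula
$$ r_{\text{rank}} = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 1 - \frac{48}{120} = 0.6 $$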
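As a check on the officers' example, the following minimal sketch applies Spearman's formula directly to the two columns of ranks. It assumes the ranks contain no ties; the function name spearman_from_ranks is illustrative only.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation from two lists of untied ranks."""
    n = len(rank_x)
    # Sum of squared differences between paired ranks
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks in the order Selim, Habib, Ismail, Tauhid, Mesbah
running = [5, 2, 4, 1, 3]
swimming = [3, 1, 5, 2, 4]

r_rank = spearman_from_ranks(running, swimming)
print(f"r_rank = {r_rank:.2f}")
```

With the sum of squared differences equal to 8 and n = 5, this gives r_rank = 0.6, a moderate positive agreement between the two rankings.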
Spearman's rank correlation is a technique used to test
the direction and strength of the relationship between two
variables. In other words, it is a device to show whether
one set of numbers is associated with another set of
numbers; by itself it does not show that one set has an
effect on the other.
It uses the statistic r_rank (Rs), which falls between -1 and +1.
The null hypothesis is that there is no association between the
two rankings (Rs = 0). If the observed Rs does not differ
significantly from 0, the null hypothesis is accepted.
Otherwise, it is rejected.
The rank correlation method can be used when
1. The values of the variables are available in rank order form.
2. The data are qualitative in nature and can be ranked in some
order.
3. The data were originally quantitative in nature but were
converted into ranks because the sample size was small.
What are the types of correlation coefficient? Discuss with
figures. (BSMMU, MD Radiology, January, 2009)
The two main types are Pearson's correlation coefficient and
Spearman's rank correlation coefficient.
When the associated variables are normally distributed, such as
height and weight, Pearson's correlation coefficient is used. When
two variables are correlated but not normally distributed,
Spearman's formula for the rank correlation coefficient is used.
When calculating a correlation coefficient for ordinal data, select
Spearman's technique. For interval or ratio-type data, use
Pearson's technique.
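The difference between the two coefficients can be illustrated with a small sketch. The data below are made up purely for illustration (y is the cube of x, so the relationship is perfectly monotonic but not linear), and the helper names are hypothetical. The ranking helper assumes there are no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_r(x, y):
    """Spearman's rank correlation via the sum of squared rank differences."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Illustrative (made-up) data: monotonic but curved relationship
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

pearson = pearson_r(x, y)
spearman = spearman_r(x, y)
print(f"Pearson r = {pearson:.3f}, Spearman r_rank = {spearman:.3f}")
```

Because the ranks of x and y agree exactly, Spearman's coefficient equals 1, while Pearson's coefficient falls below 1 because the points do not lie on a straight line.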
REGRESSION
In experimental sciences, after the correlation between two
variables has been understood, there are situations when it is
necessary to estimate or predict the value of one character
(variable, say Y) from the knowledge of the other character
(variable, say X), such as to estimate height when weight is known.
This is possible when the two are linearly correlated. The former
variable (Y, i.e., height), which is to be estimated, is called the
dependent variable, and the latter (X, i.e., weight), which is
known, is called the independent variable. This is done by
finding another constant called the regression coefficient (b).
People use regression on an intuitive level every day. In business, a
well-dressed man is thought to be financially successful. A mother
knows that more sugar in her children's diet results in higher
energy levels. The ease of waking up in the morning often
depends on how late you went to bed the night before.
Quantitative regression adds precision by developing a
mathematical formula that can be used for predictive purposes.
For example, a medical researcher might want to use body weight
(independent variable) to predict the most appropriate dose for a
new drug (dependent variable).
Regression means change in the measurements of a variable
character, on the positive or negative side, beyond the mean.
The regression coefficient is a measure of the change in the
dependent character (Y) for one unit of change in the independent
character (X). It is denoted by the letter 'b', which indicates the
change in one variable (Y) from its mean (Ȳ) for one unit of
move, deviation or change in another variable (X) from its
mean (X̄) when both are correlated. This helps to calculate or
predict the expected value of Y, i.e., Yc, corresponding to any X.
When the corresponding values Yc1, Yc2, ..., Ycn are plotted on a
graph, a straight line called the regression line or the mean
correlation line (Y on X) is obtained. This is the same line that
was referred to as an imaginary line while explaining the various
types of correlation.
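As a sketch of how b and the regression line of Y on X might be computed, the following uses the dose-response data from the frog muscle exercise. It assumes the usual least-squares estimate, in which b is the sum of products of deviations divided by the sum of squared deviations of X, and the line passes through the two means. The function name and the dose of 0.5 used for prediction are illustrative choices, not part of the original exercise.

```python
def regression_y_on_x(x, y):
    """Least-squares slope b and intercept a for the line Yc = a + b * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # change in Y per unit change in X
    a = mean_y - b * mean_x  # forces the line through (mean_x, mean_y)
    return a, b

dose = [0.3, 0.4, 0.6, 0.8, 0.9]
response = [54.0, 59.0, 60.0, 65.0, 70.0]

a, b = regression_y_on_x(dose, response)
predicted = a + b * 0.5  # expected response at a hypothetical dose of 0.5

print(f"b = {b:.2f}, a = {a:.2f}, predicted response at 0.5 = {predicted:.2f}")
```

Here b is about 23.08, meaning the expected response rises by roughly 23 units for each unit increase in dose within the observed range.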
The regression technique is primarily used to
1. Estimate the relationship that exists, on the average, between
the dependent variable and the explanatory (independent)
variable.
2. Determine the effect of each of the explanatory variables on
the dependent variable, controlling for the effect of all the
other explanatory variables.
3. Predict the value of the dependent variable for a given value
of the explanatory variable.
Types
Three types of regression models are fundamental to
epidemiological research:
1. linear regression
2. logistic regression
3. Cox proportional hazards regression, a type of survival
analysis.
Linear regression: the dependent variable is a continuous
measure (such as body weight) whose frequency distribution is
the normal distribution, and the independent variables may be
continuous, categorical, or both.
Logistic regression: the dependent variable is derived from the
presence or absence of a characteristic.
Cox proportional hazards: the dependent variable represents the
time from a baseline of some type to the occurrence of an event of
interest.
[Reference: Bonita R, Beaglehole R, Kjellström T. 2006. Basic
epidemiology, 2nd edition. WHO.]
Difference between correlation and regression analysis
There are two important points of differences between correlation
and regression analysis.
1. Whereas the correlation coefficient is a measure of the degree
of relationship between x and y, the objective of regression
analysis is to study the nature of the relationship between the
variables.
2. The cause-and-effect relation is more clearly indicated through
regression analysis than through correlation. Correlation is merely
a tool for ascertaining the degree of relationship between two
variables and, therefore, we cannot say on the basis of correlation
alone that one variable is the cause and the other the effect.
Scatter diagram
The graphical representation of bivariate data is called a scatter
diagram. Plotting the values of the variables x and y along the
x-axis and y-axis respectively in the x-y plane gives the scatter
diagram.
From the scatter diagram it can be readily ascertained whether
there is any correlation between the variables or not. If
correlation exists, the type of correlation can also be ascertained.
Utilities of scatter diagram
1. It is a simple, non-mathematical method of studying
correlation between the variables. As such, it can be easily
understood.
2. It is not influenced by the size of the extreme values whereas
most of the mathematical methods of finding correlation are
influenced by extreme values.
3. Making a scatter diagram usually is the first step in
investigating the relationship between the variables.