Biostatistics /certified fixed orthodontic courses by Indian dental academy Indian dental academy
The Indian Dental Academy is the Leader in continuing dental education , training dentists in all aspects of dentistry and offering a wide range of dental certified courses in different formats.
Indian dental academy provides dental crown & Bridge,rotary endodontics,fixed orthodontics,
Dental implants courses.for details pls visit www.indiandentalacademy.com ,or call
0091-9248678078
Lecture of Respected Sir Dr. L.M. BEHERA from N.I.H. KOLKATA in a workshop at G.D.M.H.M.C. - Patna in the Year 2011.
SUBJECT : BIOSTATISTICS
TOPIC : 'INTRODUCTION TO BIOSTATISTICS'.
Indian Dental Academy: will be one of the most relevant and exciting training center with best faculty and flexible training programs for dental professionals who wish to advance in their dental practice,Offers certified courses in Dental implants,Orthodontics,Endodontics,Cosmetic Dentistry, Prosthetic Dentistry, Periodontics and General Dentistry.
3. Why Statistics?
• Evidence-based practice!
• Research provides evidence for changes
in nursing/medical practice
– Away from “that’s the way it has always been
done”
4. Integral to Research
• Question (hypothesis)
• Design
• Data collection
• Analysis
• Answer to question
– And often more questions asked!
6. Two Types of Data
• Qualitative
– Non-numeric or narrative information
• Example: transcripts of interviews
• Maybe “scored” to be made quantitative
• Quantitative
– Numeric or quantifiable information
• Example: weights of kindergartners
7. Variable
• A quantity capable of assuming a set of
values
• A characteristic or attribute of a person,
object, etc that varies within a population
under study
• Examples:
– Body temperature, BP, DOB, ABG, weight
8. Independent and Dependent
• Independent
– The variable assumed to influence the
outcome
• It is independent of the outcome
– In research, the manipulated variable
• Dependent
– The outcome variable of interest
– In research, value assumed to be dependent
on the independent variable (by hypothesis)
9. Independent and Dependent
• Examples:
– What is the effect of smoking on the incidence
of lung cancer?
– Does high fiber diet reduce the risk of colon
cancer?
– Does AZT help prevent maternal transmission
of HIV?
10. Discrete vs Continuous
• Discrete variable: has a finite number of
values between two points
• Continuous variable: has, in theory, an
infinite number of values between two
points
11. Discrete vs Continuous
• Examples:
– Number of children
– Body temperature
– Hospital readmissions
– Chemotherapy sessions
– Body weight
– DOB
12. Measurement
• The assignment of numbers to objects
according to specified rules to
characterize quantities of some attribute
13. Measurement Rules
• Common/familiar/accepted
– Temperature, weight, height
• Researcher designed
– Particularly for new materials/ideas
• Coding
– The process of transforming raw data into
standardized form for processing and analysis
14. Advantages of Measurement
• Objectivity
– Objective measure can be independently
verified by other researchers
• Precision
– Quantitative measures allow for reasonable
precision
• Communication
– Facilitates communication of data and
research
16. Nominal Measurement/Variable
• Nominal = Named
• Lowest level
• Assignment of characteristics into
categories
– Simply putting into boxes with no meaning of
where the boxes fall in a line
• Examples
– Gender, marital status
17. Ordinal Measurement/Variable
• Ordinal=Order
• Next in the hierarchy of measurement
• Involves rank order of variable along some
dimension
• Examples
– School grades
– Clinical nursing levels
18. Interval Measurement/Variable
• Interval=equal distances
• Attribute is rank-ordered on a scale that
has equal distances between points on
that scale
• Examples
– Temperature
19. Ratio Measurement/Variables
• Equal distances between score units and
which has a true, meaningful zero point
– A true ratio can be calculated
• The highest level of measurement
• Examples
– Weight
– Pulse
20. Why care about type of
measurement/variable?
• Statistical tests are/have been developed
to work and provide meaningful analysis
for specific types of measurement and
variable
• The tests you choose to run should be
based, in part, on the type of variables
with which you work
21. Which measurement?
• A single variable may be measurable
using different types of measurement
• Rule of Thumb: use the highest level of
measurement possible
– Higher levels provide more information
– Higher levels can be analyzed with more
powerful statistical tools
22. Data Analysis
• Data starts out “raw”
– unanalyzed
• Processing
– Coding, if appropriate
– Data entry
• Into database or matrix
– Cleaning
• Finding and correcting (if possible) errors in entry
and coding
– Analysis
23. Sample vs Population
• Sample
– A subset of a population
– Ideally selected to be representative of the
population
• Population
– The entire set of individuals (objects, units,
etc) having common characteristics
24. Two Types of Statistics
• Descriptive
– Used to describe and summarize data set
– Allows us to describe, compare, determine a
relationship
– Usually straightforward - %, averages, etc
• Inferential
– Permit us to infer whether a relationship
observed in a sample is likely to occur in the
population of concern
– Are relationships “real”?
25. Uses of Inferential Statistics
• Draw conclusions about a single variable
in a population
• Evaluate relationships between variables
in populations
• Are the relationships “real”?
26. Inferential Stats: Relationships
• Existence
– Is there a relationship between X and Y?
• Magnitude
– How strong is the relationship between X and
Y?
• Nature
– What type of relationship is there between X
and Y?
27. Number of variables…
• “Univariate”
– One variable being described
• “Bivariate”
– Two variables being compared
• NOTE: in epidemiology, this is also known as
“univariate”
• Multivariate
– More than two variables being compared
• Different statistical tests for each
28. Purposes of Data Analysis
• In research, all of these usually get done to
some extent
– Clean data
– Sample description
– Assessment of bias
– Evaluation of tools used to collect data
– Evaluation of need for data transformations
– Address the research question
29. Describing the Data Set
• Organize the data
• Examine the patterns of distribution
• Describe patterns of distribution
• Assess the variability of the data
30. Simplest Distribution:
The Frequency Distribution
• Lists categories of scores or values as
well as counts of the number of each
score or value
– List and tally
– By computer
• Enter data
• Run “frequency”
31. Two Kinds of Frequency
• Absolute
– Number of times a score occurs
– Symbol: f
• Relative
– Proportion of times a score occurs
– Most commonly percent
• % = (f/N) × 100
– f=frequency, N=sum of all frequencies
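The two kinds of frequency can be sketched in a few lines of Python. The scores below are hypothetical values invented for illustration; `absolute` holds f for each score and `relative` applies % = (f/N) × 100.

```python
from collections import Counter

# Hypothetical scores, invented for illustration (not from the slides).
scores = [3, 1, 2, 3, 3, 2, 1, 3, 2, 3]

absolute = Counter(scores)          # f: number of times each score occurs
n_total = sum(absolute.values())    # N: sum of all frequencies

# Relative frequency as a percent: % = (f / N) x 100
relative = {score: 100 * f / n_total for score, f in absolute.items()}

for score in sorted(absolute):
    print(f"score {score}: f = {absolute[score]}, {relative[score]:.1f}%")
```

The relative frequencies always add up to 100%, which is a quick check that no score was dropped from the tally.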
36. Grouped Frequency Distribution
• Values are grouped into intervals
– Class intervals are all the same size
– Class intervals are mutually exclusive
– Useful when data is dispersed
• Or there are restrictions on “small cell size”
– For example: HIV/AIDS reporting
– Loss of information with grouping
• Anytime one moves from the individual level to
group level
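Grouping values into equal, mutually exclusive class intervals might look like the sketch below. The readings and the interval width of 10 are assumptions chosen for illustration, and `class_interval` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical blood pressure readings in mm Hg, invented for illustration.
readings = [146, 152, 158, 160, 164, 166, 166, 172, 178, 184]
WIDTH = 10  # assumed interval width; every class interval has the same size


def class_interval(value, width=WIDTH):
    """Return the (lower, upper) bounds of the interval containing value."""
    lower = (value // width) * width
    return (lower, lower + width - 1)


# Each reading falls into exactly one interval (mutually exclusive classes).
grouped = Counter(class_interval(value) for value in readings)

for (lower, upper), f in sorted(grouped.items()):
    print(f"{lower}-{upper} mm Hg: f = {f}")
```

Once the readings are grouped, the individual values can no longer be recovered from the table, which is the loss of information the slide mentions.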
37. Grouped Frequency Distribution
Interval          f
<150 mm Hg        4
150-158 mm Hg    10
160-168 mm Hg    24
170-178 mm Hg    15
180-188 mm Hg     6
≥190 mm Hg        1
n = 60
38. Displaying Data
• Tables
• Bar graphs
• Pie charts
• Histograms
• Frequency Polygons
– aka Line charts/graphs
39. Bar Graph
• Used primarily for nominal and ordinal
data
• Values across the X axis
• Frequencies along the Y axis
40. Bar Graph of Hypertension Data
[Figure: bar graph with blood pressure (mm Hg, 146 to 190) on the X axis and number of subjects (0 to 10) on the Y axis. Generated in Excel.]
41. Histogram
• Like a bar graph
• Used for continuous (interval or ratio) data
– Rarely seen even for interval or ratio data
– Not offered as an option in Excel
• Bars touch
• May use grouped data
43. Frequency Polygon
(aka Line Graph)
• Used for interval and ratio data
• X and Y axes the same as for bar charts
• Marker placed at intersection of the value
and frequency for a series of values
• Markers then connected with a line
44. Frequency Polygon of
Hypertension Data
[Figure: frequency polygon with blood pressure (mm Hg, 146 to 190) on the X axis and number of subjects (0 to 10) on the Y axis.]
45. Effective Graphical Display
• Should accurately represent data
• Should be easily understood
– Not too busy or complicated
• Should stand alone
– Ideal and rare
46. Distribution Shapes – 5 Basics
• Modality
• Symmetry and Skewness
• Kurtosis
• Central Tendency
• Variability
47. Modality – Basic Shape
• Peaks or high points
in the data
• May have one or
multiple peaks
– Unimodal = 1 peak
– Bimodal = 2 peaks
– Multimodal = multiple
peaks
48. Symmetry
• Symmetrical
– If you draw a line through the center it
produces mirror images
– In real life: approximately the same
distribution on either side of the center line
• Asymmetrical
– Distribution is lopsided or skewed
49. Asymmetric Distribution: Skewness
• Affected by outliers
• Positive
– The “tail” points to the
right (positive
direction)
• Negative
– The “tail” points to the
left (negative direction)
50. Kurtosis
• Assumes symmetric
distribution
• Refers to how pointy the
peak of the distribution is
– How concentrated in the
middle of the distribution
• Platykurtic
– Low, flattened peak
• Leptokurtic
– High narrow peak
51. The Normal Distribution
• Unimodal
• Symmetrical
• Peak is neither high nor flat
• “Bell-shaped curve”
• The ideal distribution
– And therefore “normal”
52. Looking at Frequency Distributions
• Learn about the data set
• Clean the data
• Identify missing values
• Test assumptions
– About the distribution
• Answer research questions
– About the distribution
53. Quartiles
• Calculated by dividing data into quarters
– The median is the 2nd quartile
• Quartile 1 is the point at which 25% of
values are below and 75% of values are
above
• Quartile 3 is the point at which 75% of
values are below and 25% of values are
above
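As a rough illustration, Python's standard `statistics.quantiles` returns the three quartile cut points. The data are hypothetical, and software packages use slightly different interpolation rules, so other tools may give slightly different values for Q1 and Q3.

```python
import statistics

# Hypothetical data set of 11 values, invented for illustration.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

print("Q1:", q1)           # about 25% of values fall below this point
print("Q2 (median):", q2)  # the 2nd quartile is the median
print("Q3:", q3)           # about 75% of values fall below this point
```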
54. Part 2
Describing and Displaying Data
Measures of Central Tendency
Univariate Statistics
55. Measures of Central Tendency
• Tells you about the area of the distribution
where the bulk of values fall
• Measures include:
– Mean
– Median
– Mode
56. Mode
• The value that occurs most often
• Limitations
– Data may be multimodal
– Mode can vary from one sample to another in
the same population
• Considered unstable
57. Median (Mdn)
• Point that divides the distribution in half
• Corresponds to the 50th percentile
• 50% will be below the median and 50%
above it
• If the number of scores is odd
– Median is the number exactly in the middle
• If the number of scores is even
– Median is the average of the 2 middle numbers
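The odd and even cases can be written out directly. This is a minimal sketch with a hypothetical function name; in practice `statistics.median` from the standard library does the same job.

```python
def median(values):
    """Return the middle value, or the average of the 2 middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the number exactly in the middle.
        return ordered[mid]
    # Even number of scores: average of the 2 middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # odd count
print(median([4, 1, 3, 2]))  # even count
```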
58. Median (Mdn)
• Measures the location of the middle of
the distribution
• Not sensitive to actual numerical values
– Not affected by outliers
59. Mean
• Most common measure of central tendency
• Most stable, provides the most accurate
estimate
– Assuming a normal distribution
• Calculated by adding all values and dividing
by the number of cases
– aka average
– Best understood by the general public
60. Mean
• Affected by each value in the distribution
• Intended for interval or ratio data
– In some designs can be used for ordinal
• The sum of the deviation scores from the
mean always equals 0
• Abbreviated x̄ for samples
– μ for populations
61. Mean, Median, and Mode
• Mean is preferred in a normal distribution
– Extreme scores or outliers can result in a
mean that doesn’t reflect central tendency
• Skewed data
• With skewed data, or extreme outliers use
median
– Example: Median home price
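A small sketch of why the median is preferred when there are extreme outliers. The home prices are hypothetical figures (in thousands) invented for illustration, with one extreme value.

```python
import statistics

# Hypothetical home prices in thousands; the last value is an outlier.
prices = [200, 210, 220, 230, 2000]

mean_price = statistics.mean(prices)      # pulled upward by the outlier
median_price = statistics.median(prices)  # unaffected by the outlier

print("mean:", mean_price)
print("median:", median_price)

# A data set may have more than one mode (multimodal data).
modes = statistics.multimode([1, 2, 2, 3, 3])
print("modes:", modes)
```

Here the mean is more than double the median even though four of the five prices sit between 200 and 230, so the median is the better summary of a typical price.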
62. x̄, Mdn, Mode – Hypertension
Study
  x   f      x   f      x   f
146   1    162   4    178   2
148   2    164   5    180   2
150   2    166   9    182   2
152   2    168   5    184   1
154   2    170   4    186   2
156   2    172   3    188   1
158   2    174   2    190   1
160   3    176   2    192   0
Sum of values = Σx = 9,989
Number of cases = n = 60
Mode = 166
Mdn = 166
Mean = Σx/n = 9,989/60
≈ 166.5
63. Quickly Assessing Distribution
• If the mean, median, and mode are similar
– Approximately normally distributed
• If the median>mean
– Negatively skewed
• If the median<mean
– Positively skewed
• The mean is pulled in the direction of the
skew
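The quick check above can be sketched as a comparison of the mean and the median. The function name, the data, and the tolerance (5% of the standard deviation) are all assumptions made for illustration; this is a rule of thumb, not a formal test of skewness.

```python
import statistics


def skew_direction(values, tolerance=0.05):
    """Rough rule of thumb: compare the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.stdev(values)
    # Treat the two as "similar" when they differ by a small share of the SD.
    if abs(mean - median) <= tolerance * spread:
        return "approximately symmetric"
    # The mean is pulled in the direction of the skew.
    return "positively skewed" if mean > median else "negatively skewed"


print(skew_direction([1, 2, 2, 3, 3, 3, 4, 20]))         # long right tail
print(skew_direction([1, 17, 18, 18, 19, 19, 19, 20]))   # long left tail
print(skew_direction([1, 2, 3, 4, 5]))                   # mean equals median
```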
65. Variability
• Refers to how spread out the scores are in
a distribution
• Two distributions with the same mean can
differ greatly in variability
– Homogeneous: values are similar
– Heterogeneous: values with more variability
• Measures:
– Range/Semi-interquartile Range
– Variance
– Standard Deviation
67. Range
• Simplest of the measures of variability
• Difference between the lowest and highest
values in a distribution
– 190-146 = 44
• Sometimes reported as a minimum and
maximum value
– Range 146-190
68. Range
• Limitations
– Based on only 2 values, highest and lowest
• Can be unstable when multiple samples are taken
from the same population
• Doesn’t tell you anything about what is happening
in the middle
– As the sample size increases, range is likely
to increase
• Greater chance of outlier
69. Standard Deviation -
s, SD, Std Dev
• A measure of how far values vary from the
mean of a given sample
– Tells you the average deviation
– How much the scores deviate from the mean
• Most widely used measure of variability
• Takes into consideration every score in
the distribution
71. Standard Deviation
• Taking the square root returns the value of
the standard deviation to the original scale
• The lower the standard deviation, the
better measure the mean is as a summary
of the data
– The less variability there is among the scores
72. Variance
• Simply s²
• The standard deviation calculation before
the square root is taken and is equal to:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
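The calculation can be followed step by step in a short sketch. The values are hypothetical, and the n - 1 denominator is the sample version of the formula.

```python
import math
import statistics

# Hypothetical sample, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Deviation of every score from the mean; these always sum to 0.
deviations = [x - mean for x in values]

# Variance: sum of squared deviations divided by n - 1.
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Taking the square root returns the measure to the original scale.
sd = math.sqrt(variance)

print(f"mean = {mean}, variance = {variance:.3f}, SD = {sd:.3f}")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 denominator and should agree with these values.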
73. Standard Deviation - uses
• Useful when looking at a single score in
relation to a distribution
• Normal Distribution
– There are about 3 SD above and below the
mean
– A fixed percent of scores lie within each SD:
• 68% within 1 SD above and below the mean
• 95% within 2 SD above and below the mean
• 99.7% within 3 SD above and below the mean
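These percentages can be checked against the standard normal curve with `statistics.NormalDist` from the Python standard library. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The exact share within 2 SD is a little over 95%; the 95% figure corresponds more precisely to about 1.96 SD.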
74. Normal Distribution with SD = 15
and mean = 100
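[Figure: normal curve with mean 100 and SD 15. Areas: below 70, 2.5%; 70 to 85, 13.5%; 85 to 100, 34%; 100 to 115, 34%; 115 to 130, 13.5%; above 130, 2.5%. About 68% lies between 85 and 115, and 95% between 70 and 130.]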
75. Standard Scores
• Scores that represent relative distance
from mean. Measures of position.
$Z = \frac{X - \bar{X}}{\mathrm{SD}}$
• Raw score minus mean divided by SD:
gives score in SD units
• Z score is # of SDs a given value of ‘X’ is
away from mean. Z score of 1 is 1 SD
above mean.
• Z Distribution has mean = 0 and SD = 1
76. Standard Scores – Z scores
• Allow for the standardization (in SD units)
of values in a distribution relative to the
mean
• Standard Score $Z = \frac{x - \bar{x}}{\mathrm{SD}}$
• Number of SD a given value of x is from
the mean
– Z score of 1 is 1 SD above the mean
• Z distribution has mean=0 and SD=1
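As a small worked example, the sketch below converts raw scores to Z scores for a distribution with mean 100 and SD 15. The function name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


MEAN, SD = 100, 15

for raw in (70, 85, 100, 115, 130):
    print(f"raw score {raw}: Z = {z_score(raw, MEAN, SD):+.1f}")
```

A raw score of 115 is one SD above the mean, so its Z score is +1; a raw score of 70 is two SDs below the mean, so its Z score is -2.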
77. [Figure: two raw-score scales (55, 60, 65 and 85, 100, 115) mapped onto the Z distribution at -1, 0, +1, where Z score = (Score - Mean) / SD.]
78. Normal Z Distribution (Mean = 0, SD = 1)
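[Figure: normal curve with areas by Z interval: below -2, 2.5%; -2 to -1, 13.5%; -1 to 0, 34%; 0 to +1, 34%; +1 to +2, 13.5%; above +2, 2.5%. About 68% lies within ±1 and 95% within ±2.]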
79. Normal Distribution/Z Scores
• The entire percent under the curve is
100%
– Probability of being somewhere under the
curve is 100%
• Most values will lie in the middle
• Out at the ends we become less sure
– Is a value out at 1% really representative?
80. Using Normal Distributions/
Z Scores
• Transformation
– The z score can be transformed to reset the
mean and SD
• Transformed Z = 10(Z) + 50
– Now mean = 50 and SD=10
• P-value
– Likelihood of a given value falling at a
particular point on the curve
– We will come back to this
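The transformation can be verified numerically. The raw scores below are hypothetical; after standardizing and applying 10(Z) + 50, the transformed scores should have mean 50 and SD 10.

```python
import statistics

# Hypothetical raw scores, invented for illustration.
raw = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(raw)
sd = statistics.stdev(raw)

# Standardize: Z scores have mean 0 and SD 1.
z_scores = [(x - mean) / sd for x in raw]

# Transform: reset the mean to 50 and the SD to 10.
transformed = [10 * z + 50 for z in z_scores]

print("mean:", round(statistics.mean(transformed), 6))
print("SD:", round(statistics.stdev(transformed), 6))
```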
81. Using Normal Distributions/
Z Scores
• Z score can tell you the probability of a
value falling into a given area of the curve
– Get z score
– Match to %
• About 95% of values lie between Z = -2 and Z = +2
– Gives the probability of a value falling in
that part of the curve
– Z-score tables
82. Parameters vs Statistics
• Parameters
– Computed for populations
– Greek symbols used
• μ =mean, σ = std dev
• Statistics
– Computed for samples
– English symbols used
• x̄ = mean, s = std dev
83. Computers and
Measures of Central Tendency
• Statistical software is in widespread use
but…
• The operator (you) must be aware of
levels of measurement etc
– The computer doesn’t know
– Have to choose the right method for type of
data
85. “Bivariate” Statistics
• Used to describe the relationship between
2 variables (bi-variate)
– 2 nominal variables
– 1 nominal, 1 ratio/interval
– 2 ratio/interval
86. Crosstabulation
• Results in a contingency table
– 2 dimensional frequency distribution
• The simplest: 2X2
– 2 nominal or ordinal variables
• One heading columns
• One heading rows
87. Crosstabulation - Example
            High School   College   Total
<$30,000         64          19       83
≥$30,000         36          81      117
Total           100         100      200
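A contingency table like this one is a tally of cases by two variables at once. The sketch below rebuilds the example from one record per person. Treating the rows as income and the columns as education is an assumption based on the labels, and the variable names are hypothetical.

```python
from collections import Counter

# Cell counts copied from the example table: (row label, column label) -> count.
cells = {
    ("<$30,000", "High School"): 64,
    ("<$30,000", "College"): 19,
    (">=$30,000", "High School"): 36,
    (">=$30,000", "College"): 81,
}

# Expand the counts into one (row, column) record per case, then tally them.
records = [pair for pair, count in cells.items() for _ in range(count)]
table = Counter(records)

# Marginal totals: one tally per variable.
row_totals = Counter(row for row, _ in records)
column_totals = Counter(column for _, column in records)

for row in ("<$30,000", ">=$30,000"):
    counts = [table[(row, column)] for column in ("High School", "College")]
    print(row, counts, "total:", row_totals[row])
print("column totals:", dict(column_totals), "n =", len(records))
```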
88. Comparison of Group Means
• Nominal & Interval or Ratio Variable
• IV: nominal or ordinal
– Sex, ethnicity, age group etc
• DV: interval or ratio
– Heart rate, BP, weight etc
• Means and SD calculated for each category
of the IV
• NOTE: NO INFERENCE is made about
significance of difference between categories
89. Comparison of Group Means
education n mean SD Min Max
High School 100 24,657 2,598 10,103 75,362
College 100 36,431 7,912 15,256 126,754
total 200 31,989 6,110 10,103 126,754
90. Correlation
• A linear relationship between 2 variables
– Interval or ratio variables
• Can be plotted and displayed graphically
– Scatter plot
• Can be calculated statistically
– Correlation coefficient
– r
91. Scatterplot
• Values for one variable on X axis
• Values for the other variable on Y axis
• Data plotted for each subject/case
• Examine the plot for pattern
– Data arrayed closely together indicates strong
correlation
92. Positive Correlation
• As one variable
increases in value, so
does the other
• On the plot:
– Diagonal line upwards
and to the right
• Example:
– Age and BP
93. Negative Correlation
• As one value
increases the other
decreases
• On the plot:
– Diagonal line down
and to the right
• Example:
– Age and bone density
94. Scattered Scatterplots
• Indicate little or no
relationship between
variables
• Can be dispersed or
concentrated
95. Non-linear Scatterplots
• There is a relationship
but…
• Some relationships
are not linear….
• May be curved
– S
– U
– Up then flat
– others
97. Correlation Coefficient - r
• Statistical measure used to
– Determine if a relationship exists between two
variables
– Test a hypothesis about that relationship
• Allows us to make a mathematical
statement about the relationship
– Do the variables vary together?
• AKA Pearson Correlation Coefficient
98. Correlation Coefficient -
Assumptions
• Sample must be representative of the population
• The distributions must be approximately
normal
• Each value of X must have a
corresponding value of Y
– If many have X value but not Y value, analysis
will be strongly biased
99. Correlation
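$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}$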
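The computational form of the Pearson correlation coefficient can be written out directly from the sums. The paired data are hypothetical, and `pearson_r` is an illustrative name rather than a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the raw sums; each x must have a paired y."""
    if len(xs) != len(ys):
        raise ValueError("each X value needs a corresponding Y value")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical paired observations, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.4f}, r squared = {r ** 2:.2f}")
```

For these values the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30 / √1500, or about 0.77.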
101. Correlation Coefficient
• Range
– -1 to 1
• Positive correlation
– 0 to 1
• Negative correlation
– -1 to 0
• The closer r is to -1 or 1, the stronger the
correlation
– -0.9: strong negative
– -0.2: weak negative
– 0: none
– 0.2: weak positive
– 0.9: strong positive
102. Correlation Coefficient –
Significance
• Depends on number of pairs
• Varies for each r
– r of 0.3 may be significant for n=1500 but not
for n=40
• Also depends on variance (SD)
– Greater the variance, less significance
• Generally:
– 0.60 or –0.60 is strong for medical variables
• Manufacturing requires 0.90 or greater
103. The Scatterplot and r
• ALWAYS look at the
scatterplot along with
r
• Each of these plots
has r=0.70
104. Correlation
• The square of the correlation coefficient,
r², indicates the proportion of the variability in
one variable that can be explained by the other
– Example: age and BP
• r² = 0.49 (r = 0.70)
• 49% of the variation in BP is explained by age
– aka Coefficient of Determination
• Correlation does NOT imply causation