As we learned in the previous lesson, Statistics is a science that studies data; hence, real data sets are recommended for teaching it. In this lesson, we present an activity in which students provide data that the teacher consolidates for future lessons. Data on heights and weights, for instance, will be used to calculate Body Mass Index in the integrative lesson. Students will also be given the perspective that the data they provide is part of a bigger body of data, as the same data will be requested from much larger groups (the entire class, all Grade 11 students in the school, all Grade 11 students in the district). The contextualization of data will also be discussed.
As a continuation of Lesson 2 (where we contextualized data), in this lesson we define basic terms in statistics as we continue to explore data. These basic terms include the universe, variable, population and sample. We will also discuss, in detail, other concepts related to a variable.
Biostatistics – the application of statistical methods in the life sciences, including medicine, pharmacy, and agriculture.
An understanding of it is needed in practice, where issues require sound decisions.
Statistics is a decision science.
Biostatistics therefore deals with data.
Biostatistics is the science of obtaining, analyzing and interpreting data in order to understand and improve human health.
Applications of Biostatistics
Design and analysis of clinical trials
Quality control of pharmaceuticals
Pharmacy practice research
Public health, including epidemiology
Genomics and population genetics
Ecology
Biological sequence analysis
Bioinformatics etc.
Analysis and Interpretation of Data
Slide 1 Transcript
In a qualitative design, the information gathered and studied is often nominal or narrative in form. Trends, patterns, and relationships are discovered inductively and upon reflection; some describe this as an intuitive process. In Module 4, qualitative research designs were explained, along with how the information gained shapes the inquiry as it progresses. For the most part, qualitative designs do not use numerical data unless a mixed approach is adopted. So, in this module the focus is on how numerical data collected in either a qualitative mixed design or a quantitative research design are evaluated. In quantitative studies, there is typically a hypothesis or particular research question. Measures used to assess the hypothesis involve numerical data, usually organized in sets and analyzed using various statistical approaches. Which statistical applications are appropriate for the data of interest is the focus of this module.
Data and Statistics
Match the data with an appropriate statistic.
Approaches based on data characteristics:
Collected for single or multiple groups
Involve continuous or discrete variables
Data are nominal, ordinal, interval, or ratio
Normal or non-normal distribution
Statistics serve two functions:
Descriptive: describe what the data look like
Inferential: use samples to estimate population characteristics
Slide 3 Transcript
There are, of course, far more statistical concepts than time allows us to consider here, so we will limit ourselves to just a few basic ones and a brief overview of the more common applications in use. It is vitally important to select the proper statistical tool for analysis; otherwise, interpretation of the data is incomplete or inaccurate. Since different statistics are suitable for different kinds of data, we can begin sorting out which approach to use by considering four characteristics:
1. Have data been collected for a single group or multiple groups?
2. Do the data involve continuous or discrete variables?
3. Are the data nominal, ordinal, interval, or ratio?
4. Do the data represent a normal or non-normal distribution?
We will address each of these in the slides that follow. Statistics can serve two main functions – one is to describe what the data look like, which is called descriptive statistics. The other is known as inferential statistics, which typically uses a small sample to estimate characteristics of the larger population. Let’s begin with descriptive statistics and the measures of central tendency.
Descriptive Statistics and Central Measures
Descriptive statistics organize and present data.
Mode
The number occurring most frequently; nominal data
Quickest or rough estimate
Most typical value
Measures of central tendency
Advanced statistics
1. Prof. JOY V. LORIN-PICAR
DAVAO DEL NORTE STATE COLLEGE
NEW VISAYAS, PANABO CITY
2. TOPIC OUTLINE
PART 1
Role of Statistics in Research
Descriptive Statistics
Hands-On Statistical Software
Sample and Population
Sampling Procedures
Sample Size
Hands-On Statistical Software
Inferential Statistics
Hypothesis Testing
Hands-On Statistical Software
3. TOPIC OUTLINE
PART 2
Choice of Statistical Tests
Defining Independent and Dependent Variables
Hands-On Statistical Software
Scales of Measurement
How Many Samples/Groups Are in the Design
PART 3
Parametric Tests
Hands-On Statistical Software
PART 4
Non-Parametric Tests
Hands-On Statistical Software
4. TOPIC OUTLINE
PART 5
Goodness of Fit
Hands-On Statistical Software
PART 6
Choosing the Correct Statistical Tests
Hands-On Statistical Software
Introduction to Multiple and Non-Linear Regression
Hands-On Statistical Software
5. Role of Statistics in Research
Normally used to analyze data
To organize and make sense of large amounts of data
Basic to the intelligent reading of research articles
Has made significant contributions to the social sciences, the applied sciences, and even business and economics
Statistical research makes inferences about population characteristics on the basis of one or more samples that have been studied.
6. How do we look at Statistics?
1. Descriptive – gives us information about, or simply describes, the sample we are studying.
2. Correlational – enables us to relate variables and establish relationships between and among variables, which are useful in making predictions.
3. Inferential – goes beyond the sample to make inferences about the population.
7. Descriptive Statistics
N - total population/sample size from any given
population
Example
Minutes Spent on the Phone
102 124 108 86 103 82
71 104 112 118 87 95
103 116 85 122 87 100
105 97 107 67 78 125
109 99 105 99 101 92
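The slide's data set lends itself to a quick computation. A minimal Python sketch (not part of the original slides) that computes N, the mean, median, mode(s) and range of the 30 phone-time values:

```python
from statistics import mean, median, multimode

# Minutes spent on the phone (the 30 values from the slide)
minutes = [
    102, 124, 108, 86, 103, 82,
    71, 104, 112, 118, 87, 95,
    103, 116, 85, 122, 87, 100,
    105, 97, 107, 67, 78, 125,
    109, 99, 105, 99, 101, 92,
]

n = len(minutes)                      # N = 30
avg = mean(minutes)                   # arithmetic mean
mid = median(minutes)                 # middle value(s) of the sorted data
modes = multimode(minutes)            # every value tied for most frequent
spread = max(minutes) - min(minutes)  # range = max - min

print(n, round(avg, 2), mid, sorted(modes), spread)
# → 30 99.63 101.5 [87, 99, 103, 105] 58
```

Note that this data set is multimodal: four values each occur twice, which is why `multimode` (rather than `mode`) is the safer choice.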
9. Range, Mean, Median and Mode
The terms mean, median, mode, and range describe
properties of statistical distributions. In statistics, a distribution
is the set of all possible values for terms that represent defined
events. The value of a term, when expressed as a variable, is
called a random variable. There are two major types of
statistical distributions. The first type has a discrete random
variable. This means that every term has a precise, isolated
numerical value. An example of a distribution with a discrete
random variable is the set of results for a test taken by a class in
school. The second major type of distribution has a continuous
random variable. In this situation, a term can acquire any
value within an unbroken interval or span. Such a distribution
is called a probability density function. This is the sort of
function that might, for example, be used by a computer in an
attempt to forecast the path of a weather system.
10. Mean
The most common expression for the mean of a statistical
distribution with a discrete random variable is the
mathematical average of all the terms. To calculate it,
add up the values of all the terms and then divide by the
number of terms. This expression is also called the
arithmetic mean. There are other expressions for the
mean of a finite set of terms but these forms are rarely
used in statistics. The mean of a statistical distribution
with a continuous random variable, also called the
expected value, is obtained by integrating the product of
the variable with its probability as defined by the
distribution. The expected value is denoted by the
lowercase Greek letter mu (µ).
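Both notions of the mean can be sketched in Python; the scores and the density f(x) = 2x below are illustrative assumptions, not from the slides:

```python
from statistics import mean

scores = [7, 8, 8, 9, 10]   # hypothetical discrete data
mu = mean(scores)           # arithmetic mean: 42 / 5 = 8.4

# Expected value of a continuous variable: integrate x * f(x) over the support.
# f(x) = 2x on [0, 1] is a valid density; its expected value is 2/3.
steps = 100_000
width = 1.0 / steps
expected = 0.0
for i in range(steps):
    x = (i + 0.5) * width          # midpoint of each sub-interval
    expected += x * (2 * x) * width

print(mu, round(expected, 4))      # 8.4 0.6667
```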
11. Median
The median of a distribution with a discrete random variable
depends on whether the number of terms in the distribution is
even or odd. If the number of terms is odd, then the median is
the value of the term in the middle. This is the value such that
the number of terms having values greater than or equal to it is
the same as the number of terms having values less than or
equal to it. If the number of terms is even, then the median is
the average of the two terms in the middle, such that the
number of terms having values greater than or equal to it is the
same as the number of terms having values less than or equal to
it. The median of a distribution with a continuous random
variable is the value m such that the probability is at least
1/2 (50%) that a randomly chosen point on the function
will be less than or equal to m, and the probability is at
least 1/2 that a randomly chosen point on the function will
be greater than or equal to m.
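A short illustration of the odd/even rule, using made-up values:

```python
from statistics import median

odd = [3, 9, 1, 7, 5]        # odd count: the median is the middle sorted value
even = [3, 9, 1, 7, 5, 11]   # even count: the average of the two middle values

print(median(odd), median(even))  # 5 and (5 + 7) / 2 = 6.0
```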
12. Mode
The mode of a distribution with a discrete random
variable is the value of the term that occurs the most
often. It is not uncommon for a distribution with a
discrete random variable to have more than one mode,
especially if there are not many terms. This happens when
two or more terms occur with equal frequency, and more
often than any of the others. A distribution with two
modes is called bimodal. A distribution with three modes
is called trimodal. The mode of a distribution with a
continuous random variable is the maximum value of
the function. As with discrete distributions, there may be
more than one mode.
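Python's `statistics.multimode` returns every tied value, which makes bimodal and trimodal cases easy to see (toy data assumed):

```python
from statistics import multimode

bimodal = [1, 2, 2, 3, 5, 5, 7]     # 2 and 5 both occur twice
trimodal = [1, 1, 2, 2, 3, 3, 4]    # 1, 2 and 3 all occur twice

print(multimode(bimodal))   # [2, 5]
print(multimode(trimodal))  # [1, 2, 3]
```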
13. Range
The range of a distribution with a discrete random variable is the difference between the maximum value and the minimum value. For a distribution with a continuous random variable, the range is the difference between the two extreme points on the distribution curve, where the value of the function falls to zero. For any value outside the range of a distribution, the value of the function is equal to 0.
The range is the least reliable of the measures of variability and is used only when one is in a hurry to get a measure of variability.
16. Standard Deviation
The standard deviation formula is very simple: it
is the square root of the variance. It is the most
commonly used measure of spread.
An important attribute of the standard deviation
as a measure of spread is that if the mean and
standard deviation of a normal distribution
are known, it is possible to compute the
percentile rank associated with any given score.
17. Standard Deviation
In a normal distribution, about 68% of the
scores are within one standard deviation of the
mean and about 95% of the scores are within
two standard deviations of the mean.
The standard deviation has proven to be an
extremely useful measure of spread in part
because it is mathematically tractable. Many
formulas in inferential statistics use the
standard deviation.
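As the slides note, once the mean and standard deviation of a normal distribution are known, the percentile rank of any score follows from the normal CDF. A sketch using Python's `statistics.NormalDist`; the exam-score distribution is a hypothetical example:

```python
from statistics import NormalDist

# Hypothetical normally distributed test scores
exam = NormalDist(mu=100, sigma=15)

# Percentile rank of a score one SD and two SDs above the mean
print(round(exam.cdf(115) * 100, 1))  # ~84.1st percentile
print(round(exam.cdf(130) * 100, 1))  # ~97.7th percentile

# Share of scores within 1 and 2 SDs of the mean: ~68% and ~95%
print(round(exam.cdf(115) - exam.cdf(85), 3),
      round(exam.cdf(130) - exam.cdf(70), 3))
```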
23. KURTOSIS - refers to how sharply peaked
a distribution is. A value for kurtosis is included
with the graphical summary:
· Values close to 0 indicate normally peaked
data.
· Negative values indicate a distribution that is
flatter than normal.
· Positive values indicate a distribution with a
sharper than normal peak.
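A rough sketch of how excess kurtosis can be computed from data, as the fourth standardized moment minus 3 so that a normal curve scores near 0; the two small data sets are invented for illustration:

```python
from statistics import mean

def excess_kurtosis(xs):
    # Fourth standardized moment minus 3 (population moments).
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n   # variance
    m4 = sum((x - m) ** 4 for x in xs) / n   # fourth central moment
    return m4 / (m2 ** 2) - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8]          # uniform: flatter than normal (negative)
peaked = [5, 5, 5, 5, 1, 9, 5, 5, 5, 5]  # sharp central peak (positive)
print(excess_kurtosis(flat), excess_kurtosis(peaked))
```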
26. Samples and Population
Population – as used in research, refers to all the members of a particular group.
It is the group of interest to the researcher.
This is the group to whom the researcher would like to generalize the results of a study.
27. A target population is the actual population to
whom the researcher would like to generalize
Accessible population is the population to whom
the researcher is entitled to generalize
28. SAMPLING
This is the process of selecting the individuals who will participate in a research study.
A sample is any part of the population of individuals on whom information is obtained.
A representative sample is a sample that is similar to the population to whom the researcher is entitled to generalize.
29. PROBABILITY AND NON-PROBABILITY SAMPLING
A sampling procedure that gives every element of the population a known, nonzero chance of being selected in the sample is called probability sampling. Otherwise, the sampling procedure is called non-probability sampling.
Whenever possible, probability sampling is used, because there is no objective way of assessing the reliability of inferences under non-probability sampling.
30. METHODS OF PROBABILITY SAMPLING
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
5. Two-stage random sampling
31. Simple Random Sampling
This is a sample selected from
a population in such a manner
that all members of the
population have an equal
chance of being selected
32. Stratified Random Sampling
Sample selected so that certain
characteristics are represented in
the sample in the same proportion
as they occur in the population
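Both of these methods can be sketched with Python's `random` module; the population of student IDs and the two strata below are hypothetical:

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

population = [f"S{i:03d}" for i in range(200)]  # hypothetical student IDs

# Simple random sampling: every member has an equal chance of selection.
simple = random.sample(population, 20)

# Stratified random sampling: sample each stratum in proportion to its size.
strata = {"Grade 11": population[:120], "Grade 12": population[120:]}  # 60% / 40%
fraction = 0.1
stratified = [
    unit
    for members in strata.values()
    for unit in random.sample(members, round(len(members) * fraction))
]
print(len(simple), len(stratified))  # 20 and 12 + 8 = 20
```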
38. Purposive Sampling
Consists of individuals who have special qualifications of some sort or are deemed representative on the basis of prior evidence.
39. Quota Sampling
In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60. This means the researcher decides who to sample (targeting).
40. Snowball Sampling
Snowball sampling is a technique for developing a research sample in which existing study subjects recruit future subjects from among their acquaintances; the sample group thus appears to grow like a rolling snowball. As the sample builds up, enough data are gathered to be useful for research. This technique is often used for hidden populations that are difficult for researchers to access, such as drug users or prostitutes. Because sample members are not selected from a sampling frame, snowball samples are subject to numerous biases.
41. General Classification of Collecting Data
1. Census or complete enumeration – the process of gathering information from every unit in the population.
- It is not always possible to get timely, accurate and economical data.
- Costly, if the number of units in the population is too large.
2. Survey sampling – the process of obtaining information from the units in the selected sample.
Advantages: reduced cost, greater speed, greater scope, and greater accuracy.
42. Sample Size
Samples should be as large as a researcher can obtain with a reasonable expenditure of time and energy.
As a rule of thumb, the suggested minimum number of subjects is 100 for a descriptive study, 50 for a correlational study, and 30 per group for experimental and causal-comparative designs.
According to Padua, for p parameters the minimum n can be computed as n >= (p + 3)p/2; for example, if p = 4, the minimum n is 14.
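Padua's rule from the slide is easy to compute. The second function, Slovin's formula, is a widely used alternative for survey sample sizes and is added here only for comparison; the population size and margin of error in the example are assumed:

```python
import math

def padua_min_n(p):
    # Padua's rule of thumb from the slide: n >= (p + 3) * p / 2
    return math.ceil((p + 3) * p / 2)

def slovin_n(N, e=0.05):
    # Slovin's formula: n = N / (1 + N * e^2), for population N and margin of error e
    return math.ceil(N / (1 + N * e ** 2))

print(padua_min_n(4))        # 14, matching the slide's example
print(slovin_n(1000, 0.05))  # 286 respondents from a population of 1,000
```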
43. Inferential Statistics
These are formalized techniques used to draw conclusions about populations based on samples taken from those populations.
44. Hypothesis
A hypothesis is a tentative theory or supposition provisionally adopted to explain certain facts and to guide the investigation of others.
A statistical hypothesis is an assertion or statement, which may or may not be true, concerning one or more populations.
Example:
1. A leading drug in the treatment of hypertension has an advertised therapeutic success rate of 83%. A medical researcher believes he has found a new drug for treating hypertensive patients that has a higher therapeutic success rate than the leading drug, with fewer side effects.
45. The Statistical Hypotheses:
H0: The new drug is no better than the old one (p = 0.83)
H1: The new drug is better than the old one (p > 0.83)
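One way to test hypotheses like these is a one-sided z test for a proportion; the trial results below (180 successes in 200 patients) are invented purely to illustrate the mechanics:

```python
from statistics import NormalDist

# Hypothetical trial data (not from the slide): 180 successes in 200 patients.
p0, n, successes = 0.83, 200, 180
p_hat = successes / n  # observed success rate: 0.90

# One-sided z test of H0: p = 0.83 against H1: p > 0.83
se = (p0 * (1 - p0) / n) ** 0.5   # standard error under H0
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_value, 4))
```

Here z exceeds the one-sided 5% critical value of about 1.645, so this hypothetical sample would lead to rejecting H0.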
Example 2. A social researcher is conducting a study to determine whether the level of women's participation in the community extension programs of the barangay is affected by their educational attainment, occupation, income, civil status, and age.
46. H0: The level of women's participation in community extension programs is not affected by their educational attainment, occupation, income, civil status and age.
H1: The level of women's participation in community extension programs is affected by their educational attainment, occupation, income, civil status and age.
Example 3: A community organizer wants to compare three community organizing strategies applied to cultural minorities in terms of effectiveness.
47. A. Hypothesis Testing
Steps in Hypothesis Testing
1. Formulate the null hypothesis and the alternative hypothesis.
These are the statistical hypotheses: assumptions or guesses about the populations involved. In short, they are statements about the probability distributions of the populations.
48. Null Hypothesis
This is a hypothesis of “ no effect “.
It is usually formulated for the express
purpose of being rejected, that is, it is the
negation of the point one is trying to
make.
This is the hypothesis that two or more
variables are not related or that two or
more statistics are not significantly
different.
49. Alternative Hypothesis
This is the operational statement of the researcher's hypothesis.
It is derived from the investigator's theory and generally states a specific relationship between two or more variables, or that two or more statistics significantly differ.
50. Two Ways of Stating the
Alternative Hypothesis
1. Predictive - specifies the type of relationship
existing between two or more variables (direct or
indirect) or specifies the direction of the difference
between two or more statistics
2. Non- Predictive - does not specify the type of
relationship or the direction of the difference
51. C. LEVEL OF SIGNIFICANCE (α)
α is the maximum probability with which we
would be willing to risk Type I Error (The
hypothesis can be inappropriately rejected ).
The error of rejecting a null hypothesis when it
is actually true. Plainly speaking, it occurs
when we are observing a difference when in
truth there is none, thus indicating a test of
poor specificity. An example of this would be if
a test shows that a woman is pregnant when in
reality she is not.
52. In other words, the level of significance determines
the risk a researcher would be willing to take in his
test.
The choice of alpha is primarily dependent on the
practical application of the result of the study.
53. Examples of α
.05 (95% confident of the claim)
.01 (99% confident of the claim)
But note that α is not always .05 or .01. It can also be computed mathematically from Chebyshev's sample-size formula, in which the variance, the number of samples and the allowable difference are predetermined.
54. D. Defining a Region of Rejection
The region of rejection is a region of
the null sampling distribution. It
consists of a set of possible values which
are so extreme that when the null
hypothesis is true the probability is
small (i.e. equal to alpha) that the
sample we observe will yield a value
which is among them.
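For a z-based test, the boundary of the rejection region is the critical value cut off by alpha, which can be read from the inverse normal CDF:

```python
from statistics import NormalDist

alpha = 0.05
# One-tailed test: reject H0 when z falls beyond the single upper critical value.
upper = NormalDist().inv_cdf(1 - alpha)         # ~1.645
# Two-tailed test: alpha is split between both tails.
two_tail = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.960

print(round(upper, 3), round(two_tail, 3))  # 1.645 1.96
```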
55. E. Collect the data and compute the value of the test statistic.
G. State your decision.
H. State your conclusion.
56. B. Choose an Appropriate Statistical Test for
testing the Null Hypothesis
The choice of a statistical test for the analysis
of your data requires careful and deliberate
judgment.
PRIMARY CONSIDERATIONS:
The choice of a statistical test is dictated by
the questions for which the research is
designed
The level, the distribution , and dispersion of
data also suggest the type of statistical test to
be used
57. SECONDARY CONSIDERATIONS
The extent of your knowledge in
statistics
Availability of resources in
connection with the computation
and interpretation of data
58. Choice of Statistical Tests
This is designed to help you
develop a framework for choosing
the correct statistic to test your
hypothesis.
It begins with a set of questions
you should ask when selecting your
test.
It is followed by demonstrations of
the factors that are important to
consider when choosing your
statistic.
59. Choice of Statistical Tests
Presented below are four
questions you should ask and
answer when trying to determine
which statistical procedure is most
appropriate to test your
hypothesis.
60. Choice of Statistical Tests
What are the independent and
dependent variables?
What is the scale of measurement of
the study variables?
How many samples/groups are in
the design?
Have I met the assumptions of the
statistical test selected?
61. Choice of Statistical Tests
To determine which test should be
used in any given circumstance, we
need to consider the hypothesis that
is being tested, the independent and
dependent variables and their scale of
measurement, the study design, and
the assumptions of the test.
62. Variables
Before we can begin to choose our
statistical test, we must determine
which is the independent and which is
the dependent variable in our
hypothesis.
Our dependent variable is always the
phenomenon or behavior that we want
to explain or predict.
63. Defining Independent and Dependent
Variables
The independent variable represents a
predictor or causal variable in the
study.
In any antecedent-consequent
relationship, the antecedent is the
independent variable and the
consequent is the dependent variable.
64. Defining Independent and Dependent Variables
With single samples and one dependent variable, the one-sample Z test, the one-sample t test, and the chi-square goodness-of-fit test are the only statistics that can be used.
Students sometimes ask, "But don't you have population data too, so you have two sets of data?" Yes and no.
Data had to exist for the population parameters to be defined, but the researcher does not collect those data; they already exist.
65. Defining Independent and Dependent
Variables
So, if you are collecting data on one sample
and comparing those data to information
that has already been gathered and is
published, then you are conducting a one-
sample test using the one sample/set of
data collected in this study.
For the chi-square goodness-of-fit test, you
can also compare the sample against chance
probabilities.
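The one-sample tests described above can be run with scipy.stats. A hedged sketch: the sample values, the published comparison value, and the category counts below are all invented for illustration.

```python
# One-sample tests: compare one collected sample against published
# values or chance probabilities. Data are made up for illustration.
from scipy import stats

sample = [98.2, 99.1, 98.6, 97.9, 98.4, 99.0, 98.8, 98.1]

# One-sample t test: compare the sample mean to an already-published
# population value (98.6 here is an assumed example value).
t_stat, t_p = stats.ttest_1samp(sample, popmean=98.6)

# Chi-square goodness-of-fit: compare observed counts against chance
# probabilities (an even split across four categories).
observed = [18, 22, 25, 15]
chi2, chi_p = stats.chisquare(observed, f_exp=[20, 20, 20, 20])

print(f"t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"chi2 = {chi2:.3f}, p = {chi_p:.3f}")
```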
66. Defining Independent and Dependent
Variables
When we have a single sample and
independent and dependent variables
measured on all subjects, we typically are
testing a hypothesis about the association
between two variables. The statistics that we
have learned to test hypotheses about
association include:
chi-square test of independence
Spearman's rs
Pearson's r
bivariate regression and multiple regression
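Each association statistic in the list above has a direct scipy counterpart. A sketch with invented data:

```python
# Association tests for a single sample with two variables measured on
# all subjects. All values below are invented for illustration.
import numpy as np
from scipy import stats

hours = np.array([2, 4, 5, 7, 8, 10])
score = np.array([55, 60, 62, 70, 74, 81])

r, p_r = stats.pearsonr(hours, score)     # Pearson's r (interval/ratio)
rs, p_rs = stats.spearmanr(hours, score)  # Spearman's rs (ordinal/ranks)

# Chi-square test of independence on a 2x2 contingency table.
table = np.array([[30, 10], [20, 40]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Bivariate regression: predict score from hours.
slope, intercept, r_val, p_reg, se = stats.linregress(hours, score)
print(f"r = {r:.3f}, rs = {rs:.3f}, slope = {slope:.2f}")
```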
67. Multiple Sample Tests
Studies that refer to repeated measurements or
pairs of subjects typically collect at least two sets
of scores. Studies that refer to specific subgroups
in the population also collect two or more samples
of data. Once you have determined that the
design uses two or more samples or "groups", then
you must determine how many samples or groups
are in the design. Studies that are limited to two
groups use either the chi-square statistic, Mann-
Whitney U, Wilcoxon test, independent means t
test, or the dependent means t test.
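The two-group tests named on this slide can be sketched with scipy.stats; the scores below are invented for illustration.

```python
# Two-group tests: independent groups vs. the same subjects measured
# twice. All data are invented for illustration.
from scipy import stats

group_a = [12, 15, 14, 16, 13, 17]
group_b = [10, 11, 13, 9, 12, 11]

# Independent-means t test: two unrelated groups.
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# Mann-Whitney U: rank-based alternative for ordinal data.
u, p_u = stats.mannwhitneyu(group_a, group_b)

# Dependent-means t test: the same subjects before and after treatment.
before = [12, 15, 14, 16, 13, 17]
after = [14, 16, 15, 18, 15, 19]
t_dep, p_dep = stats.ttest_rel(before, after)

print(f"independent p = {p_ind:.4f}, U p = {p_u:.4f}, paired p = {p_dep:.4f}")
```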
68. If you have three or more groups in the
design, you can use the chi-square statistic,
Kruskal-Wallis H Test, Friedman ANOVA for
ranks, One-Way Between-Groups ANOVA, or
Factorial ANOVA, depending on the nature
of the relationship between groups. Some of
these tests are designed for dependent or
correlated samples/groups and some are
designed for samples/groups that are
completely independent.
69. Multiple Sample Tests
Dependent Means
Dependent groups refer to some type of
association or link in the research design
between sets of scores. This usually occurs
in one of three conditions -- repeated
measures, linked selection, or matching.
Repeated measures designs collect data on
subjects using the same measure on at least
two occasions. This often occurs before and
after a treatment or when the same research
subjects are exposed to two different
experimental conditions.
70. Multiple Sample Tests
When subjects are selected into the study because of
natural "links or associations", we want to analyze the
data together. This would occur in studies of parent-
infant interaction, romantic partners, siblings, or best
friends. In a study of parents and their children, a
parent’s data should be associated with his son's, not
some other child's. Subject matching also produces
dependent data. Suppose that an investigator wanted
to control for socioeconomic differences in research
subjects. She might measure socioeconomic status
and then match on that variable. The scores on the
dependent variable would then be treated as a pair in
the statistical test.
71. All statistical procedures for dependent or
correlated groups treat the data as linked,
therefore it is very important that you
correctly identify dependent groups
designs. The statistics that can be used for
correlated groups are the McNemar Test
(two samples or times of measurement),
Wilcoxon t Test (two samples), Dependent
Means t Test (two samples), Friedman
ANOVA for Ranks (three or more samples),
Simple Repeated Measures ANOVA (three
or more samples).
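The correlated-groups tests listed here are available in scipy.stats; a sketch with invented repeated-measures scores (the same seven subjects measured at two, then three, time points):

```python
# Correlated-groups tests: the data in each list belong to the same
# subjects, so the tests treat scores as linked. Values are invented.
from scipy import stats

# Two times of measurement on the same subjects.
time1 = [20, 22, 19, 24, 21, 23, 20]
time2 = [23, 24, 20, 27, 24, 25, 22]
w, p_w = stats.wilcoxon(time1, time2)    # Wilcoxon t test (ranks)
t, p_t = stats.ttest_rel(time1, time2)   # dependent-means t test

# Three times of measurement: Friedman ANOVA for ranks.
time3 = [25, 26, 22, 28, 26, 27, 24]
f_chi2, p_f = stats.friedmanchisquare(time1, time2, time3)

print(f"Wilcoxon p = {p_w:.4f}, paired t p = {p_t:.4f}, Friedman p = {p_f:.4f}")
```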
72. Independent Means
When there is no subject overlap across groups, we define
the groups as independent. Tests of gender differences are
a good example of independent groups. We cannot be
both male and female at the same time; the groups are
completely independent. If you want to determine
whether samples are independent or not, ask yourself,
"Can a person be in one group at the same time he or she
is in another?" If the answer is no (can't be in a remedial
education program and a regular classroom at the same
time; can't be a freshman in high school and a sophomore
in high school at the same time), then the groups are
independent.
73. The statistics that can be used for
independent groups include the chi-
square test of independence (two or
more groups), Mann-Whitney U Test
(two groups), Independent Means t
test (two groups), One-Way Between-
Groups ANOVA (three or more
groups), and Factorial ANOVA (two or
more independent variables).
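For three or more independent groups, the one-way between-groups ANOVA and its rank-based alternative look like this in scipy.stats (group labels and scores are invented):

```python
# Three completely independent groups (no subject overlap), compared
# with one-way ANOVA and the Kruskal-Wallis H test. Data invented.
from scipy import stats

freshmen   = [70, 72, 68, 75, 71]
sophomores = [74, 78, 73, 77, 76]
juniors    = [80, 82, 79, 85, 81]

f_stat, p_anova = stats.f_oneway(freshmen, sophomores, juniors)
h_stat, p_kw = stats.kruskal(freshmen, sophomores, juniors)

print(f"F = {f_stat:.2f} (p = {p_anova:.4f}); H = {h_stat:.2f} (p = {p_kw:.4f})")
```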
74. Scales of Measurements
Once we have identified the independent
and dependent variables, our next step in
choosing a statistical test is to identify the
scale of measurement of the variables.
All of the parametric tests that we have
learned to date require an interval or ratio
scale of measurement for the dependent
variable.
75. Scales of Measurements
If you are working with a dependent
variable that has a nominal or ordinal
scale of measurement, then you must
choose a nonparametric statistic to
test your hypothesis
77. How many Samples / Groups are in the
Design
Once you have identified the scale of
measurement of the dependent variable,
you want to determine how many samples
or "groups" are in the study design.
Designs for which one-sample tests (e.g.,
Z test; t test; Pearson and Spearman
correlations; chi-square goodness-of-fit)
are appropriate collect only one set or
"sample" of data.
78. How many Samples / Groups are in the
Design
There must be at least two sets of
scores or two "samples" for any
statistic that examines differences
between groups (e.g., t test for
dependent means; t test for
independent means; one-way ANOVA;
Friedman ANOVA; chi-square test of
independence).
79. Parametric Tests
Parametric statistics are used when our
data are measured on interval or ratio
scales of measurement.
They tend to need larger samples.
The data should fit a particular distribution;
if not, the data may be transformed into
that distribution.
Samples are normally drawn randomly
from the population.
They follow the assumption of normality,
meaning the data are normally distributed.
80. Parametric Assumptions
Listed below are the most frequently
encountered assumptions for parametric tests.
Statistical procedures are available for testing
these assumptions.
The Kolmogorov-Smirnov Test is used to
determine how likely it is that a sample came
from a population that is normally distributed.
81. Parametric Assumptions
The Levene test is used to test the assumption of
equal variances.
If we violate test assumptions, the statistic chosen
cannot be applied. In this circumstance we have
two options:
We can use a data transformation
We can choose a nonparametric statistic
If data transformations are selected, the
transformation must correct the violated assumption.
If successful, the transformation is applied and the
parametric statistic is used for data analysis.
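Both assumption checks named above are available in scipy.stats. A sketch with invented data; note that fitting the normal parameters from the sample before the Kolmogorov-Smirnov test is a simple but approximate approach (a Lilliefors correction is strictly more appropriate when parameters are estimated).

```python
# Checking parametric assumptions: normality (Kolmogorov-Smirnov) and
# equality of variances (Levene). Data are invented for illustration.
import statistics
from scipy import stats

group_a = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7]
group_b = [6.0, 5.8, 6.3, 5.9, 6.1, 6.2, 5.7, 6.0]

# K-S test against a normal distribution fitted to the sample.
mean, sd = statistics.mean(group_a), statistics.stdev(group_a)
ks_stat, ks_p = stats.kstest(group_a, "norm", args=(mean, sd))

# Levene test for the equal-variances assumption across groups.
lev_stat, lev_p = stats.levene(group_a, group_b)

# Large p-values give no evidence that the assumption is violated.
print(f"K-S p = {ks_p:.4f}, Levene p = {lev_p:.4f}")
```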
82. Types of Parametric Tests
Z test
One-way ANOVA
One-Sample t test
Factorial ANOVA
t test for dependent means
Pearson’s r
t test for independent means
Bivariate/Multiple regression
84. Non-Parametric Tests
Inference procedures that are largely
distribution-free.
Nonparametric statistics are used when our
data are measured on a nominal or ordinal
scale of measurement.
All other nonparametric statistics are
appropriate when data are measured on an
ordinal scale of measurement.
An example is the sign test, which is
designed to draw inferences about medians.
86. Types of Non-parametric Tests
Sign Tests
Chi-square statistics and their
modifications (e.g., McNemar Test) are
used for nominal data.
Wilcoxon Test – alternative to the t test
in parametric testing
Kruskal-Wallis Test – alternative to
ANOVA
Friedman Test – alternative to repeated-
measures ANOVA
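A sign test, mentioned above as a test for medians, can be built from a binomial test: under the null hypothesis the sample median equals the hypothesized value, so scores fall above or below it with equal probability. A sketch with invented data:

```python
# Sign test for a hypothesized median, implemented via an exact
# binomial test on the signs. Sample values are invented.
from scipy.stats import binomtest

hypothesized_median = 50
sample = [53, 48, 55, 57, 51, 60, 49, 54, 58, 52]

above = sum(x > hypothesized_median for x in sample)
below = sum(x < hypothesized_median for x in sample)
# Values equal to the median are dropped (none occur here).

# Under H0 (median = 50), each sign is equally likely: binomial p = 0.5.
result = binomtest(above, n=above + below, p=0.5)
print(f"{above} above, {below} below, p = {result.pvalue:.4f}")
```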
88. Choosing the Correct Statistical
Tests: Summary
Five issues must be considered when
choosing statistical tests.
Scale of measurement
Number of samples/groups
Nature of the relationship between
groups
Number of variables
Assumptions of statistical tests