This document provides a summary of statistical tests and procedures available in SPSS. It outlines descriptive statistics, pretests for normality, linearity and homoscedasticity, correlation, reliability analysis, Likert scale data, inferential statistics for nominal/ordinal and scale variables, nonparametric tests, and measures for categorical variables including chi-square tests, McNemar's test, and Cochran's Q test. Examples of research questions and hypotheses are provided for each statistical analysis.
1. INSTRUCTOR: MOHAMMED ABDUL KHALEQ DWIKAT EMAIL:dwikatmo@gmail.com
TOPIC: SPSS TESTS' SUMMARY DATE: 1/31/2020 PAGE: 1 OF 10
All you need in Data Analysis using SPSS
1.DESCRIPTIVE STATISTICS
When to use: when we need to summarize data using statistical measures.
Descriptive statistics describe only the data at hand. When we have data for
the whole population, the results are exact and no prior assumptions are
required.
SCALE variables:
DESCRIPTIVES: mean, sum, range, max, min, standard deviation, skewness,
kurtosis; check outlier candidates using standardized values (z-scores)
EXPLORE: mean, sum, range…; check normality; check outlier candidates
using box plots
CATEGORICAL variables (nominal/ordinal): frequency and percentage of values
FREQUENCIES: for each variable alone, displays the count and percentage of
each value, with a bar chart, pie chart or histogram
CROSSTABS: for 2 or more crossed variables, displays counts and percentages
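The scale-variable measures listed above can be sketched outside SPSS in a few lines of Python. This is only an illustration: the function name `describe`, the cut-off `z_cut=3.0` and the data are assumptions, not part of the SPSS procedure.

```python
import statistics

def describe(values, z_cut=3.0):
    """Summarize one scale variable and flag outlier candidates by z-score."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    z_scores = [(v - mean) / sd for v in values]
    return {
        "n": len(values),
        "sum": sum(values),
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "stdev": sd,
        # a standardized value beyond the cut-off marks an outlier candidate
        "outlier_candidates": [v for v, z in zip(values, z_scores) if abs(z) > z_cut],
    }

# Hypothetical data: eleven typical values and one extreme value
waiting_minutes = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 50]
summary = describe(waiting_minutes)
print(summary)
```

A flagged value is only a candidate; whether it is a real outlier still needs a look at the data source.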
RATIO STATISTICS
Describes the ratio between two scale variables.
Example of research question: Is there good uniformity in the ratio between
the appraisal price and sale price of homes in each of five counties?
Output: median, mean, coefficient of dispersion (COD), median-centered
coefficient of variation, mean-centered coefficient of variation, minimum
and maximum values, and the concentration index computed for a user-specified
range or percentage around the median ratio.
With this output we can determine, for example,
which township's housing values have changed the most:
Groups whose median ratio is closer to 1 have changed the least.
Larger COD values indicate greater variability.
The "within % of median" coefficient of concentration (COC) also measures
variability: it reports the percentage of ratios that fall within a given
percentage of the median. Larger values of this statistic indicate less
variability.
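As a rough sketch of the two variability measures described above, the COD can be computed as the mean absolute deviation from the median ratio divided by the median, and the COC as the share of ratios within a chosen percentage of the median. Reporting both as percentages, the function name and the ratios are assumptions made for illustration.

```python
import statistics

def ratio_statistics(ratios, within_pct=10.0):
    """Return (median, COD in %, COC in %) for a list of ratios."""
    median = statistics.median(ratios)
    abs_dev = [abs(r - median) for r in ratios]
    # COD: average absolute deviation from the median, relative to the median
    cod = 100.0 * statistics.fmean(abs_dev) / median
    # COC: percentage of ratios lying within within_pct % of the median
    limit = median * within_pct / 100.0
    coc = 100.0 * sum(d <= limit for d in abs_dev) / len(ratios)
    return median, cod, coc

# Hypothetical appraisal/sale price ratios for one county
ratios = [0.80, 0.95, 1.00, 1.05, 1.30]
median, cod, coc = ratio_statistics(ratios)
print(median, round(cod, 2), coc)
```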
2.PRETESTS SUMMARY
(Normality, Linearity, Homoscedasticity)
1. Testing Normality
Under H0, skewness and kurtosis are assumed to be equal to zero.
H0: The population (for variable x) is normally distributed.
Ha: The population (for variable x) is NOT normally distributed.
If Sig <= 0.05 (reject H0): not normally distributed
If Sig > 0.05 (don't reject H0): treated as normally distributed
[SPSS] DESCRIPTIVE STATISTICS / EXPLORE, then check "Normality plots with tests"
in the Plots button
[SPSS] NONPARAMETRIC – ONE SAMPLE / KOLMOGOROV-SMIRNOV TEST
from Settings; used to check for a normal, uniform, exponential or Poisson
distribution.
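The idea that H0 corresponds to zero skewness and zero (excess) kurtosis can be illustrated with the Jarque-Bera statistic. Note that this is a swapped-in technique: SPSS reports Kolmogorov-Smirnov and Shapiro-Wilk results, not Jarque-Bera, and the moment formulas below are the simple population versions rather than the bias-corrected ones SPSS prints. The statistic is only approximately chi-square with 2 degrees of freedom, and the approximation is poor for small samples.

```python
import math

def jarque_bera(values):
    """Return (skewness, excess kurtosis, JB statistic, approximate p-value)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square upper tail with 2 df
    return skew, excess_kurt, jb, p_value

# Hypothetical, perfectly symmetric sample
skew, excess_kurt, jb, p_value = jarque_bera([1, 2, 3, 4, 5])
print(skew, excess_kurt, jb, p_value)
```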
2. Testing Linearity
By using simple linear regression y = aX + b
H0: a = 0 (the slope of the best-fit line = 0)
Ha: a ≠ 0 (the slope of the best-fit line ≠ 0)
[SPSS] REGRESSION / LINEAR
If Sig <= 0.05 (reject H0): there is a significant linear relationship
If Sig > 0.05 (don't reject H0): no significant linear relationship was found
Or we could use the same test from Comparing Means
[SPSS] COMPARING MEANS / MEANS
The null hypothesis of correlation/linear regression is that the slope of the
best-fit line is equal to zero; in other words, as the X variable gets larger,
the associated Y variable gets neither higher nor lower.
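A minimal sketch of the slope test described above: fit y = aX + b by least squares and compute the t statistic for H0: a = 0. The function name and the data are hypothetical, and the p-value (the Sig column in SPSS) is not computed here because it needs the t distribution with n - 2 degrees of freedom.

```python
import math

def slope_test(x, y):
    """Return (slope a, intercept b, t statistic for H0: a = 0, degrees of freedom)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # residual sum of squares around the fitted line
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    se_a = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return a, b, a / se_a, n - 2

# Hypothetical paired observations
a, b, t_stat, df = slope_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b, t_stat, df)
```

The t statistic is then compared with the critical value for the given degrees of freedom, or converted to Sig by the software.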
3. Testing Homoscedasticity
The variability in scores for variable X should be similar at all values of
variable Y. It assumes that the samples are obtained from populations of equal
variances.
[SPSS] GENERAL LINEAR MODEL / MULTIVARIATE / (OPTIONS / HOMOGENEITY TEST)
[SPSS] COMPARING MEANS / INDEPENDENT SAMPLES T TEST (LEVENE'S TEST)
[SPSS] COMPARING MEANS / ONE WAY ANOVA (HOMOGENEITY OF VARIANCE TEST)
How (shortest method):
Run Levene's test in SPSS through One-Way ANOVA by checking "Homogeneity
of variance test" in Options.
H0: population variances of x are equal for group1 and group2
Ha: population variances of x are not equal for group1 and group2
Example:
H0: population variances are equal between Read and Write
Ha: population variances are not equal between Read and Write
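To show what Levene's test measures, the sketch below computes the mean-centered Levene statistic W by hand: each score is replaced by its absolute deviation from its group mean, and a one-way ANOVA F ratio is formed on those deviations. The function name and the data are hypothetical, and the p-value is left out because it requires the F distribution with (k - 1, N - k) degrees of freedom.

```python
def levene_w(*groups):
    """Levene's W using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # absolute deviation of every score from its own group mean
    z = []
    for g in groups:
        group_mean = sum(g) / len(g)
        z.append([abs(v - group_mean) for v in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical scores: the second group is more spread out than the first
w = levene_w([1, 2, 3], [0, 2, 4])
w_same_spread = levene_w([1, 2, 3], [4, 5, 6])
print(w, w_same_spread)
```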
3.CORRELATION
[SPSS] CORRELATE / BIVARIATE
Example of research question: Is there a significant relationship between age
and optimism scores? If yes, what is its magnitude and direction?
Does optimism increase with age?
The null hypothesis (H0) and alternative hypothesis (Ha) of the significance test
for correlation can be expressed as follows:
H0: ρ = 0, the population correlation coefficient = 0; there is no correlation
Ha: ρ ≠ 0, the population correlation coefficient ≠ 0; a nonzero correlation could exist
If Sig <= 0.05 (reject H0),
there is a significant correlation between X and Y
If Sig > 0.05 (don't reject H0),
there is no significant correlation between X and Y
The strength of the correlation coefficient is interpreted as follows:
Range (positive)   Range (negative)   Explanation
[0.0 – 0.3[        0 to -0.3          Negligible
[0.3 – 0.5[        -0.3 to -0.5       Weak
[0.5 – 0.7[        -0.5 to -0.7       Intermediate
[0.7 – 0.9[        -0.7 to -0.9       Strong
[0.9 – 1.0]        -0.9 to -1         Very strong
Alternative guideline: small r = .10 to .29, medium r = .30 to .49, large r = .50 to 1.0
If the correlation coefficient between X and Y is
Positive: as the value of X increases, the value of Y tends to increase
Negative: as the value of X increases, the value of Y tends to decrease
Zero: there is no linear correlation at all.
The correlation coefficient between a variable and itself (X and X) always = 1
Use the Spearman correlation coefficient for 2 ordinal variables
Use the Pearson correlation coefficient for 2 scale variables
Kendall's tau: used in the same situations as the Spearman correlation coefficient
Phi: used to find the correlation between 2 nominal variables, each with 2 values
Cramér's V: used to find the correlation between 2 nominal variables when one
or both of them have more than 2 values
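As an illustration of the Pearson coefficient and the strength ranges given in this section, the sketch below computes r by hand and maps it to a strength label and a direction. The function names and the data are hypothetical, and treating r = 1.0 as "very strong" is an assumption about the top boundary.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two scale variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def describe_strength(r):
    """Map r to (strength, direction) using the 0.3 / 0.5 / 0.7 / 0.9 cut-offs."""
    size = abs(r)
    if size < 0.3:
        strength = "negligible"
    elif size < 0.5:
        strength = "weak"
    elif size < 0.7:
        strength = "intermediate"
    elif size < 0.9:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return strength, direction

# Hypothetical paired scores
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3), describe_strength(r))
```

The label describes the size of r only; whether the correlation is statistically significant is decided separately from the Sig value.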
4.CHECKING RELIABILITY
[SPSS] SCALE / RELIABILITY ANALYSIS
Cronbach's alpha measures internal consistency.
Variables used to calculate Cronbach's alpha:
All variables related to our research
Exclude empty variables, single-value variables, serial numbers, IDs and similar
With short scales, Cronbach's alpha values can be quite small. In this
situation it may be better to calculate and report the mean inter-item
correlation for the items. Optimal mean inter-item correlation values range
from .2 to .4 (as recommended by Briggs & Cheek 1986).
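Both reliability figures mentioned above can be sketched by hand. Cronbach's alpha is computed here from the item variances and the variance of the total score, and the mean inter-item correlation as the average Pearson r over all item pairs. The function names and the three-item data set are hypothetical.

```python
import math
import statistics

def cronbach_alpha(items):
    """items: one list of responses per item, respondents in the same order."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # total score per respondent
    item_variance_sum = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))

def pearson_r(x, y):
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_inter_item_correlation(items):
    k = len(items)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return statistics.fmean([pearson_r(items[i], items[j]) for i, j in pairs])

# Hypothetical answers of four respondents to three items
items = [[1, 2, 3, 4], [2, 3, 4, 5], [2, 1, 4, 3]]
alpha = cronbach_alpha(items)
mean_r = mean_inter_item_correlation(items)
print(round(alpha, 3), round(mean_r, 3))
```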
5.LIKERT SCALE
What is Likert scale data?
Ratings on a 5-level scale, a 3-level scale or a scale with any other number of levels.
Interpreting the average – 5-level scale
Range          Meaning (-ve, reversed)   Meaning (+ve)
[1.0 – 1.8[    Strongly agree            Strongly disagree
[1.8 – 2.6[    Agree                     Disagree
[2.6 – 3.4[    Neutral                   Neutral
[3.4 – 4.2[    Disagree                  Agree
[4.2 – 5.0]    Strongly disagree         Strongly agree
Interpreting the average – 3-level scale
Range            Meaning
[1.00 – 1.66[    Agree
[1.66 – 2.33[    Neutral
[2.33 – 3.00]    Disagree
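The 5-level lookup above can be sketched as a small function that maps an item's average to its verbal meaning. The names are hypothetical; `reverse=True` stands for the -ve column of the table, where 1 means "Strongly agree".

```python
from bisect import bisect_right
import statistics

# Upper edges of the first four intervals: [1.0-1.8[, [1.8-2.6[, [2.6-3.4[, [3.4-4.2[
EDGES_5 = [1.8, 2.6, 3.4, 4.2]
LABELS_5 = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

def interpret_likert_mean(mean, reverse=False):
    """Return the verbal meaning of an average on a 5-level scale."""
    labels = LABELS_5[::-1] if reverse else LABELS_5
    # bisect_right puts a value equal to an edge into the next interval,
    # which matches the half-open intervals of the table
    return labels[bisect_right(EDGES_5, mean)]

# Hypothetical responses to one item
responses = [4, 4, 3, 5, 4]
item_mean = statistics.fmean(responses)
print(item_mean, interpret_likert_mean(item_mean))
```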
6.INFERENTIAL STATISTICS
When to use: when we have a sample and want to generalize the result to a
population. The generalization carries a risk of error, called alpha, and we
have a hypothesis (an assumption) that we want to reject or retain.
Nominal/Ordinal Tests
One Sample Binomial Test (one categorical variable with 2 values only)
[SPSS] ANALYZE – NONPARAMETRIC – ONE SAMPLE
Example of research question: Is the proportion of female spiders = 0.75?
H0: proportion of female spiders = 0.75
Ha: proportion of female spiders ≠ 0.75
When performing the test, the category that the H0 value refers to should
appear in the first case of the data.
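A minimal sketch of an exact binomial test follows. The two-sided rule used here (sum the probabilities of all outcomes that are no more likely than the observed one) is one common convention and is an assumption; SPSS may report a differently defined significance value. The spider counts are hypothetical.

```python
from math import comb

def binomial_test(k, n, p0):
    """Two-sided exact p-value for observing k successes in n trials under H0: p = p0."""
    def pmf(i):
        return comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
    p_observed = pmf(k)
    # add up every outcome that is at most as likely as the observed one
    total = sum(pmf(i) for i in range(n + 1) if pmf(i) <= p_observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical sample: 17 female spiders out of 20, H0 proportion 0.75
p_spiders = binomial_test(17, 20, 0.75)
p_extreme = binomial_test(10, 10, 0.5)
print(p_spiders, p_extreme)
```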
Chi Square Goodness of Fit Test (one categorical/discrete variable with
2 or more values)
[SPSS] ANALYZE – NONPARAMETRIC – ONE SAMPLE
Example of research question: Are students equally interested in the
different fields?
H0: The proportions of MIS, CIS and CS students are equal
Ha: The proportions of MIS, CIS and CS students are NOT equal
or, stated differently,
H0: Students are interested in MIS, CIS and CS equally
Ha: Students are interested in MIS, CIS and CS unequally
It could also be used as
H0: there is no significant difference between the proportions of the smartphones
that students currently have and the proportions of the smartphones they prefer.
Ha: there is a significant difference between the proportions of the smartphones
that students currently have and the proportions of the smartphones they prefer.
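The goodness-of-fit test for equal proportions can be sketched by hand: compare each observed count with the count expected under H0 and sum the squared differences scaled by the expected count. The helper `chi2_sf` (upper-tail probability of the chi-square distribution, built from the standard incomplete-gamma recurrence) and the student counts are assumptions made for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    h = x / 2.0
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(h)) if df % 2 else math.exp(-h)
    while a < df / 2.0:  # step up to df/2 using Q(a+1, h) = Q(a, h) + h^a e^-h / Gamma(a+1)
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return q

def chi_square_gof(observed):
    """Test H0: all categories are equally likely."""
    expected = sum(observed) / len(observed)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return chi2, df, chi2_sf(chi2, df)

# Hypothetical counts of MIS, CIS and CS students
chi2, df, p_value = chi_square_gof([30, 20, 10])
print(chi2, df, p_value)
```

With these made-up counts the p-value is below 0.05, so H0 of equal proportions would be rejected.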
7.NONPARAMETRIC TESTS
One Sample Wilcoxon Signed Rank Test (one-sample median test) (one scale
variable)
[SPSS] ANALYZE – NONPARAMETRIC – ONE SAMPLE
Example of research question: Is there a significant difference between a sample
median and a hypothesized value?
Fisher's exact test
Fisher's exact test is used when you want to conduct a chi-square test
but one or more of your cells have an expected frequency of less than five.
Remember that the chi-square test assumes that each cell has an expected
frequency of five or more, but Fisher's exact test has no such assumption
and can be used regardless of how small the expected frequency is.
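For a 2x2 table, Fisher's exact test can be sketched directly from the hypergeometric distribution: with the row and column totals fixed, the probability of every possible table is computed, and the two-sided p-value adds up all tables that are no more likely than the observed one. The function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # probability that the top-left cell equals x given the fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical small table with expected counts below five
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(p_value)
```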
The Kruskal-Wallis test
It is used when you have one independent variable with two or more levels and
an ordinal dependent variable. In other words, it is the nonparametric version
of one-way ANOVA and a generalized form of the Mann-Whitney test, since it
permits two or more groups.
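The Kruskal-Wallis H statistic can be sketched as follows: rank all observations together (ties get the average rank), sum the ranks per group and plug them into the H formula. The function names and the scores are hypothetical, and the tie correction that statistical packages usually apply is left out for brevity.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j share this rank
        for index in order[i:j + 1]:
            ranks[index] = average
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """H statistic without tie correction; compare with chi-square, df = groups - 1."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

# Hypothetical ordinal scores for three groups
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h)
```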
Other Categorical Tests/measures (used in Crosstabs)
Chi Square Test of Independence (two categorical variables, each with 2 or
more values)
[SPSS] ANALYZE – DESCRIPTIVE STATISTICS – CROSSTABS
Examples of research questions:
Are older people more optimistic than younger people?
Is there an association between gender and smoking behavior?
Are males more likely to be smokers than females?
Is the proportion of males that smoke the same as the proportion of females?
H0: X is independent of Y
(There is no significant association between X and Y.)
Ha: X is NOT independent of Y
(There is a significant association between X and Y.)
Example:
H0: Obesity is independent of eating junk meals
Ha: Obesity is NOT independent of eating junk meals
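A sketch of the Pearson chi-square test of independence for a crosstab follows: expected counts come from the row and column totals, and the statistic has (rows - 1) x (columns - 1) degrees of freedom. The helper `chi2_sf`, the function names and the counts are assumptions, and no continuity correction is applied.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    h = x / 2.0
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(h)) if df % 2 else math.exp(-h)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return q

def chi_square_independence(table):
    """table: list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)

# Hypothetical crosstab: rows = obese yes/no, columns = eats junk meals yes/no
chi2, df, p_value = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 3), df, round(p_value, 4))
```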
McNemar Test (two categorical variables, each with 2 values (Yes/No), that
measure the same feature at 2 different times to see the effect of an
intervention)
Example of research question: Is there a change in the proportion of the sample
diagnosed with clinical depression prior to, and following, the intervention?
When you have matched or repeated measures designs (e.g. pre-test/post-test),
you cannot use the usual chi-square test. Instead, you need to use
McNemar's Test. In the health and medical area this might be the presence or
absence of some health condition (0=absent; 1=present), while in a political
context it might be the intention to vote for a particular candidate (0=no,
1=yes) before and after a campaign speech.
H0: there is no significant change in the proportion of participants diagnosed as
clinically depressed prior to and following the program
Ha: there is a significant change in the proportion of participants diagnosed as
clinically depressed prior to and following the program
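McNemar's test only uses the participants whose status changed between the two measurements. The sketch below uses the uncorrected chi-square form with 1 degree of freedom; this is an assumption for illustration, since software may instead report an exact binomial or continuity-corrected result. The counts are hypothetical.

```python
import math

def mcnemar_test(changed_yes_to_no, changed_no_to_yes):
    """Uncorrected McNemar chi-square (1 df) from the two discordant counts."""
    b, c = changed_yes_to_no, changed_no_to_yes
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p_value

# Hypothetical: 15 participants depressed before but not after the program,
# 5 participants not depressed before but depressed after
chi2, p_value = mcnemar_test(15, 5)
print(chi2, round(p_value, 4))
```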
Cochran's Q TEST
McNemar's Test described in the previous section is suitable if you
have only two time points. If you have three or more time points (categorical
variables), each with 2 values (yes, no), you will need to use
Cochran's Q Test.
Example of research question: Is there a change in the proportion of
participants diagnosed with clinical depression across the three time
points: (a) prior to the program, (b) following the program and (c) three
months post-program?
Data: three categorical variables measuring the same characteristic (e.g.
presence or absence of the characteristic, 0=no, 1=yes) collected from each
participant at different time points.
H0: there is no significant change in the proportion of participants diagnosed as
clinically depressed prior to the program, following the program and three months later
Ha: there is a significant change in the proportion of participants diagnosed as
clinically depressed prior to the program, following the program and three months later
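Cochran's Q can be sketched from the column totals (one per time point) and the row totals (one per participant) of the 0/1 data. The function name and the data are hypothetical; the p-value shortcut in the example holds only for three time points, where Q is compared with a chi-square distribution with 2 degrees of freedom.

```python
import math

def cochrans_q(rows):
    """rows: one list of 0/1 values per participant, one column per time point."""
    k = len(rows[0])
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1

# Hypothetical diagnoses (1 = depressed) before, after, and three months later
diagnoses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
q, df = cochrans_q(diagnoses)
p_value = math.exp(-q / 2.0)  # chi-square upper tail, valid for df = 2 only
print(q, df, round(p_value, 4))
```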
Risk (Odds Ratio) (two categorical variables, each with 2 values (Yes/No))
A measure of the strength of the association between the presence of a factor
and the occurrence of an event. (No null hypothesis.)
It quantifies how strongly the presence or absence of property A is associated
with the presence or absence of property B in a given population, where each
individual either does or does not have property A and either does or does
not have property B.
It gives us information such as (for an odds ratio of 1.81):
If you have lung cancer, your odds of being a smoker are 81% higher than if
you didn't have lung cancer.
If you smoke, your odds of having lung cancer are 81% higher than if you
didn't smoke.
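The sketch below computes the odds ratio from a 2x2 table and, for comparison, the relative risk, which is the quantity that "more likely" statements strictly refer to. The function names and the counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """2x2 table: a = exposed with event, b = exposed without event,
    c = unexposed with event, d = unexposed without event."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Risk of the event among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: smokers 20 with cancer, 80 without;
# non-smokers 10 with cancer, 90 without
oratio = odds_ratio(20, 80, 10, 90)
rrisk = relative_risk(20, 80, 10, 90)
print(oratio, rrisk)
```

With these made-up counts the odds of cancer are 125% higher for smokers, while the risk is 100% higher, which shows why the two measures should not be read interchangeably.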
KAPPA MEASURE OF AGREEMENT: (Two categorical variables with an equal number of
categories.) Commonly used in the medical literature to assess inter-rater
agreement, e.g. the diagnostic classification from Rater 1 or Test 1 (0 = not
depressed, 1 = depressed) and the diagnostic classification of the same person
from Rater 2 or Test 2, or the diagnoses from two different clinicians.
Example of research question: How consistent are the diagnostic classifications
of the Edinburgh Postnatal Depression Scale and the Depression, Anxiety and
Stress Scale?
Assumption: equal number of categories from Rater 1 and Rater 2.
Interpretation of output from Kappa
The main piece of information we are interested in is the table Symmetric
Measures, which shows that the Kappa Measure of Agreement value is .56, with a
significance of p < .0005. According to Peat (2001, p. 228), a value of .5 for
Kappa represents moderate agreement, above .7 represents good agreement, and
above .8 represents very good agreement. By these cut-offs a value of .56 falls
in the moderate range, so in this example the level of agreement between the
classification of cases as depressed using the EPDS and the DASS-Dep is moderate.
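To make the Kappa value less of a black box, the following Python sketch computes it from two raters' classifications as (observed agreement - chance agreement) / (1 - chance agreement). The function name cohen_kappa and the ten ratings are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(counts1) | set(counts2)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of 10 people: 0 = not depressed, 1 = depressed.
test1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
test2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(test1, test2)
print(round(kappa, 2))
```

The two tests agree on 8 of 10 cases (0.8), chance agreement is 0.5, so Kappa = 0.3 / 0.5 = 0.6.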
Nominal. For nominal data (no intrinsic order, such as Catholic, Protestant, and
Jewish), you can select Contingency coefficient, Phi (coefficient) and Cramér's
V, Lambda (symmetric and asymmetric lambdas and Goodman and Kruskal's tau),
and Uncertainty coefficient.
Contingency coefficient. A measure of association based on chi-square. The value
ranges between 0 and 1, with 0 indicating no association between the row and
column variables and values close to 1 indicating a high degree of association
between the variables. The maximum value possible depends on the number of rows
and columns in a table.
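The contingency coefficient is C = sqrt(chi-square / (chi-square + N)). The Python sketch below computes chi-square and C for a small table; the function names and the counts are hypothetical and serve only as an illustration.

```python
import math


def chi_square(table):
    """Pearson chi-square and total N for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(table):
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical 2x2 table of counts.
table = [[30, 10], [10, 30]]
chi2, n = chi_square(table)
c = contingency_coefficient(table)
print(chi2, round(c, 3))
```

Here chi-square = 20 with N = 80, giving C of about 0.447. For a 2x2 table the largest possible value is sqrt(1/2), about 0.707, which is why the maximum depends on the table size.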
8.COMPARING MEANS FOR SCALE VARIABLES (FOR NORMALLY DISTRIBUTED DATA)
One Sample T Test (One scale variable)
[SPSS] COMPARING MEANS / ONE SAMPLE T TEST
Example of research question: Is there a significant difference between the average
exam score and 70?
H0: Average weight of herring’s body = 400 grams
Ha: Average weight of herring’s body ≠ 400 grams
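The t statistic for this test is (sample mean - test value) / (standard deviation / sqrt(n)). A minimal Python sketch follows, assuming five made-up herring weights; the function name one_sample_t and the data are hypothetical. The p-value is left to SPSS or a t table.

```python
import math
import statistics


def one_sample_t(sample, test_value):
    """Return (t, df) for a one-sample t test against test_value."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - test_value) / standard_error
    return t, n - 1


# Hypothetical herring weights in grams, tested against 400 grams.
weights = [402, 398, 405, 395, 410]
t, df = one_sample_t(weights, 400)
print(round(t, 3), df)
```

The sample mean is 402, so t is about 0.761 with 4 degrees of freedom. This is below the two-tailed 5% critical value of 2.776, so H0 would not be rejected for these made-up data.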
Independent Samples T Test (two variables, one scale test variable, one
discrete with only 2 values for grouping)
[SPSS] COMPARING MEANS / INDEPENDENT SAMPLES T TEST
Example of research question: Is there a significant difference in the mean self-
esteem scores for males and females?
H0: Average amount spent for males = Average amount spent for females
Ha: Average amount spent for males ≠ Average amount spent for females
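A sketch of the equal-variances (pooled) form of this test in Python is given below. This is only one of the two forms SPSS reports; the function name independent_t and the amounts are hypothetical.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return (t, df) for the pooled-variance independent-samples t test."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
    return t, n1 + n2 - 2


# Hypothetical amounts spent by five males and five females.
males = [10, 12, 14, 16, 18]
females = [8, 10, 12, 14, 16]
t, df = independent_t(males, females)
print(t, df)
```

The means are 14 and 12, the pooled variance is 10 and the standard error is 2, so t = 1.0 with 8 degrees of freedom.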
Paired Samples T Test (two scale variables, each measure the same feature, one
before and one after an action)
[SPSS] COMPARING MEANS / PAIRED SAMPLES T TEST
Example of research question: Is there a significant effect of medicine on lowering
average blood sugar?
Note: this test is also called the Dependent Samples t test, since the two
observations are related. The two observations need not be taken before and after
an event; for example, we might use this test to investigate whether the average
reading score equals the average writing score, since writing depends on reading.
H0: Average reaction time before drinking a beer = Average reaction time
after drinking a beer
Ha: Average reaction time before drinking a beer ≠ Average reaction time
after drinking a beer
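The paired test is a one-sample t test on the differences between the two measurements. The Python sketch below shows this; the function name paired_t and the reaction times are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired-samples t test on the differences after - before."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical reaction times (ms) for five people before and after a beer.
before = [200, 210, 190, 205, 195]
after = [206, 212, 199, 208, 200]
t, df = paired_t(before, after)
print(round(t, 3), df)
```

The differences are 6, 2, 9, 3 and 5 (mean 5), giving t of about 4.082 with 4 degrees of freedom.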
One way ANOVA (one scale variable, one discrete with multiple values)
[SPSS] COMPARING MEANS / ONE WAY ANOVA
Example of research question: Is there a difference in optimism scores for young,
middle-aged and old participants?
H0: Average Weight of parsley plants is equal among fertilizers used
Ha: Average Weight of parsley plants is not equal among fertilizers used
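The F statistic compares the variation between group means with the variation within groups. A minimal Python sketch follows; the function name one_way_anova and the plant weights are hypothetical.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    ss_within = sum(
        (value - statistics.mean(group)) ** 2 for group in groups for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical parsley weights under three fertilizers.
fertilizer_a = [1, 2, 3]
fertilizer_b = [2, 3, 4]
fertilizer_c = [6, 7, 8]
f, df_between, df_within = one_way_anova([fertilizer_a, fertilizer_b, fertilizer_c])
print(f, df_between, df_within)
```

Here the between-groups sum of squares is 42 and the within-groups sum of squares is 6, so F = 21 / 1 = 21 with (2, 6) degrees of freedom.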
Simple Linear Regression (Two scale variables, one is independent (Input) and
the other is Dependent (Output))
[SPSS] ANALYZE/REGRESSION/LINEAR
Example of research question: How much of the variance in life satisfaction scores
can be explained by self-esteem?
life satisfaction = a * self-esteem + b
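The slope a, the intercept b and the proportion of variance explained (R squared) can be obtained by least squares. The Python sketch below is an illustration only; the function name and the five data pairs are hypothetical.

```python
def simple_linear_regression(x, y):
    """Return (a, b, r_squared) for the least-squares line y = a * x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    a = sxy / sxx                       # slope
    b = mean_y - a * mean_x             # intercept
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by x
    return a, b, r_squared


# Hypothetical scores: x = self-esteem, y = life satisfaction.
self_esteem = [1, 2, 3, 4, 5]
life_satisfaction = [2, 4, 5, 4, 5]
a, b, r_squared = simple_linear_regression(self_esteem, life_satisfaction)
print(round(a, 2), round(b, 2), round(r_squared, 2))
```

For these made-up data the fitted line is life satisfaction = 0.6 * self-esteem + 2.2, and self-esteem explains 60% of the variance.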
Multiple Linear Regression (3 or more scale variables, two or more are
independent (Input) and one is Dependent (Output))
[SPSS] ANALYZE/REGRESSION/LINEAR
Example of research question:
How much of the variance in life satisfaction scores can be explained by the
following set of variables: self-esteem, optimism and perceived control?
Which of these variables is a better predictor of life satisfaction?
life satisfaction = a1 * self-esteem + a2 * optimism + a3 * perceived control + b
If the data are not normally distributed, we should use nonparametric
alternatives such as the Wilcoxon Signed Rank Test, Kruskal-Wallis Test,
Friedman Test, Mann-Whitney U Test and others.
Parametric Technique                            Nonparametric Technique
Independent-samples t-test                      Mann-Whitney U Test
Paired-samples t-test                           Wilcoxon Signed Rank Test
One-way between-groups ANOVA                    Kruskal-Wallis Test
One-way repeated-measures ANOVA                 Friedman Test
None                                            Chi-square for goodness of fit
None                                            Chi-square for independence
None                                            McNemar’s Test
None                                            Cochran’s Q Test
None                                            Kappa Measure of Agreement
Two-way analysis of variance (between groups)   None
Mixed between-within groups ANOVA               None
Multivariate analysis of variance (MANOVA)      None
Analysis of covariance                          None
One-way between-groups ANOVA (one independent variable, one dependent variable).
Two-way analysis of variance (between groups) (two independent variables, one dependent
variable).
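To give a feel for how a nonparametric alternative works, the sketch below computes the Mann-Whitney U statistics by counting, over all pairs, how often a value in one group exceeds a value in the other (ties count one half). The function name mann_whitney_u and the data are hypothetical; significance testing is left to SPSS.

```python
def mann_whitney_u(group1, group2):
    """Return (U1, U2): pairwise wins for group1 and for group2, ties counted as 0.5."""
    u1 = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u1 += 1.0
            elif x == y:
                u1 += 0.5
    u2 = len(group1) * len(group2) - u1
    return u1, u2


# Hypothetical scores for two independent groups.
group_a = [3, 5, 8]
group_b = [1, 2, 4, 6]
u1, u2 = mann_whitney_u(group_a, group_b)
print(u1, u2)
```

Of the 12 possible pairs, group_a wins 9 and group_b wins 3. The smaller value is usually the one looked up in tables.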
Binary Logistic (one Binary dependent variable, one or more independent
variables either scale or categorical)
[SPSS] ANALYZE/REGRESSION/BINARY LOGISTIC
H0: the model adequately fits the data
Ha: the model does not adequately fit the data
Example of research question:
A catalog company wants to increase the proportion of mailings that result in
sales.
A doctor wants to accurately diagnose a possibly cancerous tumor.
A loan officer wants to know whether the next customer is likely to default.
9.ROC CURVE
[SPSS] ANALYZE/ROC CURVE
SENSITIVITY: Power to identify positives
SPECIFICITY: Power to identify negatives
FALSE POSITIVE RATE (Whole Model) (α) = FP / (FP + TN)
H0: using the predicted values is no better than guessing. True area = 0.5
Ha: using the predicted values is better than guessing. True area ≠ 0.5
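These quantities can be computed directly from predicted scores and true outcomes. The Python sketch below is an illustration under assumptions: the function names, the cut-off of 0.5 and the six cases are hypothetical. The area under the ROC curve is computed as the share of positive-negative pairs in which the positive case has the higher score.

```python
def confusion_rates(labels, scores, threshold):
    """Return (sensitivity, specificity, false_positive_rate) at a given cut-off."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for label, score in pairs if label == 1 and score >= threshold)
    fn = sum(1 for label, score in pairs if label == 1 and score < threshold)
    fp = sum(1 for label, score in pairs if label == 0 and score >= threshold)
    tn = sum(1 for label, score in pairs if label == 0 and score < threshold)
    return tp / (tp + fn), tn / (tn + fp), fp / (fp + tn)


def area_under_roc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = positive) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
sensitivity, specificity, fpr = confusion_rates(labels, scores, 0.5)
auc = area_under_roc(labels, scores)
print(round(sensitivity, 3), round(specificity, 3), round(fpr, 3), round(auc, 3))
```

An area of 0.5 corresponds to guessing; here 8 of the 9 pairs are ranked correctly, so the area is about 0.889.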
10.GRAPHICS AND PLOTS
CONTROL CHARTS
Control charts are a graphical aid for assessing variation in a manufacturing
process. By distinguishing between common and unusual variation, you can
determine whether a process is functioning normally or needs to be adjusted.
Q-Q PLOTS
Plots the quantiles of a variable's distribution against the quantiles of any
of a number of test distributions. It is a graphical method for comparing two
probability distributions by plotting their quantiles against each other.
Quantiles: values that divide the cases into some number of equal-sized groups.
Percentiles = 100 parts, Deciles = 10, Quintiles = 5, Quartiles = 4,
Terciles = 3, Median = 2 parts.
If the data are normally distributed, the points will fall along the diagonal line.
[SPSS] ANALYZE/DESCRIPTIVE/Q-Q PLOTS
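The points of a normal Q-Q plot can be sketched in Python as follows. This is an illustration under assumptions: the function name qq_points and the data are hypothetical, and the plotting position (i - 0.5) / n is one simple choice that is not necessarily the formula SPSS uses by default.

```python
import statistics
from statistics import NormalDist


def qq_points(data):
    """Return (theoretical normal quantile, observed value) pairs for a Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(statistics.mean(ordered), statistics.stdev(ordered))
    return [
        (fitted.inv_cdf((i - 0.5) / n), value)
        for i, value in enumerate(ordered, start=1)
    ]


# Hypothetical sample.
points = qq_points([10, 2, 6, 8, 4])
for theoretical, observed in points:
    print(round(theoretical, 2), observed)
```

Plotting the pairs and checking how closely they follow the diagonal gives the visual test of normality described above.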
P-P PLOTS
Plots a variable's cumulative proportions, against the cumulative proportions
of any of a number of test distributions.
Probability plots are generally used to determine whether the distribution of a
variable matches a given distribution. If the selected variable matches the test
distribution, the points cluster around a straight line.
[SPSS] ANALYZE/DESCRIPTIVE/P-P PLOTS
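A P-P plot pairs cumulative proportions instead of quantiles. The Python sketch below shows the idea for a normal test distribution; the function name pp_points, the data and the proportion formula (i - 0.5) / n are hypothetical choices for illustration and may differ from what SPSS uses.

```python
import statistics
from statistics import NormalDist


def pp_points(data):
    """Return (observed cumulative proportion, expected normal proportion) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(statistics.mean(ordered), statistics.stdev(ordered))
    return [
        ((i - 0.5) / n, fitted.cdf(value))
        for i, value in enumerate(ordered, start=1)
    ]


# Hypothetical sample.
proportions = pp_points([10, 2, 6, 8, 4])
for observed, expected in proportions:
    print(round(observed, 2), round(expected, 2))
```

If the variable matches the test distribution, the two proportions in each pair are close, so the points cluster around a straight line.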