Applied Statistics
Part 5
By:
M. H. Farjoo MD, PhD, Bioanimator
Shahid Beheshti University of Medical Sciences
Instagram: @bio_animation
Applied Statistics
part 5
 Outliers
 Transforming Data
 Normalizing Data
 Weighting Data
 Torturing Data
 Robustness
 Homoscedasticity and Heteroscedasticity
Outliers
 When analyzing data, sometimes one value is far
from the others; such a value is called an outlier.
 With an outlier, consider these:
 Was the value entered into the computer correctly?
 Is the outlier value scientifically impossible? (negative
height or weight, etc)
 Were there any experimental problems caused by a flaw in
the lab devices?
 Could the outlier be caused by biological diversity? (this
may be the most exciting finding in your data!)
Outliers (Cont’d)
 Don't discard a value as an outlier before considering
whether the finding is scientifically interesting.
 You may have discovered a polymorphism in a gene, or a
new clinical syndrome.
 It is especially important to beware of lognormal
distributions.
 In a lognormal distribution, very high values occur
naturally and can easily be mistaken for outliers.
 Removing these values would be a mistake.
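As a rough illustration of why lognormal data deserve care, the sketch below draws a simulated lognormal sample and counts the points flagged on the raw scale and on the log scale. The flagging rule (Tukey's 1.5 * IQR fences), the sample size, and the seed are assumptions chosen for this example; they are not taken from the slides and are not necessarily what SPSS or Prism use.

```python
import math
import random
import statistics

def tukey_flags(values):
    """Count values outside the 1.5 * IQR fences (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < low or v > high)

random.seed(1)  # fixed seed so the sketch is repeatable
# Simulated data: log(value) is normal with mean 0 and SD 1 (assumed parameters)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]

raw_flags = tukey_flags(sample)
log_flags = tukey_flags([math.log(v) for v in sample])

print("flagged on raw scale:", raw_flags)
print("flagged on log scale:", log_flags)
```

Many more points are flagged on the raw scale than on the log scale, even though every value comes from the same distribution and none is an error.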
Outliers
Hands-on practice
 To find outliers in SPSS:
 Analyze => Descriptive Statistics => Explore... => Statistics...
=> Outliers check box
 To find outliers in Prism:
 Column statistics (from welcome screen) => frequency
distribution data and histogram => Analyze => Column
analysis => Identify outliers
Transforming Data
 Data transformations are an important tool for the proper
statistical analysis of biological data.
 Transformation is tried when a quantitative variable:
 Does not fit a normal distribution
 Has greatly different SDs in different groups
 It is NOT a form of playing around with your data in order to
get the answer you want!
 So it is essential to be able to defend data transformation.
Transforming Data (Cont’d)
 To transform data, a mathematical operation is performed
on each observation, and the statistics are computed on the
transformed numbers.
 It is better to use a transformation that other researchers
commonly use in your field.
 A more common but less effective transformation is often
preferable, so that readers are not skeptical.
 Data don't have to be perfectly normal; most parametric tests
are not very sensitive to moderate departures from normality.
Transforming Data (Cont’d)
 It is NOT a good idea to report your results (means,
SD, CI etc.) in transformed units.
 You should back-transform the results by applying the
inverse of the math function used for the transformation.
 It is also important that you decide which
transformation to use before you do the statistical test.
 Trying different transformations until you find one
that gives you a significant result is cheating.
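Back-transformation can be sketched with a small worked example. The three values below are hypothetical, and base-10 logs are an arbitrary choice for the illustration.

```python
import math
import statistics

values = [1.0, 10.0, 100.0]               # hypothetical raw measurements

logs = [math.log10(v) for v in values]    # transform each observation
mean_log = statistics.mean(logs)          # statistic on the transformed scale
back_transformed = 10 ** mean_log         # inverse of log10 restores the original units

arithmetic_mean = statistics.mean(values)

print("mean of logs:", mean_log)
print("back-transformed mean:", back_transformed)
print("arithmetic mean:", arithmetic_mean)
```

The back-transformed mean of the logs is the geometric mean (10 here), which is not the same as the arithmetic mean of the raw values (37 here), so a report should say which one is given.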
Transforming Data (Cont’d)
 Log transformation:
 Consists of taking the log of each observation.
 You can use either base-10 logs or base-e logs; it
makes no difference for a statistical test
 You should specify which log you're using, as it
affects the slope and intercept in a regression
 Many variables in biology have log-normal
distributions
 This means that after log transformation, the values are
normally distributed.
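The claim that the log base does not change a test result can be checked with a short sketch. The two groups are hypothetical, and the pooled-variance two-sample t statistic is written out by hand here only for illustration.

```python
import math
import statistics

def student_t(a, b):
    """Two-sample t statistic with pooled variance (equal SDs assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) +
              (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

group_a = [12.0, 15.0, 40.0, 22.0, 95.0]     # hypothetical measurements
group_b = [30.0, 80.0, 150.0, 60.0, 410.0]

log10_a = [math.log10(v) for v in group_a]
log10_b = [math.log10(v) for v in group_b]
ln_a = [math.log(v) for v in group_a]
ln_b = [math.log(v) for v in group_b]

t_log10 = student_t(log10_a, log10_b)
t_ln = student_t(ln_a, ln_b)

# The transformed values themselves differ by a constant factor, ln(10)
scale = statistics.mean(ln_a) / statistics.mean(log10_a)

print("t with base-10 logs:", t_log10)
print("t with natural logs:", t_ln)
print("ratio of means:", scale)
```

The t statistic is the same for both bases because a natural log is just a base-10 log multiplied by ln(10), about 2.303. That same factor rescales the transformed values, which is why slopes and intercepts in a regression depend on the base.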
Transforming Data (Cont’d)
 Square-root transformation:
 Consists of taking the square root of each observation
 Arcsine transformation:
 Consists of taking the arcsine of the square root of a
number.
 The numbers must be in the range 0 to 1
 This is used for proportions, which range from 0 to 1
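A minimal sketch of the square-root and arcsine transformations, together with their inverses for back-transforming results, follows. The function names and the sample numbers are hypothetical.

```python
import math

def arcsine_sqrt(p):
    """Arcsine transformation of a proportion p in [0, 1] (result in radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return math.asin(math.sqrt(p))

def arcsine_sqrt_back(x):
    """Inverse of arcsine_sqrt: returns the proportion."""
    return math.sin(x) ** 2

counts = [0, 4, 9, 25]                        # hypothetical count data
sqrt_counts = [math.sqrt(c) for c in counts]
counts_back = [s ** 2 for s in sqrt_counts]   # squaring undoes the square root

proportions = [0.0, 0.25, 0.5, 1.0]           # hypothetical proportions
transformed = [arcsine_sqrt(p) for p in proportions]
restored = [arcsine_sqrt_back(x) for x in transformed]

print(sqrt_counts)
print(transformed)
print(restored)
```

The range check reflects the requirement that the input be a proportion; a percentage such as 25 would have to be divided by 100 first.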
Transforming Data
Hands-on practice
 To transform data in SPSS:
 Transform => Compute Variable...
 To transform data in Prism:
 Column statistics (from welcome screen) => frequency
distribution data and histogram => Analyze => Transform,
Normalize… => Transform => “Transform Y values using”
check box
Normalizing Data
 We often want to compare data on different scales or
even different units.
 To do so, we “eliminate” the scale of measurement
and “constrain” the values to a predetermined range.
 This is called normalization, and it puts different
variables into comparable units.
 Investigators commonly normalize dose-response
curves so that all curves begin and end at constant values
(usually 0% and 100%).
Normalizing Data (Cont’d)
 To fit a curve to the normalized data, we “constrain”
the bottom and top plateaus to predetermined values
(usually 0% and 100%).
 In this way, the parameters of the curves become
comparable (e.g., EC50, slope, intercept)
 If you normalize, don't also weight the data.
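Normalizing to 0% and 100% can be sketched as a simple rescaling. The function name is hypothetical, and taking the smallest and largest observed responses as the bottom and top is an assumption of this example; the bottom and top can also be supplied explicitly, for instance from control measurements.

```python
def normalize_percent(responses, bottom=None, top=None):
    """Rescale responses so that `bottom` maps to 0% and `top` to 100%."""
    bottom = min(responses) if bottom is None else bottom
    top = max(responses) if top is None else top
    if top == bottom:
        raise ValueError("top and bottom must differ")
    return [100.0 * (r - bottom) / (top - bottom) for r in responses]

# Two hypothetical dose-response curves measured in different units
curve_a = [2.0, 4.0, 10.0, 16.0, 18.0]
curve_b = [50.0, 75.0, 150.0, 225.0, 250.0]

norm_a = normalize_percent(curve_a)
norm_b = normalize_percent(curve_b)

print(norm_a)
print(norm_b)
```

After rescaling, the two curves sit on the same 0 to 100 scale and can be compared directly, although their raw units differ.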
Normalizing Data
Hands-on practice
 To normalize data in SPSS:
 Transform => Analyze => Regression => Probit...
 To normalize data in Prism:
 Column statistics (from welcome screen) => frequency
distribution data and histogram => Analyze => Transform,
Normalize… => Normalize
Weighting Data
 The candidates in the 1936 USA presidential election were Alfred
Landon and Franklin Roosevelt.
 The magazine “Literary Digest” had correctly
predicted the results in 1920, 1924, 1928 and 1932.
 It surveyed 10 million people in 1936, and 2.4 million of them
responded (wow!).
 Literary Digest predicted that Landon would win, but Landon
lost and Roosevelt won (a louder wow!!).
 The prediction was off by 19 percentage points, the largest such
error ever in the USA.
 The magazine was so discredited that it folded 2 years later, in
1938, after 48 years in circulation.
Alfred Landon
Franklin Roosevelt
Weighting Data (Cont’d)
 A sample may NOT be representative of its population.
 This happens because of:
 Non-response
 Self-selection (in an online survey)
 Selection bias, or just bad luck (sampling error)!
 A commonly applied correction technique is weighting.
 Under-represented groups get a weight larger than 1, and
over-represented groups get a weight smaller than 1.
 The calculations then use the weighted values rather than
the raw values.
Weighting Data (Cont’d)
 A weighting adjustment can only be carried out if appropriate and
valid auxiliary variables are available.
 Gallup's Institute of Public Opinion correctly predicted the
result of the 1936 election using a sample size of only 50,000.
 The morals of the Literary Digest story are:
 Making a bad sample bigger does NOT correct the sampling bias
 A badly chosen big sample is much worse than a well-chosen small
sample
 Watch out for selection bias and nonresponse bias, and correct them
by weighting
 Gallup institute was better than Literary Digest!!
Weighting Data
                    Male                Female
Population          50%                 50%
Sample              20%                 80%
What film?          Action (Western)    Drama (Indian)
Weight              2.5 (50 / 20)       0.625 (50 / 80)
Result in sample    50% (2.5 * 20%)     50% (0.625 * 80%)
Gender is an auxiliary variable
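The arithmetic in the table can be sketched as follows. The shares come from the table; the 100-person sample, in which every male prefers action films and every female prefers drama, is a hypothetical simplification used only to show the effect of the weights.

```python
population_share = {"male": 50, "female": 50}   # percent, from the table
sample_share = {"male": 20, "female": 80}       # percent, from the table

# Weight = population share / sample share for each group
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical sample of 100 respondents matching the sample shares
respondents = [("male", "action")] * 20 + [("female", "drama")] * 80

unweighted_action = (
    sum(1 for _, film in respondents if film == "action") / len(respondents)
)

total_weight = sum(weights[g] for g, _ in respondents)
weighted_action = (
    sum(weights[g] for g, film in respondents if film == "action") / total_weight
)

print("weights:", weights)
print("unweighted share preferring action:", unweighted_action)
print("weighted share preferring action:", weighted_action)
```

Without weighting, the sample suggests that 20% prefer action films; with the weights 2.5 and 0.625, the estimate becomes 50%, which matches the gender balance of the population.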
Weighting Data
Hands-on practice
 To weight data in SPSS:
 Data => weight cases...
 To weight data in Prism (note: Prism uses weighting
for nonlinear regression):
 XY (from welcome screen) => Nonlinear regression =>
Analyze => XY analysis => Nonlinear regression (curve fit)
=> choose an equation from “Fit” tab => Weight tab =>
Weighting method
Torturing Data
 When scientists don't get the results they want, they often
resort to tactics such as:
 Change the definition of the outcome.
 Use a different time scale.
 Try different criteria for including or excluding a subject.
 Arbitrarily decide which points to remove as outliers.
 Try different ways to clump or separate subgroups.
 Try different ways to normalize the data.
 Try different algorithms for computing statistical tests.
 Try different statistical tests.
 If the results are still 'negative', then don't publish them!!
Robustness
 Robustness in statistical tests means the test can
“resist” violation(s) of the test assumption(s).
 If a test is robust, its result is not affected
considerably when an assumption (e.g., normal distribution)
is absent or altered.
 This is similar to buffer solutions in chemistry, which
resist changes in pH.
Homoscedasticity & Heteroscedasticity
 Homoscedasticity and heteroscedasticity are notions
usually considered in ANOVA and the t test.
 Parametric tests assume that data are homoscedastic (have
the same SD in different groups).
 If the data are heteroscedastic, the probability of
obtaining a false positive can be greater than the alpha level.
 Heteroscedasticity is much less of a problem with balanced designs
(equal sample sizes in each group).
 You should always compare the SDs of the groups to check for
heteroscedasticity (especially with unbalanced designs).
Homoscedasticity & Heteroscedasticity (Cont’d)
 There is no agreement about when heteroscedasticity
is big enough to rule out a test that assumes homoscedasticity.
 To test homoscedasticity, Bartlett's test is used (H0: the SDs
are equal, so failing to reject H0 is desirable)
 Bartlett's test is not a very good test because it is also sensitive
to departures from normality, so do not panic if it
returns a significant P value.
 When the SDs are different, the first action is data
transformation.
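For readers who want to see what Bartlett's test computes, here is a minimal sketch of the test statistic using only the standard library. The function name and the groups are hypothetical. The P value is not computed; it would come from comparing the statistic with a chi-square distribution on k - 1 degrees of freedom, where k is the number of groups. The sketch assumes every group has at least two values and a nonzero variance.

```python
import math
import statistics

def bartlett_statistic(*groups):
    """Bartlett's test statistic; compare with chi-square on k - 1 df."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance, weighted by each group's degrees of freedom
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction

# Hypothetical groups: same spread, then a tenfold difference in SD
equal_sd = bartlett_statistic([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
unequal_sd = bartlett_statistic([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])

print("statistic, equal SDs:", equal_sd)
print("statistic, unequal SDs:", unequal_sd)
```

The statistic is zero when the group variances are identical and grows as they diverge, so a larger statistic corresponds to a smaller P value.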
Homoscedasticity & Heteroscedasticity (Cont’d)
 Bartlett's test is often used to compare the effect of
various transformations and to pick the one that gives the biggest P value.
 If data transformation is not successful, the Welch test
(correction) is used as an alternative.
 The Welch test does not assume equal SDs.
 Many non-parametric tests do not assume a normal
distribution, but they do assume homoscedasticity.
 So non-parametric tests are NOT a good solution for
heteroscedasticity.
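The Welch correction mentioned above can be sketched by hand for two groups. The function name and the data are hypothetical; the degrees of freedom use the Welch-Satterthwaite approximation, and the P value, which would come from a t distribution with those degrees of freedom, is left out.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean, kept separate (no pooling)
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical groups: equal SDs first, then a tenfold difference in SD
t_equal, df_equal = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
t_unequal, df_unequal = welch_t([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])

print("equal SDs:   t =", t_equal, "df =", df_equal)
print("unequal SDs: t =", t_unequal, "df =", df_unequal)
```

With equal SDs and equal group sizes, the degrees of freedom equal those of the ordinary t test (8 here). When the SDs differ, the degrees of freedom shrink, which makes the test more conservative.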
Thank you
Any questions?

Applied Statistics Part 5: Transforming, Weighting, and Torturing Data

  • 1. Applied Statistics Part 5 By: MM. H. Farjoo MD, PhD, Bioanimator Shahid Beheshti University of Medical Sciences Instagram: @bio_animation
  • 2. Applied Statistics part 5  Outliers  Transforming Data  Normalizing Data  Weighting Data  Torturing Data  Robustness  Homoscedasticity and Heteroscedasticity
  • 3. Outliers  When analyzing data, sometimes one value is far from the others; Such a value is called an outlier.  With an outlier, consider these:  Was the value entered into the computer correctly?  Is the outlier value scientifically impossible? (negative height or weight, etc)  Were there any experimental problems caused by a flaw in the Lab. devices?  Could the outlier be caused by biological diversity? (this may be the most exciting finding in your data!)
  • 5. Outliers (Cont'd)
      • Don't throw out a value as an outlier before considering whether the finding is scientifically interesting.
      • You may have discovered a polymorphism in a gene, or a new clinical syndrome.
      • It is especially important to beware of lognormal distributions.
      • A lognormal distribution produces some very high values, which can easily be mistaken for outliers.
      • Removing these values would be a mistake.
  • 6. Outliers: Hands-on practice
      • To find outliers in SPSS:
          • Analyze => Descriptive Statistics => Explore... => Statistics... => Outliers check box
      • To find outliers in Prism:
          • Column statistics (from welcome screen) => frequency distribution data and histogram => Analyze => Column analysis => Identify outliers
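Outside SPSS or Prism, a first screen for outliers can be sketched in a few lines. The slides do not name a detection rule, so the rule below (Tukey's fences, 1.5 × IQR beyond the quartiles) is an assumption, and the height values are made up for illustration. A flagged value is only a candidate: the questions on the earlier slides still have to be answered before anything is removed.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: Tukey's fences with k = 1.5; the slides do not prescribe a rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical heights in cm; 250 is far from the rest.
heights_cm = [162, 165, 168, 170, 171, 173, 175, 178, 180, 250]
print(tukey_outliers(heights_cm))
```

With lognormal data this rule will flag many legitimate high values, which is exactly the trap the slides warn about.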
  • 7. Transforming Data
      • Data transformations are an important tool for the proper statistical analysis of biological data.
      • A transformation is tried if a quantitative variable:
          • Does not fit a normal distribution
          • Has greatly different standard deviations (SD) in different groups
      • It is NOT a form of playing around with your data in order to get the answer you want!
      • So it is essential to be able to defend the data transformation.
  • 8. Transforming Data (Cont'd)
      • To transform data, a mathematical operation is performed on each observation, and the statistical analysis is done on the transformed numbers.
      • It is better to use a transformation that other researchers commonly use in your field.
      • A more common but less effective transformation is preferable to an unusual one, so that readers are not skeptical.
      • Data don't have to be perfectly normal; parametric tests are not very sensitive to this assumption.
  • 10. Transforming Data (Cont'd)
      • It is NOT a good idea to report your results (means, SD, CI, etc.) in transformed units.
      • You should back-transform the results by applying the inverse of the function used for the transformation.
      • It is also important to decide which transformation to use before you do the statistical test.
      • Trying different transformations until you find one that gives you a significant result is cheating.
  • 11. Transforming Data (Cont'd)
      • Log transformation:
          • Consists of taking the log of each observation.
          • You can use either base-10 logs or base-e logs; it makes no difference for a statistical test.
          • You should specify which log you're using, as it affects the slope and intercept in a regression.
          • Many variables in biology have lognormal distributions.
          • That is, after log transformation the values are normally distributed.
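The log transformation and the back-transformation described on these slides can be sketched as follows. The three values are made up, and the variable names are only illustrative. The back-transformed mean of the logs is the geometric mean, which is why it differs from the ordinary arithmetic mean.

```python
import math
import statistics

values = [10.0, 100.0, 1000.0]  # hypothetical, strongly right-skewed data

# Transform: take the base-10 log of each observation.
logs10 = [math.log10(v) for v in values]
mean_log10 = statistics.mean(logs10)

# Back-transform: apply the inverse function (10 ** x) to the mean of the logs.
back_transformed_mean = 10 ** mean_log10

# The same analysis with natural logs gives the same back-transformed mean,
# which illustrates why the choice of base does not matter for a test.
mean_ln = statistics.mean(math.log(v) for v in values)
back_transformed_mean_ln = math.exp(mean_ln)

print(statistics.mean(values), back_transformed_mean, back_transformed_mean_ln)
```

The choice of base still has to be reported, because the means, slopes and intercepts expressed in log units do depend on it.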
  • 12. Transforming Data (Cont'd)
      • Square-root transformation:
          • Consists of taking the square root of each observation.
      • Arcsine transformation:
          • Consists of taking the arcsine of the square root of a number.
          • The numbers must be in the range 0 to 1.
          • This is used for proportions, which range from 0 to 1.
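As a minimal sketch, the arcsine transformation and its inverse can be written like this. The function names `arcsine_sqrt` and `back_transform` are hypothetical; the result is in radians.

```python
import math

def arcsine_sqrt(p):
    """Arcsine of the square root of a proportion p (0 <= p <= 1), in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the proportion must be in the range 0 to 1")
    return math.asin(math.sqrt(p))

def back_transform(x):
    """Inverse of arcsine_sqrt: the square of the sine."""
    return math.sin(x) ** 2

proportions = [0.0, 0.25, 0.5, 1.0]  # hypothetical proportions
transformed = [arcsine_sqrt(p) for p in proportions]
print(transformed)
```

The range check matters in practice: percentages entered as 0 to 100 must be divided by 100 first.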
  • 13. Transforming Data: Hands-on practice
      • To transform data in SPSS:
          • Transform => Compute Variable...
      • To transform data in Prism:
          • Column statistics (from welcome screen) => frequency distribution data and histogram => Analyze => Transform, Normalize… => Transform => “Transform Y values using” check box
  • 14. Normalizing Data
      • We often want to compare data measured on different scales or even in different units.
      • To do so, we “eliminate” the scale of measurement and “constrain” the values to a predetermined range.
      • This is called normalization, and it puts different variables into comparable units.
      • Investigators commonly normalize dose-response curves so that all curves begin and end at the same values (usually 0% and 100%).
  • 16. Normalizing Data (Cont'd)
      • To fit a curve to the normalized data, we “constrain” the bottom and top plateaus to predetermined values (usually 0% and 100%).
      • In this way, all parameters of the curves are comparable (e.g., EC50, slope, intercept, etc.).
      • If you normalize, don't also weight the data.
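The rescaling to 0% and 100% can be sketched as below. This is one simple way to do it, assuming the smallest and largest responses define the bottom and top unless other values are supplied. The function name `normalize_percent` and the responses are hypothetical.

```python
def normalize_percent(responses, bottom=None, top=None):
    """Rescale responses so that `bottom` maps to 0% and `top` maps to 100%.

    Assumption: if bottom/top are not given, the minimum and maximum are used.
    """
    lo = min(responses) if bottom is None else bottom
    hi = max(responses) if top is None else top
    if hi == lo:
        raise ValueError("top and bottom must differ")
    return [100.0 * (r - lo) / (hi - lo) for r in responses]

raw = [2.0, 3.0, 6.0, 10.0]  # hypothetical responses in arbitrary units
print(normalize_percent(raw))
```

Two curves measured in different units can then be plotted and compared on the same 0 to 100% axis.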
  • 18. Normalizing Data: Hands-on practice
      • To normalize data in SPSS:
          • Transform => Analyze => Regression => Probit...
      • To normalize data in Prism:
          • Column statistics (from welcome screen) => frequency distribution data and histogram => Analyze => Transform, Normalize… => Normalize
  • 19. Weighting Data
      • The candidates in the 1936 US presidential election were Alfred Landon and Franklin Roosevelt.
      • The magazine “Literary Digest” had correctly predicted the results in 1920, 1924, 1928 and 1932.
      • It surveyed 10 million people in 1936, and 2.4 million of them responded (wow!).
      • Literary Digest predicted that Landon would win, but Landon lost and Roosevelt won (a louder wow!!).
      • The sampling error was 19%, the largest ever in the USA.
      • The magazine was so discredited that it folded 2 years later, in 1938, after 48 years of brilliant circulation.
  • 21. Weighting Data (Cont’d)
      • A sample may NOT be representative of its population.
      • This happens because of:
          • Non-response
          • Self-selection (in an online survey)
          • Sampling error (e.g., selection bias) or just bad luck!
      • A commonly applied correction technique is weighting.
      • Under-represented groups get a weight larger than 1, and over-represented groups get a weight smaller than 1.
      • The calculations then use the weighted values rather than the raw values.
  • 22. Weighting Data (Cont’d)
      • A weighting adjustment can be carried out only if appropriate and valid auxiliary variables are available.
      • Gallup's Institute of Public Opinion correctly predicted the result of the 1936 election using a sample size of only 50,000.
      • The morals of the Literary Digest story are:
          • Making a bad sample bigger does NOT correct the sampling error.
          • A badly chosen big sample is much worse than a well-chosen small sample.
          • Watch out for selection bias and non-response bias, and correct them by weighting.
          • The Gallup institute was better than Literary Digest!!
  • 23. Weighting Data (gender is an auxiliary variable)
                            Male               Female
      Population            50%                50%
      Sample                20%                80%
      What film?            Action (Western)   Drama (Indian)
      Weight                2.5 (50 / 20)      0.625 (50 / 80)
      Result in sample      50% (2.5 * 20%)    50% (0.625 * 80%)
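The arithmetic in the gender example can be reproduced with a short sketch. The numbers come from the slide; the dictionary names are only illustrative. Each weight is the population share divided by the sample share.

```python
# Shares taken from the gender example: population 50/50, sample 20/80.
population_share = {"male": 0.50, "female": 0.50}
sample_share = {"male": 0.20, "female": 0.80}

# Weight = population share / sample share.
# Under-represented males get a weight above 1, over-represented females below 1.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# After weighting, each group contributes its population share to the result.
weighted_share = {g: weights[g] * sample_share[g] for g in sample_share}

# In the slide's example, males choose action films and females choose drama,
# so the estimated share preferring action moves from 20% to 50%.
action_unweighted = sample_share["male"]
action_weighted = weighted_share["male"]

print(weights, weighted_share, action_unweighted, action_weighted)
```

This works only because gender is a valid auxiliary variable here: its distribution in the population is known.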
  • 25. Weighting Data: Hands-on practice
      • To weight data in SPSS:
          • Data => Weight Cases...
      • To weight data in Prism (note: Prism uses weighting for nonlinear regression):
          • XY (from welcome screen) => Nonlinear regression => Analyze => XY analysis => Nonlinear regression (curve fit) => choose an equation from “Fit” tab => Weight tab => Weighting method
  • 27. Torturing Data
      • When scientists don't get the results they want, they often resort to tactics such as:
          • Change the definition of the outcome.
          • Use a different time scale.
          • Try different criteria for including or excluding a subject.
          • Arbitrarily decide which points to remove as outliers.
          • Try different ways to clump or separate subgroups.
          • Try different ways to normalize the data.
          • Try different algorithms for computing statistical tests.
          • Try different statistical tests.
          • If the results are still 'negative', then don't publish them!!
  • 28. Robustness
      • Robustness in statistical tests means the test can “resist” violation(s) of its assumption(s).
      • If a test is robust, its result is not affected considerably when a condition (e.g., normal distribution) is absent or altered.
      • It is similar to buffer solutions in chemistry, which resist changes in pH.
  • 29. Homoscedasticity & Heteroscedasticity
      • Homoscedasticity and heteroscedasticity are notions usually considered in ANOVA and the t test.
      • Parametric tests assume that data are homoscedastic (have the same SD in different groups).
      • If the data are heteroscedastic, the probability of obtaining a false positive is greater than the alpha level.
      • Heteroscedasticity is much less of a problem with balanced designs (equal sample sizes in each group).
      • You should always compare the SDs of the groups for heteroscedasticity (especially with unbalanced designs).
  • 30. Homoscedasticity & Heteroscedasticity (Cont'd)
      • There is no agreement about how large heteroscedasticity must be before a test that assumes homoscedasticity should not be used.
      • To test homoscedasticity, Bartlett's test is used (here, failing to reject H0 is the desirable outcome).
      • Bartlett's test is not a very good test, so do not panic if it returns a significant P value.
      • When SDs are different, the first action is data transformation.
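For readers who want to see what Bartlett's test computes, here is a minimal sketch of the textbook statistic using only the standard library. The function name and the data are hypothetical. Under H0 the statistic follows approximately a chi-square distribution with k - 1 degrees of freedom; the P value shortcut shown is valid only for two groups (1 degree of freedom).

```python
import math
import statistics

def bartlett_statistic(*groups):
    """Return (statistic, degrees of freedom) for Bartlett's test."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]  # sample variances
    n_total = sum(sizes)
    # Pooled variance, weighting each group by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1

# Made-up groups with very different spread.
narrow = [1, 2, 3, 4, 5]
wide = [10, 20, 30, 40, 50]
stat, df = bartlett_statistic(narrow, wide)

# With two groups (df = 1), the chi-square upper tail reduces to erfc.
p_value = math.erfc(math.sqrt(stat / 2))
print(stat, df, p_value)
```

The statistic is near zero when the group variances are equal and grows as they diverge.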
  • 31. Homoscedasticity & Heteroscedasticity (Cont'd)
      • Bartlett's test is often used to compare the effect of various transformations, looking for the one that gives the biggest P value.
      • If data transformation is not successful, the Welch test (correction) is used as an alternative.
      • The Welch test does not assume equal SDs.
      • Non-parametric tests do not assume a normal distribution, but they do assume homoscedasticity.
      • So non-parametric tests are NOT a good solution for heteroscedasticity.
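The Welch correction can be sketched for two groups as follows. This is a minimal illustration with made-up data, not the procedure SPSS or Prism runs internally. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; the P value would then be read from a t distribution with that many degrees of freedom.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent groups."""
    # Squared standard error of each group mean, without pooling the variances.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Hypothetical groups with unequal SDs and unequal sizes.
group_a = [4.1, 4.3, 3.9, 4.0, 4.2]
group_b = [5.0, 7.5, 3.2, 9.1, 6.4, 2.8, 8.3]
print(welch_t(group_a, group_b))
```

When the SDs and sample sizes are equal, the degrees of freedom reduce to those of the ordinary t test (n1 + n2 - 2); the more the SDs differ, the smaller they become.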