A presentation for non-statisticians on statistics and general statistical analysis. It provides a short overview of the processes involved in data collection, storage, hypothesis generation, and statistical analysis. It does not cover Bayesian statistics. Presented at PRODVANCE 2016, Ahmedabad.
Parametric tests are a central part of hypothesis testing. This ppt covers all the main parametric tests with their assumptions: the meaning of "parametric", the Z-test (one-sample and two-sample), t-test, Analysis of Variance (including two-way ANOVA), F-test, chi-square test, and Fisher's contributions.
The ppt gives an idea of the basic concepts of estimation: point and interval. Properties of a good estimator are also covered. Confidence intervals for a single mean, the difference between two means, a proportion, and the difference between two proportions for different sample sizes are included, along with case studies.
Factor analysis in marketing research aims to represent a large number of variables or questions using a reduced set of underlying variables, called factors. Factor analysis is best used to simplify complex data sets with many variables.
A summary of recent innovations in radiation oncology focusing on the principles of different techniques and their application. An overview of clinical results is also given.
Concurrent Chemoradiation in the Postoperative Setting in LAHNC: A comparison — Santam Chakraborty
A journal club presentation comparing and contrasting the EORTC and RTOG trials of concurrent chemoradiation in head and neck cancers in the postoperative setting.
Induction chemotherapy followed by concurrent CT-RT versus CT-RT alone in advanced disease — Santam Chakraborty
A short presentation exploring the benefit of adding induction / neoadjuvant chemotherapy to concurrent chemoradiation in head and neck cancers.
An introduction on how to go about a meta-analysis, primarily designed for people with a non-statistical background. Heavily borrows from the Cochrane Handbook of Systematic Reviews of Interventions.
LDR and HDR Brachytherapy: A Primer for Non-Radiation Oncologists — Santam Chakraborty
A small presentation I made for a 30-minute class comparing and contrasting LDR and HDR brachytherapy. Good for a person with a non-radiation-oncology background to grasp the basics.
This presentation is meant to help choose the appropriate statistical analysis for IBDP Biology IAs. It was created as support for teachers but is also useful for students.
Within the presentation, we discuss different types of biological data, and how to describe and analyse them using mathematics.
LAB 2:
Descriptive Statistics
Descriptive statistics are numerical measures that organize and summarize or present the data.
For quantitative variables (scale):
Mean with standard deviation is used to summarize non-skewed scale variables
Median with range or interquartile range is used to summarize skewed scale variables
The three steps to evaluate the normality assumption are:
Compare the statistics values ( mean versus median)
Obtain the histogram with normal curve
Obtain the Box-Whiskers plot
For this class:
If there are any extreme outliers, the median with range should be used to summarize the variable of interest
If there are any (regular) outliers, base your decision on the best measure (mean with SD or median with range) to summarize the variable of interest on the shape of the histogram
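The decision rule above can be sketched in code. This is an illustrative Python sketch only (the lab itself uses SPSS), and the data values are made up; it flags extreme outliers with the 3×IQR rule and chooses the summary measure accordingly:

```python
import numpy as np

def summarize(values):
    """Summarize a scale variable: mean/SD if there are no extreme
    outliers, otherwise median with range (illustrative rule only)."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    # Extreme outliers lie more than 3 IQRs beyond Q1 or Q3
    extreme = (x < q1 - 3 * iqr) | (x > q3 + 3 * iqr)
    if extreme.any():
        return {"measure": "median (range)",
                "median": float(np.median(x)),
                "range": (float(x.min()), float(x.max()))}
    return {"measure": "mean (SD)",
            "mean": float(x.mean()),
            "sd": float(x.std(ddof=1))}

print(summarize([4, 5, 5, 6, 7, 8, 9]))    # symmetric data -> mean (SD)
print(summarize([4, 5, 5, 6, 7, 8, 60]))   # extreme outlier -> median (range)
```

When regular (non-extreme) outliers are present, the rule above would still need the histogram check described in the text; the function only automates the extreme-outlier branch.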
Introduction
For qualitative variables (nominal or ordinal):
Frequency distributions (number with percentages) are used to summarize qualitative variables
Descriptive statistics for multiple groups:
Use Split file option in SPSS to obtain the measures of central tendency and the measures of variation for quantitative variables.
After you split your file by the grouping variable, you should follow the previous steps to select the most appropriate measures to summarize your variable of interest.
Please note that you have to un-split the data before running further analysis
Use Crosstabs option in SPSS to obtain the frequency distributions (number with percentages) for the qualitative variables
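For readers working outside SPSS, the Split File and Crosstabs steps have direct analogues in pandas. This is a sketch with hypothetical column names and values, not the SPSS procedure itself:

```python
import pandas as pd

# Hypothetical data frame standing in for an SPSS data set
df = pd.DataFrame({
    "gender":  ["M", "M", "F", "F", "F", "M"],
    "marital": ["single", "married", "married", "single", "married", "single"],
    "pulse":   [72, 80, 68, 75, 70, 90],
})

# "Split File" analogue: descriptives of a scale variable for each group
by_group = df.groupby("gender")["pulse"].agg(["mean", "std", "median"])
print(by_group)

# "Crosstabs" analogue: counts and row percentages for two categorical variables
counts = pd.crosstab(df["gender"], df["marital"])
row_pct = pd.crosstab(df["gender"], df["marital"], normalize="index") * 100
print(counts)
print(row_pct.round(1))
```

Note that groupby returns to ungrouped analysis automatically, so there is no pandas equivalent of forgetting to un-split the file.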
Types of variables:
Continuous (Quantitative) Variables - measured on an Interval/Ratio scale
Qualitative (Categorical) Variables - Nominal/Ordinal, summarized as Number and Percent, N (%)
Normal distribution? Check using:
1- Statistics {Mean and Median}
2- Histogram with Normal Curve
3- Box-Whiskers Plot
If No: Median with Range
If Yes: Mean with Standard Deviation
Box-Whiskers Plot
Source: http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_boxplot_sect017.htm
Extreme outliers (values greater than 3 IQR from Q1/Q3)
Outliers (values between 1.5 and 3 IQRs from Q1/Q3)
Whisker extends to furthest observation within Q3 + 1.5*IQR
Whisker extends to furthest observation within Q1 - 1.5*IQR
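The outlier fences described above can be computed directly. A sketch using Tukey's 1.5×IQR and 3×IQR fences, with illustrative data values:

```python
import numpy as np

def tukey_fences(values):
    """Fences used in box-and-whisker plots: points between 1.5 and 3
    IQRs beyond Q1/Q3 are outliers; beyond 3 IQRs, extreme outliers."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # whisker limits
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # extreme-outlier limits
    mild = x[((x < inner[0]) & (x >= outer[0])) |
             ((x > inner[1]) & (x <= outer[1]))]
    extreme = x[(x < outer[0]) | (x > outer[1])]
    return inner, outer, mild.tolist(), extreme.tolist()

inner, outer, mild, extreme = tukey_fences([10, 12, 12, 13, 14, 15, 22, 40])
print(inner, outer, mild, extreme)
```

With these data the inner fences are (4.875, 23.875), so 40 falls beyond the outer fence and is flagged as an extreme outlier.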
Example
Descriptive Statistics for Multiple Groups
Types of Variables and Procedures:
Split File: one qualitative or categorical (grouping) variable with one continuous or quantitative variable, e.g. Gender (Men / Females) as the grouping variable and Baseline Pulse as the continuous variable
Crosstabs: two qualitative or categorical variables, e.g. Gender and Marital Status (Widowed)
Un-split the data before running further analysis
Example:
Following is a dictionary for a data set. The data collected on a number of people from Cornwall, Ontario, Canada who at ...
Data Analysis & Interpretation and Report Writing — SOMASUNDARAM T
Statistical Methods for Data Analysis (Only Theory), Meaning of Interpretation, Technique of Interpretation, Significance of Report Writing, Steps, Layout of Research Report, Types of Research Reports, Precautions while writing research reports
Chapter 19: Basic Quantitative Data Analysis — Data Cleaning.docx — keturahhazelhurst
Chapter 19
Basic Quantitative Data Analysis
Data Cleaning
Check for odd symbols, truncated or overlong times
Recheck scoring
Recheck coding categories
Compare one variable value with value in second variable
Look for outliers
Reasons for Missing Data
Participant skipped item or questionnaire, purposely or inadvertently
Participant withdrew, became ill, or died
Had to omit all or part of the data collection
Poor directions or poorly worded question
Data missed during data entry
Categorizing Missing Data
Missing completely at random (MCAR)
Missing at random (MAR)
Missing not at random (MNAR)
Replacing Missing Data
Complete case analysis drops any participant with missing data from the analysis
If many participants have missing data, this may negatively affect the results
Replacing Missing Data
Principles in handling missing data are:
Some missing data cannot be replaced
Imputation uses existing information to estimate the missing values
The easiest approach is to replace missing data with the group’s mean (average) on the item
A more justifiable approach is to use the average of the individual participant’s scores or ratings on the remaining items of a multi-item scale
Missing values may be estimated from values at previous time points
Incomplete cases (participants) may be deleted and the analysis may be done on those who completed the study
A regression imputation may be done to estimate the values of the missing data
Expectation maximization uses a series of iterations to reach convergence
Multiple imputation contrasts and combines replacement values to find the best estimates
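Two of the simpler imputation approaches described above, group-mean substitution and the participant's own mean across the remaining items of a multi-item scale, can be sketched in pandas. The item names and values here are hypothetical:

```python
import pandas as pd
import numpy as np

# Hypothetical 3-item scale with some missing answers
items = pd.DataFrame({
    "q1": [4.0, 3.0, np.nan, 5.0],
    "q2": [4.0, np.nan, 2.0, 5.0],
    "q3": [5.0, 3.0, 2.0, np.nan],
})

# Group-mean imputation: replace each missing value with that item's mean
group_mean = items.fillna(items.mean())

# Person-mean imputation: replace a participant's missing item with the
# mean of their own remaining items on the scale
person_mean = items.apply(lambda row: row.fillna(row.mean()), axis=1)

print(group_mean)
print(person_mean)
```

Both are single-imputation methods; as the slides note, model-based methods such as multiple imputation are generally more defensible.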
Visual Representations
Stem and leaf illustrates distribution of values
Box plots illustrate distribution of values
Bar and pie charts demonstrate differences between groups and subgroups
Plots can show relationships between interval level variables
Basic Descriptive Statistics
Normal distribution is represented by a symmetrical bell-shaped curve
Positive skew has more cases at low end of values
Negative skew has more cases at high end of values
Basic Descriptive Statistics
Mode is the value that occurs most often
Median is the middle score in the distribution
Mean is the average of all scores
Basic Descriptive Statistics
Range is the distance between the highest and lowest scores
The range or distance between these endpoints can be divided into various portions
Basic Descriptive Statistics
Variance is the average of the squared deviations from the mean
Standard deviation is the square root of the variance
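A quick worked example of these two definitions using Python's standard library (the scores are illustrative):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)          # 5.0
variance = statistics.pvariance(scores) # average of squared deviations = 4.0
sd = statistics.pstdev(scores)          # square root of the variance = 2.0
print(mean, variance, sd)
```

(`pvariance`/`pstdev` divide by N, matching "the average of the squared deviations"; sample statistics divide by N-1.)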
Bivariate Association
Bivariate refers to relationships between two variables
Pearson product moment correlation coefficient represented as r is the most c ...
Statistics: What you Need to Know
Introduction
Often, when people begin a statistics course, they worry about doing advanced mathematics, or their math phobias kick in. It is important to understand that statistics, as addressed in this course, is not a math course at all. The only math you will do is addition, subtraction, multiplication, and division. In these days of computer capability, you generally don't even have to do that much, since Excel is set up to do basic statistics for you. The key element for the student in this course is to understand the various types of statistics, what their requirements are, what they do, and how you can use and interpret the results. Referring back to the basic components of a valid research study, which statistic a researcher uses depends on several things:
· The research question itself
· The sample size
· The type of data you have collected
· The type of statistic called for by the design
All quantitative studies require a data set. Qualitative studies may use a data set or may use observations with no numerical data at all. For the purposes of the next modules, our focus will be on quantitative studies.
Types of Statistics
There are several types of statistics available to the researcher. Descriptive statistics provide a basic description of the data set. This includes the measures of central tendency: means, medians, and modes, and the measures of dispersion, including variances and standard deviations. Descriptive statistics also include the sample size, or "N", and the frequency with which each data point occurs in the data set.
Inferential statistics allow the researcher to make predictions, estimations, and generalizations about the data set, the sample, and the population from which the sample was drawn. They allow you to draw inferences, generaliza.
2. Statistics - A subject which most statisticians find difficult but in which nearly all physicians are expert.
- Stephen Senn, Statistical Issues in Drug Development
3. What you will find in this presentation
● Only 1 calculation
● Only 1 formula
● Lots of cartoons & quotes!!!
8. Key Points
● Converting quantitative data to qualitative data is not advisable as it leads to data loss.
● QoL data is always qualitative but is often analyzed as quantitative data.
● Most medical researchers gather both qualitative & quantitative data but disregard the qualitative data.
12. Collecting Data
● This is the most neglected yet most vital part of the process.
● A structured way to collect data - Form
● Data collection instruments:
○ Surveys
○ Interviews
○ Focus Groups
13. Form Design Principles
● Be consistent in choice of font and layout
● Use checkboxes instead of allowing people to circle answers.
● Provide visual cues to the format of data required.
● Instructions should be given in bold and italics
● Specify units of measurement and decimal places
● Use skips sparingly and clearly indicate locations
● Use precoded answers (e.g. Male / Female)
18. Databases : Advantages
● Allow multi-user access
● Respect data integrity
● Allow data validation
● Avoid data redundancy
● Allow flexible and customized queries
19. Databases : Disadvantages
● More difficult to learn
● May require an understanding of networking-related concepts
● Software maintenance and updates are an issue
● Have a clear idea of the information that needs to be included
● Form design is required
21. Spreadsheet Tips
1. Header row should be in the first row only. Don't make fancy 2/3-row headers.
2. Set the locale to UK / India if you are planning to use DD/MM/YYYY as the date scheme
3. Freeze the first row and first column to ease data entry.
4. Use conditional formatting to pick up mistakes while doing data entry.
5. Avoid extensive code books - it is easier to recode data
6. Use different sheets sparingly.
22. Spreadsheet Tips
1. Remember Excel is not a relational database - so do not use the sort option.
2. If you do use the sort option, select all the columns before sorting.
3. If you use a formula during data entry, make the cell protected or hidden to avoid inadvertent changes.
4. Stick to one case: “UPPERCASE” or “lowercase”.
23. SPSS Tips
1. Never forget to use variable labels. Setting these at the design stage ensures that everyone remembers what is to be entered.
2. Value labels are your friend - don't use them sparingly.
3. Ensure that the data type is chosen appropriately.
24. Resources
1. Disciplined use of spreadsheets for data entry: http://www.reading.ac.uk/ssc/resource-packs/ILRI_2006-Nov/GoodStatisticalPractice/publications/guides/topsde.html
2. Using an Excel data entry form: https://www.pryor.com/blog/ease-the-pain-of-data-entry-with-an-excel-forms-template/
3. SPSS data entry tips: https://www.youtube.com/watch?v=N-krh4EaELE
26. A Statistical Analysis Plan (SAP) is the starting point of your analysis.
Tip: If you are at a loss when it comes to writing your SAP, write the paper results - it will help you to visualize the analysis plan.
27. Elements of a SAP
Define the research hypothesis
Define the end-points
Define the statistical methods
28. Research Hypothesis
1. Derives from the research question
2. Equally important for prospective or retrospective studies.
3. Helps in choosing the correct endpoints for the objectives appropriate to the hypothesis.
4. Often helps us to understand our underlying motivation for the research
29. Research Question
A question that is designed to address a “perceived” gap in the current state of knowledge about a condition.
“I want to know how many new patients are seen by my colleague instead of me”
“I want to know how many patients survive for 5 years after coming to me”
30. PICO(T)
1. Population - To be defined for all studies
2. Intervention - Essential if you want to study the effect of an intervention
3. Comparison Groups - Essential if you want to define the benefit of an intervention
4. Outcome - To be defined for all studies
5. Time - Essential if a time-to-event endpoint is chosen.
31. Two worked PICO(T) examples:
P: New patients presenting to my hospital | New patients presenting to my hospital
I: Undergo a consultation | Treatment given by me
C: Colleague or me | -
O: Number of patients | Survive their disease
(T): Over the last week | Till 5 years
See other great examples of PICOs formulated from daily practice questions at the PICO examples provided by the Cochrane Library: http://learntech.physiol.ox.ac.uk/cochrane_tutorial/cochlibd0e187.php
32. Always do a systematic review after formulating the PICO.
Tip: The Cochrane Handbook is a great way to understand the systematic review process: http://training.cochrane.org/handbook
33. Alpha and Beta
1. Our research question is defined with the perspective of the population, but we can rarely study that.
2. The value of an observation in a representative and random sample is considered to approximate the population value.
3. Repeated samples from the same population will likely yield different results for this value.
4. Alpha and Beta are measures of this uncertainty.
34. Researcher’s Decision vs Reality
- Null hypothesis is true, researcher rejects it: Type I error (probability of this occurring = alpha)
- Null hypothesis is true, researcher retains it: Correct
- Null hypothesis is false, researcher rejects it: Correct
- Null hypothesis is false, researcher retains it: Type II error (probability of this occurring = beta)
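Alpha can be illustrated by simulation: if the null hypothesis is true and we test at alpha = 0.05, repeated samples should produce a Type I error roughly 5% of the time. A sketch (2.0 is used as an approximate two-sided critical value for df = 58):

```python
import random
from statistics import mean, stdev

def t_stat(a, b):
    """Two-sample t statistic (pooled, equal-variance form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

random.seed(1)
trials, rejections = 2000, 0
for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]   # same population: null is true
    if abs(t_stat(a, b)) > 2.0:                   # reject at ~alpha = 0.05
        rejections += 1
rate = rejections / trials   # close to alpha
print(rate)
```

Power (1 - beta) could be explored the same way by drawing the two samples from populations with different means.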
38. Before the Analysis
1. Ensure that you make a folder for the data file and take a backup
2. If analyzing in SPSS, ensure that the SPSS viewer file is saved in the same folder
3. Ensure that the file version is correct if you have used multiple versions of the same file.
4. Turn off the distractions and turn on some light music.
39. Describe the data
Always start with descriptives
1. Frequencies for Qualitative Variables
2. Mean and SD for Quantitative Variables.
3. Check for missing values
4. Check for outliers (graphs)
40. Measures of Central Tendency
1. Mean: Heavily influenced by atypical values
2. Median: Heavily influenced by ties. The median is also not amenable to further calculation and is rarely used in statistical procedures.
3. Mode: Also susceptible to ties, but the only type of central tendency for nominal data.
41. Measures of Central Tendency
When do we prefer the median?
1. Extreme scores in the distribution
2. Count or ordinal measures
3. Some of the scores are undetermined
In case of skewed data / bimodal distribution it is better to report the median and the trimmed mean.
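A small sketch of why the trimmed mean helps with skewed data. The helper function and data are illustrative; statistical packages provide built-in trimmed means:

```python
import numpy as np

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of observations from each tail."""
    x = np.sort(np.asarray(values, dtype=float))
    k = int(len(x) * proportion)        # observations to drop per tail
    return float(x[k:len(x) - k].mean())

skewed = [3, 4, 4, 5, 5, 6, 6, 7, 30]   # one extreme score

m = float(np.mean(skewed))              # pulled up by the extreme score
med = float(np.median(skewed))          # 5.0
tm = trimmed_mean(skewed, 0.2)          # drops the lowest and highest score
print(m, med, tm)
```

Here the ordinary mean is dragged well above every typical score, while the median and the 20% trimmed mean stay close to the bulk of the data.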
42. Quantiles
● These are measures of variability as well as central tendency. Each quantile has the same number of observations.
● Median can be conceptualized as the 50% quantile
● Tertile: Split by 33% (3 parts)
● Quartile: Split by 25% (4 parts)
● Quintile: Split by 20% (5 parts)
● Decile: Split by 10% (10 parts)
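These splits are exactly what a percentile function computes. An illustrative numpy sketch with scores 1-100:

```python
import numpy as np

x = np.arange(1, 101)   # scores 1..100

median = np.percentile(x, 50)                    # the 50% quantile
quartiles = np.percentile(x, [25, 50, 75])       # cut points for 4 parts
quintiles = np.percentile(x, [20, 40, 60, 80])   # cut points for 5 parts
deciles = np.percentile(x, range(10, 100, 10))   # cut points for 10 parts
print(median, quartiles)
```

Note there are always one fewer cut points than parts (3 quartile cut points split the data into 4 parts, 9 decile cut points into 10).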
43. Measures of Spread
● Range: Not useful when you have extreme values
● Interquartile range: Usually reported along with the median - the range between the 25th and 75th percentiles
● Standard deviation and variance: Useful if the distribution is symmetric
● The 95% confidence interval of the mean is technically a measure of how closely your sample mean approximates the “unknown” population mean. In case of a normal distribution this corresponds to ±1.96 standard errors of the mean.
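A sketch of the normal-approximation 95% CI of the mean, with illustrative data (for small samples a t critical value is more accurate than 1.96):

```python
import math
import statistics

sample = [12, 14, 15, 15, 16, 18, 19, 21, 22, 24]

n = len(sample)
m = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
ci = (m - 1.96 * se, m + 1.96 * se)            # normal-approximation 95% CI
print(m, ci)
```

The interval shrinks as n grows (the standard error falls with the square root of the sample size), which is why it measures how well the sample mean pins down the population mean rather than the spread of individual values.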
45. Data Distribution
1. Binary / Nominal / Ordinal: Frequencies of categories
2. Continuous variable:
a. Histogram
b. Cumulative histogram
c. Quantiles
d. Moments (measures of central tendency & skewness)
3. Skewed data: Nonparametric methods of analysis (i.e. methods that do not assume that the distribution is normal).
46. Density Plots & Histograms
Quick R: Histograms & Density Plots : http://www.statmethods.net/graphs/density.html
48. Bar Charts : Best Practices
1. Give the count if your Y axis is in percentages
2. Start the Y axis from 0
3. Try to arrange categories by frequency
4. Use a consistent color scheme - don't use different colors in the bars unless they represent different categories.
5. Avoid stacked bar charts unless you want to show
part to whole relationships
6. Space between bars = 1/2 of the bar width
51. Missing Values
Missing Completely at Random (MCAR) : Missingness of a value does not depend on any other variable (e.g. patients randomly forget to answer some QOL items)
Missing at Random (MAR) : Missingness of a value depends on another observed variable (e.g. patients presenting in the late afternoon do not fill QOL forms)
Missing Not at Random (MNAR) : Missingness depends on a characteristic inherent in the variable itself (e.g. only patients with poor QOL do not fill QOL forms).
52. Missing Values
1. Deletion methods : Some part of the data is deleted. The most common approach used in SPSS is listwise deletion; the alternative is pairwise deletion.
2. Single Imputation : The most common method is mean / median substitution. Alternatively, dummy coding can be used, especially for a categorical variable.
3. Model-based Imputation : Multiple imputation and maximum likelihood based methods.
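Mean / median substitution is simple enough to sketch directly (hypothetical values; model-based methods such as multiple imputation need dedicated software and are not shown):

```python
# Single imputation by mean or median substitution on a hypothetical
# variable where None marks a missing value.
import statistics

values = [4.2, 5.1, None, 6.3, None, 5.0]

observed = [v for v in values if v is not None]
mean_fill = statistics.mean(observed)
median_fill = statistics.median(observed)

mean_imputed = [v if v is not None else mean_fill for v in values]
median_imputed = [v if v is not None else median_fill for v in values]

print(round(mean_fill, 2), round(median_fill, 2))
```

Note that substituting a single value like this artificially shrinks the variance of the variable, which is one reason model-based imputation is preferred.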
53. Missing Values
                        Listwise Deletion   Pairwise Deletion
Effect on Sample Size   Reduced             Mostly remains the same
Effect on Power         Reduced             Mostly remains the same
Simplicity              Yes                 Yes
Model comparison        Yes                 No
Unbiased if MCAR        Yes                 Yes
55. Resources
1. How to diagnose the missing data mechanism:
http://www.theanalysisfactor.com/missing-data-mechanism/
2. Missing data : Pairwise and Listwise Deletions which to use :
http://www-01.ibm.com/support/docview.wss?uid=swg21475199
3. Missing data and how to deal with it ( A nice presentation) :
https://liberalarts.utexas.edu/prc/_files/cs/Missing-Data.pdf
58. Hypothesis testing
1. Formal testing of whether the null hypothesis is untrue, i.e. an attempt to disprove the null hypothesis
2. The null hypothesis is equivalent to a straw man - a sham argument set up to be defeated.
3. The type of “tail” depends on the nature of the alternate hypothesis
Failure to reject the null hypothesis is not proof of its truth - in other words, absence of evidence is not evidence of absence
59. Hypothesis testing : Tails
● Bill Gates earns the same $$ per month as me - H0
● Bill Gates earns less $$ per month than me - H1 (one tailed)
● The $$ that Bill Gates earns is different from what I earn - H1 (two tailed)
60. Classifications of “significant" or “highly significant"
are arbitrary, and treating a P-value between 0.05
and 0.1 as indicating a “trend towards significance"
is bogus. If the P-value is 0.08, for example, the
0.95 confidence interval for the effect includes a
“trend” in the opposite (harmful) direction.
- Harrell & Slaughter (2016)
62. T Test
1. The independent-sample t-test tests the null hypothesis that the two samples come from two populations whose means are the same.
2. The paired t-test tests the special null hypothesis that the mean difference between two related measurements is 0.
63. Requirements
● Data needs to be quantitative
● It is obtained from a simple random sample*
● Data is normally distributed
● Variances of the two samples need to be equal.
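Under the equal-variance assumption above, the pooled t statistic has a simple closed form, sketched here on hypothetical group data (a real analysis would also get the p-value from the t distribution, e.g. via SPSS or SciPy, which is omitted to stay standard-library only):

```python
# Pooled two-sample t statistic under the equal-variance assumption.
# Group values are hypothetical, for illustration only.
import math
import statistics

group_a = [5.1, 5.5, 6.0, 6.2, 5.8]
group_b = [6.5, 7.0, 6.8, 7.2, 6.9]

def pooled_t(a, b):
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # pooled variance weights each sample variance by its degrees of freedom
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

t = pooled_t(group_a, group_b)
print(round(t, 2))  # negative here because group_a has the lower mean
```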
64. Comparing Proportions
1. Chi Square test:
a. Compare dichotomous outcomes in 2 groups
b. 2 x 2 contingency tables
c. Unreliable if the expected count in a cell is < 5
d. Yates continuity correction required if cell frequency < 10
2. Fisher’s exact test
a. An exact test as the exact p value is calculated - not approximated from the chi
square distribution - also gives a more conservative estimate
b. Can do larger contingency tables
c. More computationally intensive
d. Does not have a quantity analogous to the Chi Square statistic
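For a 2 x 2 table the chi-square statistic has a well-known shortcut formula; this sketch (hypothetical counts, no continuity correction, p-value lookup omitted) illustrates it:

```python
# Pearson chi-square statistic for a 2x2 contingency table using the
# shortcut formula n*(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)).
# Counts are hypothetical; no Yates continuity correction is applied.
def chi_square_2x2(a, b, c, d):
    """Table [[a, b], [c, d]]: rows = groups, columns = outcomes."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical trial: 10/30 responders in one arm vs 20/30 in the other
stat = chi_square_2x2(10, 20, 20, 10)
print(round(stat, 2))
```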
65. Odds Ratio
1. Measure of association between an outcome and exposure
2. Ratio of odds of the outcome in exposed to the odds of the outcome in non
exposed.
3. Can be easily obtained from a 2 x 2 contingency table.
        Dead   Alive
RT      10     100
No RT   5      10
66. Risk Ratio
1. Another measure of relative effect size
2. Ratio of risk of outcome in exposed to the risk of outcome in non exposed.
3. Can be easily obtained from a 2 x 2 contingency table.
        Dead   Alive
RT      10     100
No RT   5      10
67. Odds vs Risk
1. Odds is the ratio of the probability of an event occurring to that of it not occurring - in this case the odds of dying in the RT group is 10/100.
2. Risk is the probability of an event occurring - in this case the risk of dying in the RT group is 10/110.

        Dead   Alive
RT      10     100
No RT   5      10
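Both effect sizes fall straight out of the 2 x 2 table above (RT: 10 dead / 100 alive; No RT: 5 dead / 10 alive):

```python
# Odds ratio and risk ratio from the 2x2 table in the slides.
dead_rt, alive_rt = 10, 100
dead_no, alive_no = 5, 10

odds_rt = dead_rt / alive_rt              # 10/100
odds_no = dead_no / alive_no              # 5/10
odds_ratio = odds_rt / odds_no

risk_rt = dead_rt / (dead_rt + alive_rt)  # 10/110
risk_no = dead_no / (dead_no + alive_no)  # 5/15
risk_ratio = risk_rt / risk_no

print(round(odds_ratio, 2), round(risk_ratio, 3))
```

Note that the two estimates differ (OR 0.2 vs RR about 0.27) because death is common in the No RT group; they converge only when the outcome is rare.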
68. Why Odds Ratio
1. Risk ratios are easier to interpret but applicable to a
limited range of prognoses - e.g. a risk factor that
doubles the risk of developing lung cancer cannot
apply to a patient whose baseline risk exceeds 0.5.
2. It reduces the effect size in large studies as
compared to risk ratios - more conservative.
3. Confidence intervals of ORs can be calculated
69. Non Parametric Methods
1. Actually better than parametric alternatives as they do not need checking of
distributional assumptions
2. Response variable can be interval / ordinal - do not need any
transformations to account for non normal distributions and can handle
extreme values better
3. Being less susceptible to extreme values these are considered more robust
70. Nonparametric test alternatives
1. One Sample T test - Wilcoxon Signed Rank test
2. Two sample T test - Wilcoxon Rank-Sum Test
(Mann-Whitney U test)
3. ANOVA - Kruskal Wallis Test
4. Pearson test for Correlation - Spearman rho test
71. Correlation
1. A method to examine the association between a
continuous predictor and a continuous outcome.
2. A correlation coefficient ranges from -1 to
+1 and measures both the strength and the
direction of the association.
3. Scatterplots are a graphical method for
evaluating correlation.
72. Pearson’s Correlation
1. Requires linear relationship between the two variables.
2. Requires that the variables be normally distributed - ideally bivariate
normality.
3. Outliers have a big impact on the correlation.
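Pearson's r can be computed directly from its definition; this sketch uses hypothetical paired data (moving a single point to an extreme value would change r sharply, which is the outlier sensitivity noted above):

```python
# Pearson's r from its definition: covariance divided by the product
# of the standard deviations. Paired data below is hypothetical.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(x, y)
print(round(r, 3))  # near +1: strong positive linear association
```

Spearman's correlation (next slide) is simply Pearson's r computed on the ranks of the data rather than the raw values.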
73. Spearman’s Correlation
1. The non parametric alternative - does not require the distribution of
variables to be normal.
2. Does not assume a linear relationship but a monotonic relationship
3. Is not affected as much by outliers
4. Can give results quite different from - even opposite to - Pearson's correlation on the same data
74. Correlation & Causation
Strength : Major confounding factors may result in strong correlation
Consistency : Assumes that causal factors are evenly distributed in the population
Specificity : No reason why a risk factor should be specific for an outcome
Temporality : Directionality may not always imply causation e.g. Depression & Cancer
Biological Gradient : Only true for events where there is a dose-response gradient
Plausibility : Depends on the state of current scientific knowledge
Coherence : Depends on the quality of additional available information
Experimental Evidence : Interventional research may not always be feasible
Analogy : A subjective judgement
75. Correlation & Agreement
1. High correlation may not indicate agreement
e.g. 2 methods to measure height may be
correlated but give different measurements
2. A change in scale does not affect correlation
e.g. if one method measured height at twice the
value of the other, the correlation would still be strong
76. Linear Model
Y = a + βX
As you may remember, this is the equation of a line.
The job of regression is to find a and β so that any value
of X can be used to predict Y.
A statistical method to predict a variable is a model.
A linear regression is an OLS (ordinary least squares) fit.
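For a single predictor the OLS fit has a closed form: writing the line as Y = a + βX, the slope is β = cov(X, Y) / var(X) and the intercept is a = mean(Y) - β·mean(X). A sketch on hypothetical data:

```python
# Simple OLS fit from the closed-form solution; data is hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
# slope = sum of cross-deviations / sum of squared x-deviations
beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
alpha = my - beta * mx

def predict(value):
    """Predicted Y for a given predictor value."""
    return alpha + beta * value

print(round(beta, 2), round(alpha, 2))
```

The residuals discussed on slide 80 are simply `y[i] - predict(x[i])` for each observation.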
78. Linear Regression : Assumptions
1. The 2 assumptions for correlation hold true - linear relationship & absence of
outliers
2. In addition residuals should be normally distributed
3. Homoscedasticity should be present
4. Observations should be independent - no autocorrelation
5. Multi-collinearity should be absent
79. Homoskedasticity
1. Plot the predictor variable against the linear
regression line
2. Check whether the points are spread evenly
(equidistant) along the line
3. Essentially means that the outcome variable
has the same variance across the values of the
predictor variable
4. Practically determined from the residuals
80. Residuals
1. Nothing but the difference between the
observed value of the outcome variable
and the predicted value from the model.
2. In other words it is a measure of the error
/ disagreement for the model predictions.
3. A plot of residuals vs the predicted values
should show an even band of scatter around
zero if there is homoskedasticity
81. Alternatives to Linear Regression
Logistic regression : If your outcome variable is binary categorical (e.g. dead /
alive)
Ordinal regression : Ordinal categorical data
Poisson regression : If you have count data
If a non-linear relationship exists then use a non-linear regression model - alternatively
use a transformation of the outcome variable or use segmented regression
82. What about survival ?
This is a special regression problem where the outcome is the time survived.
Both linear and nonlinear methods are available.
Parametric and nonparametric tests are available.
A key point : These methods are required ONLY if all potential events have not
occurred in the time frame of observation - or all patients have not died.
N.B. These methods are applicable to any time to event end points
83. Defining the Time
Needs a baseline date from which observation starts - ideally time when exposure
starts - possible to know very rarely
In case of RCTs - classically the date of randomization
In retrospective studies - date of registration / date of diagnosis
IF patient has event then the date / time of the event is noted else the date / time
of last FU is noted. - Note logically it should be larger than 0.
84. The Censoring Problem
The censoring problem arises as all events do not occur in the observation time
frame (i.e. patients remain alive )
We do not know for sure that the remaining sample is not at risk for having the
event afterwards.
If censoring is not accounted for, you get an artificially inflated survival figure.
Right censoring is when the subject does not have the event before the time
observation ends. Left censoring is when the patient has the event prior to the study time.
85. Hazard
The effect size estimator obtained from survival methods - can be considered as
the risk of developing the event.
Hazard rate is the instantaneous probability of the occurrence of the event. It
ignores the accumulation of hazard up to that time point
Hazard ratio is the ratio of hazard rates in two groups
Cumulative Hazard is the integration of the Hazard rate over a given interval of
time.
86. Source: SAS Seminar Introduction to Survival Analysis in SAS. Available at http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/
89. The Kaplan Meier Estimate

Time   Death
1      Yes
2      No
3      No
4      Yes
5      No
10     Yes
12     No

Interval   Entered   Deaths   Censored   Alive   S factor   Cumulative S
0 - 1      7         1        0          6       6/7        6/7 = 86%
1 - 4      6         1        2          3       3/4*       3/4 × 6/7 = 64%
4 - 10     3         1        1          1       1/2*       1/2 × 3/4 × 6/7 = 32%

*censored individuals are removed from the denominator
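The product-limit calculation in the table above can be sketched in a few lines; this minimal implementation (illustrative only - real analyses should use SPSS/SAS/R or a dedicated library) reproduces the same survival probabilities:

```python
# Minimal Kaplan-Meier (product-limit) estimator reproducing the worked
# example: deaths at t = 1, 4 and 10, the remaining subjects censored.
def kaplan_meier(observations):
    """observations: list of (time, event) pairs, event=True for a death."""
    # At tied times, process deaths before censorings (the usual convention).
    ordered = sorted(observations, key=lambda te: (te[0], not te[1]))
    n = len(ordered)
    s = 1.0
    curve = []
    for i, (t, event) in enumerate(ordered):
        if event:
            at_risk = n - i              # subjects still at risk just before t
            s *= (at_risk - 1) / at_risk # multiply in this interval's S factor
            curve.append((t, s))
    return curve

data = [(1, True), (2, False), (3, False), (4, True),
        (5, False), (10, True), (12, False)]
curve = kaplan_meier(data)
for t, s in curve:
    print(t, round(s, 2))
```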
92. Comparisons
The Kaplan Meier method can allow you to compare the survival among groups of
patients.
While the effect size is important and can be conceptualized as the risk ratio or
the hazard ratio we can test for the null hypothesis that the survival curves are
equal
The commonest is the Log Rank test
93. Log Rank Test
Calculates the observed number of deaths in each group at each time point where
there is an event, and the number expected if there were no difference between the
groups.
E.g. 2 groups of 20 patients each & 1 death in 6 months - the expected number of
deaths in each group would be (1/40)*20 or 0.5 (note this is a count, not a %).
This process is repeated for all the time points where there is an event & the total
numbers of observed and expected deaths in the groups are calculated - then a simple
Chi-Square test is used to determine whether they differ.
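The expected-deaths bookkeeping in the example above is just a proportional split of each event across the risk sets, as this small sketch shows (the full log rank test then sums these over all event times, which is omitted here):

```python
# Expected deaths at one event time under the log rank null hypothesis:
# each arm contributes deaths in proportion to its share of the risk set.
def expected_deaths(at_risk_a, at_risk_b, total_deaths):
    n = at_risk_a + at_risk_b
    return (total_deaths * at_risk_a / n, total_deaths * at_risk_b / n)

# The slide's example: 20 patients at risk in each arm, 1 death in total.
e_a, e_b = expected_deaths(20, 20, 1)
print(e_a, e_b)  # 0.5 expected deaths in each arm
```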
94. Alternatives
Since the log rank test gives equal weight to all time points, some alternatives
are available - e.g. the Breslow test, which weights each time point by the number of
cases at risk.
The Breslow test is better when there are more deaths at the start of the KM curve,
and misleading when there is more censoring - best stick to the log rank
95. Assumptions for KM estimator
1. Patients who are censored have the same survival prospect as those who are
followed up
2. Survival for patients who present earlier is the same as that of patients
who present later
However Kaplan Meier method is a nonparametric estimator which implies that
the estimate does not depend on the shape of the survival function.
96. The Cox Regression
1. Allows multivariable regression modelling for survival.
2. Unlike Kaplan-Meier, allows continuous predictor variables
3. Is one of the most (ab)used survival analysis techniques
4. Can be used to generate a predictive model
5. Ideal sample size? Number of events = 20 x predictors
102. Cox Regression : Assumptions
1. The proportional hazards assumption should be fulfilled - i.e. the hazard
function for the two strata should remain proportional.
2. Censoring should be non-informative i.e. censoring of one person should not
influence the outcome of another
3. There is a linear relationship between the log of the hazard and the
covariates
4. Overly influential data points (outliers) should not be present
There are diagnostic methods available for each of the above.
103. How to check for Proportional Hazards
1. If the predictor variable is categorical, KM curves
can be generated and we can see if the lines
maintain the same separation.
2. Alternatively you can generate Schoenfeld
residuals in SPSS and plot these residuals against
the time for each covariate.
105. Cox Regression : Advantages
1. It is a semi-parametric model and is less affected by outliers.
2. Unlike parametric survival models, it does not require correct specification of
the underlying distribution
3. Many diagnostic procedures are available
However, it does not estimate the baseline hazard, which makes predictive modelling
difficult
106. What not to do while modelling (regression)
1. Do not work with sample sizes that are clearly inadequate
2. Do not use univariate selection
3. Do not use stepwise forward / backward selection methods
4. Do not blindly assume linearity / proportional hazards - always understand
the underlying assumptions as well as the correct checks for the same
5. Read about residuals before jumping into regression
6. Don’t use split sample validation - instead use cross validation or
bootstrapping
DON’T FALL IN LOVE WITH YOUR MODEL
107. Resources
SAS Seminar: Introduction to Survival Analysis in SAS [Internet]. [cited 2016 Sep 9]. Available from:
http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/
SPSS Library: Understanding contrasts [Internet]. [cited 2016 Sep 9]. Available from:
http://www.ats.ucla.edu/stat/spss/library/contrast.htm
Bian H. Survival Analysis Using SPSS. Available from:
http://core.ecu.edu/ofe/StatisticsResearch/Survival%20Analysis%20Using%20SPSS.pdf
Bland JM, Altman DG. The logrank test. BMJ. 2004 May 1;328(7447):1073.
Practical recommendations for statistical analysis and data presentation in Biochemia Medica journal | Biochemia Medica
[Internet]. [cited 2016 Sep 8]. Available from: http://www.biochemia-medica.com/2012/22/15
Manikandan S. Measures of dispersion. J Pharmacol Pharmacother. 2011 Oct;2(4):315–6.
Manikandan S. Measures of central tendency: Median and mode. J Pharmacol Pharmacother. 2011 Jul;2(3):214–5.
Utley M, Gallivan S, Young A, Cox N, Davies P, Dixey J, et al. Potential bias in Kaplan–Meier survival analysis applied to
rheumatology drug studies. Rheumatology. 2000 Jan 1;39(1):1–2.