Statistics is the science of dealing with numbers and is used for collecting, summarizing, presenting, and analyzing data. It plays important roles in health care planning and evaluation, epidemiological studies, diagnosing community health problems, and comparing diseases and health status. Data can be quantitative or qualitative, discrete or continuous. Data is commonly presented using tables and graphs like bar charts, pie charts, histograms, scatter plots, and line graphs. Key measures used to summarize data include the mean, median, and mode for measures of central tendency, and the range, variance, and standard deviation for measures of dispersion.
2. Definition:
Statistics is the science of dealing with numbers. It
is used for collection, summarization, presentation,
and analysis of data.
Uses:
Planning & evaluation of health care programs.
Play a role in epidemiological studies.
Diagnosis of community health problems.
Comparison of diseases and health status.
Forming standards for biologic measurements e.g.
BP.
Differentiation between diseased and normal
groups.
4. Data : Observations made on
individuals.
Variable : any aspect of
individual that is measured e.g.
blood pressure, age.
6. Confounding variables: two variables (explanatory
variables) are confounded when their effects on a
response variable cannot be distinguished from each other.
11. I. Tabulation: criteria
Self-explanatory.
Title at the top.
Clear headings of columns and rows.
Clear units of measurement.
Number of classes or rows from 2 to 10.
2 types: Listing tables.
Frequency distribution table.
12. (1) Listing table
No. of patients in each department at Zagazig hospital
Department       No. of patients
Medicine         100
Surgery          80
ENT              40
Ophthalmology    30
Total            250
13. e.g. Listing table
Distribution of students at public health lab 1 according to gender
Gender    No. of students
Male      35
Female    20
Total     55
14. (2) Frequency distribution table for qualitative data:
20 individuals of blood group: A-AB-AB-O-B-A-A-B-B-AB-O-AB-AB-A-B-B-B-A-O-A.
Distribution of the studied individuals according to their blood group.
Blood group    Frequency    %
A              6            30
B              6            30
AB             5            25
O              3            15
Total          20           100.00
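The tally behind this table can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original slides; the variable names are illustrative, and the data are the 20 blood groups listed on the slide.

```python
from collections import Counter

# The 20 observations exactly as listed on the slide.
blood_groups = "A-AB-AB-O-B-A-A-B-B-AB-O-AB-AB-A-B-B-B-A-O-A".split("-")

counts = Counter(blood_groups)   # frequency of each category
total = sum(counts.values())     # number of individuals

# Map each group to (frequency, percentage of total).
table = {group: (freq, 100 * freq / total) for group, freq in counts.items()}

for group in ["A", "B", "AB", "O"]:
    freq, pct = table[group]
    print(f"{group:<12}{freq:>4}{pct:>8.1f}")
print(f"{'Total':<12}{total:>4}{100.0:>8.1f}")
```

The percentages are simply each frequency divided by the total and multiplied by 100, so they always sum to 100.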
15. (3) Frequency distribution table for quantitative data, example:
Blood pressure readings of 30 patients with hypertension are: 150-155-160-154-162-170-165-155-190-186-180-178-195-200-180-165-173-188-173-189-190-175-186-174-155-164-163-172-159-177.
Present these data in a frequency table.
16. 1. Title.
2. Table: 3 columns: 1st: blood pressure.
2nd: frequency.
3rd: percentage.
3. First column: classify blood pressure into classes.
4. Choose a class interval: 10.
5. No. of classes = (largest value - lowest value)/10 = 50/10 = 5.
6. Choose upper & lower limit of the class interval.
7. Each observation is allocated to its class interval.
8. Percentage of each class is calculated.
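The steps above can be sketched in Python as follows. This is an illustrative sketch rather than the presenter's own procedure: it assumes classes that include their lower limit and exclude their upper limit (150 up to but not including 160, and so on), and the helper name `class_start` is hypothetical. Under that assumption the single reading of 200 opens a sixth class, which matches the 200-210 row used in the grouped-mean table on slide 46.

```python
from collections import Counter

# The 30 blood pressure readings from the example slide.
bp = [150, 155, 160, 154, 162, 170, 165, 155, 190, 186,
      180, 178, 195, 200, 180, 165, 173, 188, 173, 189,
      190, 175, 186, 174, 155, 164, 163, 172, 159, 177]

width = 10           # chosen class interval
lowest = min(bp)     # lower limit of the first class

def class_start(value):
    """Lower limit of the class that contains `value` (hypothetical helper)."""
    return lowest + ((value - lowest) // width) * width

counts = Counter(class_start(v) for v in bp)
n = len(bp)

# One row per class: (lower limit, frequency, percentage).
table = [(start, counts[start], 100 * counts[start] / n)
         for start in sorted(counts)]

for start, freq, pct in table:
    print(f"{start}-{start + width:<6}{freq:>4}{pct:>8.1f}")
```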
19. II- Graphical Presentation
Definition:
Presenting data by using diagrams.
Graph should be:
Simple, understood.
Save a lot of words.
Self-explanatory.
Clear title.
Fully labeled.
Vertical axis used for frequency.
20. Bar chart
Used for discrete or qualitative data.
Data presented by rectangles separated by gaps; the length is proportional to the frequency.
Types of bar charts:
Simple.
Multiple.
Component.
21. Simple bar chart
Blood gp.    Freq.
A            4
B            8
AB           5
O            3
Total        20
[Bar chart: frequency (vertical axis, 0-9) by blood group (A, B, AB, O); bar heights 4, 8, 5, 3]
23. Multiple bar chart
Blood gp.    Freq. (Female)    Freq. (Male)
A            3                 4
B            6                 8
AB           7                 5
O            4                 3
Total        20                20
What is the defect in this chart?
24. Component bar chart
SE class    % (Egypt)    % (USA)
Low         60           10
Middle      30           60
High        10           30
Total       100          100
[Component bar chart: frequency % (vertical axis, 20-100) by SE class for Egypt and USA]
What is the defect in this chart?
25. Pie Chart
The circle represents the total frequency (100%).
Used for discrete or qualitative data.
Divided into segments according to the proportion of each category.
2 pies can be used for comparison.
28. Histogram:
Used for quantitative continuous data.
Each class interval is represented by a rectangle.
The height of the rectangle represents the frequency.
Rectangles are adherent (no gaps between them).
30. Frequency Polygon:
Derived from histogram.
The midpoints of the rectangles' tops are connected.
It can be drawn without histogram.
33. Scatter Diagram
Used to represent the relationship between 2 quantitative continuous measurements.
Each observation is represented by a point corresponding to its value on each axis.
34. 1. If the points scatter in an upward direction: +ve correlation.
2. If the points scatter in a downward direction: -ve correlation.
3. If the points scatter horizontally: no correlation.
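The slide reads the direction of correlation off the plot. A standard numeric counterpart is Pearson's correlation coefficient r, which the slides do not define; the sketch below is offered only as an illustration of how the sign of r matches the three cases. The function name and the three small data sets are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, chosen only to illustrate the three patterns.
x = [1, 2, 3, 4, 5]
upward = [2, 4, 5, 4, 6]      # points drift upward
downward = [6, 4, 5, 4, 2]    # points drift downward
flat = [3, 5, 3, 5, 3]        # no overall trend

print(pearson_r(x, upward) > 0)      # positive correlation
print(pearson_r(x, downward) < 0)    # negative correlation
print(abs(pearson_r(x, flat)) < 1e-9)  # no linear correlation
```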
35. Line Graph
Represents the relationship between 2 numeric variables.
The points are joined together to form a line.
Ex: Relation between temperature & time.
Relation between height & weight.
Line graphs can be used for more than one group.
38. Graphical Presentation
Remember
Qualitative & discrete data:
* Bar chart
* Pie chart
Quantitative continuous data:
* Histogram (e.g. population pyramid).
* Frequency polygon (e.g. normal distribution curve).
Relation between 2 numerical variables:
* Scatter diagram.
* Line graph.
40. While preparing the report of gastroenteritis
outbreak investigation the researcher wanted to
present the data i.e. number of cases related to
time, graphically. Which graph would you
suggest?
a) Bar chart
b) Pictogram
c) Pie chart
d) Histogram
e) Scatter diagram
43. Data Summarization
Measures of central tendency:
Arithmetic mean.
Median.
Mode.
Measures of dispersion:
Range.
Variance.
Standard deviation.
Coefficient of variation.
44. I- Measures of central tendency
Describe the center of data:
$\bar{X} = \frac{\sum X}{n}$
$\bar{X}$ = mean, $\sum$ = sum.
X = value of observations.
n = number of observations.
1. Ungrouped data: 12, 15, 10, 17, 13.
$\bar{X}$ = (12+15+10+17+13)/5 = 67/5 = 13.4
45. 2. Grouped data without class interval:
$\bar{X} = \frac{\sum fX}{n}$
Where f = frequency of each X.
IP (days) (X)    Freq. (f)    fX
2                2            4
3                4            12
4                1            4
5                3            15
6                2            12
Total            12 (n)       47 ($\sum fX$)
Mean IP = 47/12 = 3.9 days.
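The grouped mean weights each value by its frequency. Below is a minimal Python sketch of that calculation using the table's own figures; the variable names are illustrative, and "IP" is kept as the slide writes it. Recomputing gives a column total of 47 and a mean of about 3.9 days.

```python
# Values and frequencies from the slide's table.
ip_days = [2, 3, 4, 5, 6]
frequencies = [2, 4, 1, 3, 2]

n = sum(frequencies)                                   # total number of observations
sum_fx = sum(f * x for f, x in zip(frequencies, ip_days))  # sum of f * X
mean_ip = sum_fx / n                                   # frequency-weighted mean

print(n, sum_fx, round(mean_ip, 1))
```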
46. 3. Frequency data with class interval:
$\bar{X} = \frac{\sum fX_1}{n}$
$X_1$ = midpoint of class interval.
Bl. pressure mmHg (X)    Freq. (f)    Midpoint ($X_1$)    $fX_1$
150-                     6            155                 930
160-                     6            165                 990
170-                     8            175                 1400
180-                     6            185                 1110
190-                     3            195                 585
200-210                  1            205                 205
Total                    30 (n)                           5220 ($\sum fX_1$)
* Mean blood pressure = 5220/30 = 174 mmHg.
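With class intervals, each class is represented by its midpoint before weighting by frequency. A short Python sketch of the calculation follows, assuming each class runs from its lower limit to the next class's lower limit (so 150- means 150 to 160); the tuple layout and names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the table.
classes = [
    (150, 160, 6),
    (160, 170, 6),
    (170, 180, 8),
    (180, 190, 6),
    (190, 200, 3),
    (200, 210, 1),
]

n = sum(freq for _, _, freq in classes)

# Midpoint of a class is the average of its limits; weight it by frequency.
sum_fx1 = sum(freq * (low + high) / 2 for low, high, freq in classes)

mean_bp = sum_fx1 / n
print(n, sum_fx1, mean_bp)
```

Because midpoints stand in for the individual readings, the result is an approximation of the mean of the raw data rather than its exact value.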
47. (2) Median:
Median is the middle observation in a series of observations after arranging them in an ascending or descending manner.
1. If no. of observations is odd:
A set of data 5, 6, 8, 9, 11    n = 5
Median rank = (n+1)/2 = (5+1)/2 = 3
Median is the third value (8).
2. If no. of observations is even:
A set of data 5, 6, 8, 9    n = 4
Median rank = (4+1)/2 = 5/2 = 2.5.
Median is the average of second & third value = (6+8)/2 = 14/2 = 7.
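The rank rule can be written as a small function. This is a minimal sketch; the name `median_by_rank` is hypothetical and is used to avoid confusion with the standard library's `statistics.median`, which gives the same results.

```python
def median_by_rank(values):
    """Median via the (n + 1) / 2 rank rule described above."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        rank = (n + 1) // 2           # a whole number when n is odd
        return ordered[rank - 1]      # ranks start at 1, list indices at 0
    # n even: rank (n + 1) / 2 falls between two observations; average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median_by_rank([5, 6, 8, 9, 11]))  # odd n
print(median_by_rank([5, 6, 8, 9]))      # even n
```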
48. Mode:
The most frequent value.
Example:
5, 6, 7, 5, 10    mode = 5
20, 18, 14, 20, 13, 14, 30    mode = 14, 20
20, 18, 20, 14, 20, 13, 14    mode = 20
300, 280, 130, 125, 24    no mode
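The four cases above (one mode, two modes, a single dominant value, no mode) can be checked with a short function. This is an illustrative sketch: the name `modes` is hypothetical, and it follows the slide's convention that a data set with no repeated value has no mode, which differs from `statistics.multimode` in the standard library (that function returns every value in such a case).

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 6, 7, 5, 10]))
print(modes([20, 18, 14, 20, 13, 14, 30]))
print(modes([20, 18, 20, 14, 20, 13, 14]))
print(modes([300, 280, 130, 125, 24]))
```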
49. II- Measures of dispersion:
Describe the degree of variation of data around the central values:
1. Range = largest observation - smallest observation.
2. Variance (V) = $\frac{\sum (\bar{X} - X)^2}{n - 1}$
where $\bar{X} = \frac{\sum X}{n}$
50. 3. Standard deviation (SD):
SD = $\sqrt{V} = \sqrt{\frac{\sum (\bar{X} - X)^2}{n - 1}}$
4. Coefficient of variation (CV):
The SD expressed as a percentage of the mean.
CV = SD/mean x 100
51. Example
1. Set of observations 5, 7, 10, 12, 16
$\bar{X}$ = (5+7+10+12+16)/5 = 50/5 = 10
SD = $\sqrt{\frac{(10-5)^2 + (10-7)^2 + (10-10)^2 + (10-12)^2 + (10-16)^2}{5 - 1}} = \sqrt{\frac{74}{4}} = 4.3$
CV = 4.3/10 x 100 = 43%
2. Set of observations 2, 2, 5, 10, 11
$\bar{X}$ = (2+2+5+10+11)/5 = 30/5 = 6
SD = $\sqrt{\frac{(6-2)^2 + (6-2)^2 + (6-5)^2 + (6-10)^2 + (6-11)^2}{5 - 1}} = \sqrt{\frac{74}{4}} = 4.3$
CV = 4.3/6 x 100 = 71.7%
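Both examples can be checked with Python's standard `statistics` module, whose `stdev` function uses the same n - 1 denominator as the formula on the slides. The helper `coefficient_of_variation` is a hypothetical name introduced for this sketch.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """SD as a percentage of the mean (sample SD, n - 1 denominator)."""
    return stdev(values) / mean(values) * 100

set_1 = [5, 7, 10, 12, 16]
set_2 = [2, 2, 5, 10, 11]

for data in (set_1, set_2):
    print(mean(data), round(stdev(data), 1),
          round(coefficient_of_variation(data), 1))
```

The two sets share the same SD (about 4.3) but have different means, so the CV is what shows that the second set is relatively more variable. Working from the unrounded SD, the second CV comes out near 71.7%.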
54. Normal Distribution Curve (Gaussian Curve)
A frequency polygon used in the presentation of continuous quantitative variables such as age, weight, height, Hb level, bl. pressure.
Normal distribution curve is used to identify normal & abnormal measurements.
55. Characteristics of the Curve
Bell-shaped, continuous.
Symmetrical.
The tails extend to infinity.
Mean, mode, median coincide.
Described by: - arithmetic mean ($\bar{X}$)
- standard deviation (SD)
Area under the normal curve:
$\bar{X}$ ± 1 SD = 68%
$\bar{X}$ ± 2 SD = 95% (the normal range)
$\bar{X}$ ± 3 SD = 99.7%
59. Example:
In the normal distribution curve for blood Hb level in normal adult males (♂):
Mean = 11, SD = ± 1.5
The Hb of an individual is 8.1. Is he normal or anaemic?
The upper limit of Hb = 11 + 2 × 1.5 = 14
The lower limit of Hb = 11 − 2 × 1.5 = 8
The normal range of Hb in adult ♂ is 8-14.
Our patient (8.1) is normal.
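The same reasoning (normal range = mean ± 2 SD) can be written as a small helper. The function names are hypothetical and the sketch assumes the value is judged only against the 2 SD limits, as in the example.

```python
def normal_range(mean, sd, k=2):
    """Lower and upper limits of the normal range: mean ± k SD."""
    return mean - k * sd, mean + k * sd

def is_within_normal_range(value, mean, sd, k=2):
    """True when the value lies inside mean ± k SD (limits included)."""
    low, high = normal_range(mean, sd, k)
    return low <= value <= high

low, high = normal_range(11, 1.5)
print("Normal range:", low, "to", high)
print("Hb 8.1 is normal:", is_within_normal_range(8.1, 11, 1.5))
```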
62. N.B. Research Process
Research question
Hypothesis
Identify research design
Data collection
Presentation of data
Data analysis
Interpretation of data
63. What is a Statistic?
[Diagram: one population from which several samples are drawn]
Parameter: a value that describes a population
Statistic: a value that describes a sample
We are always using samples!
64. Statistics
Descriptive Statistics (describing data)
• Organize
• Summarize
• Simplify
• Presentation of data
Inferential Statistics (making predictions)
• Generalize from samples to populations
• Hypothesis testing
• Relationships among variables
68. Inference: making a generalization about a larger group (the population) on the basis of a sample.
Inferential statistics: instead of gathering data from the entire population, the statistician collects a sample or samples from the millions of residents and makes inferences about the entire population using the sample.
69. Hypothesis (significance) testing:
Conducting a significance test to find out whether the observed variation among samples is due to chance or is a real difference.
70. General principles (steps) of significance tests
Set up the null hypothesis & its alternative.
Set the level of significance: in medicine, we consider the difference significant if the probability (P value) is less than 0.05.
Find the value of the test statistic (calculated value).
71. General principles (steps) of significance tests
Find the tabulated value.
Conclude that the data are consistent or inconsistent with the null hypothesis by comparing the two values. If the data are not consistent with the null hypothesis, we reject it and the difference is statistically significant, and vice versa.
73. Null & alternative hypothesis
For quantitative data
Null hypothesis (H0): X1 = X2 or X1 − X2 = 0.
The alternative hypothesis (H1) is postulated (research hypothesis):
H1: X1 < X2, or H1: X2 < X1, or H1: X1 ≠ X2 (X1 − X2 ≠ 0)
74. Hypothesis Testing
For qualitative data
H0: There is no association between the exposure and disease of interest
H1: There is an association between the exposure and disease of interest
N.B. Statistics demonstrate association, but not causation.
75. Chain of Reasoning for Inferential Statistics
[Diagram linking Population, Sample, and Inference through Selection, Measure, Probability, and data]
Are our inferences valid? The best we can do is to calculate the probability about inferences.
76. Inferential Statistics: uses sample data to evaluate the
credibility of a hypothesis about a population
NULL hypothesis:
NULL (nullus, Latin): "not any"; no differences between means
H0: μ1 = μ2
"H-naught". We are always testing the null hypothesis.
77. Inferential statistics: uses sample data to evaluate the
credibility of a hypothesis about a population
Hypothesis: scientific or alternative hypothesis
Predicts that there are differences between the groups
H1: μ1 ≠ μ2
78. Hypothesis
A statement about what findings are expected
null hypothesis
"the two groups will not differ"
alternative hypothesis
"group A will do better than group B"
"group A and B will not perform the same"
79. Inferential Statistics
When making comparisons between 2 sample means there are 2 possibilities:
Null hypothesis is true → do not reject the null hypothesis → no statistical significance
Null hypothesis is false → reject the null hypothesis → statistical significance
80. Example (E = exposure, D = disease):
| | D+ | D- |
|---|---|---|
| E+ | 15 | 85 |
| E- | 10 | 90 |
IE+ = 15 / (15 + 85) = 0.15
IE- = 10 / (10 + 90) = 0.10
RR = IE+/IE- = 1.5, p value = 0.30
Although it appears that the incidence of disease may be
higher in the exposed than in the non-exposed (RR=1.5),
the p-value of 0.30 exceeds the fixed alpha level of 0.05.
This means that the observed data are relatively
compatible with the null hypothesis. Thus, we do not
reject H0 in favor of H1 (alternative hypothesis).
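The relative risk in this example can be recomputed with a few lines of Python. The slide does not say which test produced p = 0.30; the sketch below assumes an uncorrected chi-square test with 1 degree of freedom, which gives a p value of about 0.29, close to the quoted figure. All function names are illustrative.

```python
import math

def relative_risk(a, b, c, d):
    """a, b = diseased / not diseased among exposed; c, d = same among non-exposed."""
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed

def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square statistic for a 2 x 2 table (assumed test)."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    grand_total = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

rr = relative_risk(15, 85, 10, 90)
p = p_value_df1(chi_square_2x2(15, 85, 10, 90))
print("RR =", round(rr, 2), "p =", round(p, 2))
print("Reject H0" if p < 0.05 else "Do not reject H0")
```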
81. Non-directional (two-tailed) test: the 5% region of rejection of the null hypothesis is split between the two tails (2.5% in each).
82. Directional (one-tailed) test: the 5% region of rejection of the null hypothesis lies in one tail.
83. N.B. In medicine
We consider that differences are significant if the probability (p value) is less than 0.05. This means that if the null hypothesis is true, we will make a wrong decision fewer than 5 times in a hundred.
84. Hypothesis Testing Flow Chart
Develop research hypothesis H1 & null hypothesis H0
Set significance level (usually .05)
Collect data
Calculate test statistic and p value
Compare p value to alpha (.05)
P < .05: reject null hypothesis (statistical significance)
P > .05: fail to reject null hypothesis (no statistical significance)
87. (A) Quantitative data
1. Compare 2 means of large samples (≥60) that follow the normal distribution
Z test (SND) = (population mean − sample mean)/SD
88. If the result of Z >2 then there is significant difference.
As we mentioned before the normal range for any
biological reading lies between the mean value of the
population reading ± 2 SD.
(this range includes 95% of the area under the normal
distribution curve).
89. 2. Compare 2 means of small samples (<60)
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}$, $df = n_1 + n_2 - 2$
The value of t is compared to the values in the t-table at the value of the degrees of freedom.
90. The value of t will be compared to values in the specific table of the "t distribution test" at the value of the degrees of freedom.
If the value of t is less than that in the table, then the difference between samples is insignificant.
If the t value is larger than that in the table, the difference is significant, i.e. the null hypothesis is rejected.
91. Serum cholesterol levels for two groups of Egyptians were recorded. The mean cholesterol levels of the two groups were compared. To determine whether the measurements were significantly different or not, the most appropriate statistical test would be:
a. Chi-square test
b. Correlation analysis
c. F test (ANOVA)
d. Student's t test
e. Regression analysis
92. In a study carried out to assess the hemoglobin level of two groups of students, one group was suffering from parasitic infestation.
The following was found:
| Group 1: healthy (Hb level) | Group 2: parasitic infestation (Hb level) |
|---|---|
| 12 | 10 |
| 13 | 9 |
| 16 | 12 |
| 13 | 11 |
| 15 | 8 |
| 16 | 10.5 |
| 15 | 11 |
| 14 | 9.5 |
| 14 | 13 |
| 11 | 11 |
Is there a statistically significant difference between the two groups?
(P value < 0.05 if the test result > 2.11, the tabulated value)
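One way to work this exercise is to apply the t formula for two small samples (difference of means divided by the square root of SD1²/n1 + SD2²/n2) directly to the tabulated Hb values. The sketch below uses Python's standard `statistics` module; the function name and the constant `TABULATED_T` are illustrative, with 2.11 taken from the exercise.

```python
import math
import statistics

healthy = [12, 13, 16, 13, 15, 16, 15, 14, 14, 11]
infested = [10, 9, 12, 11, 8, 10.5, 11, 9.5, 13, 11]

def t_statistic(sample_1, sample_2):
    """t = difference of means / sqrt(SD1^2/n1 + SD2^2/n2)."""
    mean_difference = statistics.mean(sample_1) - statistics.mean(sample_2)
    standard_error = math.sqrt(
        statistics.variance(sample_1) / len(sample_1)    # sample variance = SD^2
        + statistics.variance(sample_2) / len(sample_2)
    )
    return mean_difference / standard_error

TABULATED_T = 2.11  # tabulated value quoted in the exercise

t = t_statistic(healthy, infested)
df = len(healthy) + len(infested) - 2
print("t =", round(t, 2), "df =", df)
print("Significant" if abs(t) > TABULATED_T else "Not significant")
```

The means are 13.9 and 10.5, and the calculated t (about 4.9) exceeds the tabulated value, so under these assumptions the difference between the groups is statistically significant.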
96. 3-Paired t test:
Compares the means of two matched samples or the means of repeated observations in the same individual (pre & post).
$t = \frac{\bar{d}}{SD_d / \sqrt{n}}$, where $\bar{d}$ is the mean difference and $SD_d$ is the standard deviation of the differences between each pair.
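A minimal sketch of the paired t calculation follows. The before/after cholesterol values are hypothetical, invented only to illustrate the formula, and `paired_t` is an illustrative name.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t: mean of the pair differences / (SD of differences / sqrt(n))."""
    differences = [b - a for b, a in zip(before, after)]
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample SD (divisor n - 1)
    return mean_difference / (sd_difference / math.sqrt(len(differences)))

# Hypothetical cholesterol readings (mg/dl) for six volunteers, pre and post diet.
before = [240, 255, 230, 260, 245, 250]
after = [225, 240, 228, 244, 236, 239]

t = paired_t(before, after)
print("paired t =", round(t, 2), "df =", len(before) - 1)
```

The result is compared with the t-table at n − 1 degrees of freedom, where n is the number of pairs.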
97. Six volunteers took a cholesterol-lowering diet for 3 months and mean cholesterol levels were measured before and after the trial diet. The appropriate test of statistical significance for this trial will be:
a) Chi-square test
b) Odds ratio
c) Paired t-test
d) Student t-test
e) Z test
98. 4-Analysis of variance (ANOVA = F test):
Comparing several means:
$F = \frac{\text{mean square difference between groups}}{\text{mean square difference within groups}}$
df = (df between groups, df within groups) = (K − 1, N − K), where K is the number of groups and N is the total number of observations.
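The F ratio can be sketched in a few lines. The blood glucose values below are hypothetical, made up only to show the calculation, and `one_way_anova_f` is an illustrative name.

```python
def one_way_anova_f(groups):
    """Return (F, (df_between, df_within)) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: group size times squared distance to grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: squared distance of each value to its group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)

# Hypothetical random blood glucose values (mg/dl) in three treatment groups.
lifestyle = [140, 128, 150, 135]
oral_drugs = [125, 130, 118, 132]
insulin = [115, 122, 110, 120]

f_value, df = one_way_anova_f([lifestyle, oral_drugs, insulin])
print("F =", round(f_value, 2), "df =", df)
```

The calculated F is then compared with the tabulated F at (K − 1, N − K) degrees of freedom.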
99. A-One way analysis of variance: used to compare the means of more than 2 groups defined by one factor, e.g. blood glucose (BG) in 3 groups of patients: 1-lifestyle, 2-OHA (oral hypoglycemic agents), 3-insulin therapy.
100. e.g. Comparing mean blood glucose levels among the studied groups of T2 diabetic patients
| Variable | Lifestyle group (diet + exercise), Mean ± SD | Oral hypoglycemic drugs, Mean ± SD | Insulin therapy group, Mean ± SD | ANOVA & P value |
|---|---|---|---|---|
| Random blood glucose (mg/dl) | 135 ± 45.5 | 127 ± 42.5 | 118.5 ± 25.5 | |
101. B-Two way analysis of variance: used to compare the means of more than 2 groups by more than one factor, e.g. BG & cholesterol level in 3 groups of patients: 1-lifestyle, 2-OHA, 3-insulin therapy.
102. e.g. Comparing mean blood glucose & cholesterol levels among the studied groups of T2 diabetic patients
| Variable | Lifestyle group (diet + exercise), Mean ± SD | Oral hypoglycemic drugs, Mean ± SD | Insulin therapy group, Mean ± SD | ANOVA & P value |
|---|---|---|---|---|
| Random blood glucose (mg/dl) | 135 ± 45.5 | 127 ± 42.5 | 118.5 ± 25.5 | |
| Cholesterol level | 180 ± 67 | 179 ± 77.5 | 174 ± 66.4 | |
104. (B) Qualitative Variables
1. Chi-square test ($\chi^2$):
$\chi^2 = \sum \frac{(O - E)^2}{E}$, df = (rows − 1)(columns − 1)
O = observed value
E = expected value = $\frac{\text{row total} \times \text{column total}}{\text{grand total}}$
105. Association between physical activity and weight
| | Obese-overweight | Average weight | Total |
|---|---|---|---|
| Lack of activity | 70 (E1) | 30 (E2) | 100 |
| Physical activity | 10 (E3) | 90 (E4) | 100 |
| Total | 80 | 120 | 200 |
N.B. The tabulated chi-square value at df = 1 (P = 0.05) equals 3.8.
106. $\chi^2 = \frac{(70-40)^2}{40} + \frac{(30-60)^2}{60} + \frac{(10-40)^2}{40} + \frac{(90-60)^2}{60} = 22.5 + 15 + 22.5 + 15 = 75$
Calculated value > tabulated value, p < 0.0001
Observed values with expected values in parentheses:
| | Obese-overweight | Average weight | Total |
|---|---|---|---|
| Lack of activity | 70 (40) | 30 (60) | 100 |
| Physical activity | 10 (40) | 90 (60) | 100 |
| Total | 80 | 120 | 200 |
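The expected values and the chi-square statistic for this table can be reproduced with a short sketch. The function names are illustrative; the tabulated value 3.84 is the standard chi-square critical value at df = 1 and P = 0.05.

```python
def expected_counts(observed):
    """Expected value per cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Sum over all cells of (O - E)^2 / E."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Rows: lack of activity, physical activity. Columns: obese-overweight, average weight.
table = [[70, 30],
         [10, 90]]

chi2 = chi_square(table)
df = (len(table) - 1) * (len(table[0]) - 1)
TABULATED_CHI2 = 3.84  # df = 1, P = 0.05

print("Expected:", expected_counts(table))
print("Chi-square =", chi2, "df =", df)
print("Significant" if chi2 > TABULATED_CHI2 else "Not significant")
```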
107. Example:
The result of an influenza vaccine trial.
Table columns: Influenza (Yes / No); Vaccine (O, E); Placebo (O, E); T (total).
Observed counts as extracted: 60, 40, 40, 60; row totals 100 and 100; column totals 100 and 100; grand total 200.
Expected value in every cell = $\frac{\text{R total} \times \text{C total}}{\text{G total}}$
111. (2) Z-test to compare 2 proportions:
$Z = \frac{P_1 - P_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$
$P_1$ = % of first group.
$P_2$ = % of second group.
$q_1 = 100 - p_1$.
$q_2 = 100 - p_2$.
$n_1$ = size of first group.
$n_2$ = size of second group.
If Z > 2, the difference is statistically significant.
112. Example:
No. of anaemic patients in group 1 (50) is 5.
No. of anaemic patients in group 2 (60) is 20.
Find if groups 1 & 2 are statistically different in the prevalence of anaemia.
We use the Z test:
$P_1$ = 5/50 × 100 = 10%. $P_2$ = 20/60 × 100 = 33%.
$q_1$ = 100 − 10 = 90%. $q_2$ = 100 − 33 = 67%.
$n_1$ = 50. $n_2$ = 60.
113. $Z = \frac{|10 - 33|}{\sqrt{\frac{10 \times 90}{50} + \frac{33 \times 67}{60}}} = \frac{23}{\sqrt{18 + 36.85}} = \frac{23}{7.4} = 3.1$
Z = 3.1 > 2, so there is a statistically significant difference between the percentages of anaemia in the 2 groups.
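The anaemia example can be recomputed as follows. The sketch works with percentages, as the slides do, and `z_two_proportions` is an illustrative name. Because it does not round P2 to 33%, it gives about 3.15 rather than exactly 3.1.

```python
import math

def z_two_proportions(events_1, n_1, events_2, n_2):
    """Z for the difference between two percentages (absolute value)."""
    p1 = events_1 / n_1 * 100
    p2 = events_2 / n_2 * 100
    q1 = 100 - p1
    q2 = 100 - p2
    standard_error = math.sqrt(p1 * q1 / n_1 + p2 * q2 / n_2)
    return abs(p1 - p2) / standard_error

z = z_two_proportions(5, 50, 20, 60)
print("Z =", round(z, 2))
print("Significant" if z > 2 else "Not significant")
```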
114. Correlation & Regression
Correlation: measures the degree of association between 2 continuous variables.
Correlation is measured by the correlation coefficient (r).
The value of r ranges between +1 & -1.
r = 0 means no correlation.
r = +1 means perfect +ve association.
r = -1 means perfect -ve association.
The t-test for correlation is used to test the significance of association.
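A minimal sketch of the Pearson correlation coefficient and its t-test follows. The slides do not give the t formula; the sketch assumes the standard form t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The Hb and MCV values are hypothetical and the function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def t_for_correlation(r, n):
    """t statistic for testing r against zero (undefined when |r| = 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical paired measurements: hemoglobin (g/dl) and MCV (fl).
hb = [9, 10, 11, 12, 13, 14]
mcv = [70, 74, 80, 79, 88, 90]

r = pearson_r(hb, mcv)
print("r =", round(r, 2), "t =", round(t_for_correlation(r, len(hb)), 2))
```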
116. Scatter Plots
[Four scatter plots of Y against X: strong negative correlation (r = -0.86), strong positive correlation (r = 0.91), positive correlation (r = 0.70), no correlation (r = 0.06)]
117. Table ( ): Correlation between hemoglobin level and MCV, platelet counts, and ferritin among the studied cases.
| Variable | Pearson correlation (r) | P value |
|---|---|---|
| MCV, fl | 0.94 | 0.000* |
| Platelet counts ×10^9 | -0.42 | 0.061 |
| Ferritin | 0.61 | 0.081 |
119. Regression gives the equation for the line that best models the relationship between 2 variables.
Types of pattern: linear, curve, etc. The pattern determines the type of regression model to be applied to the data.
Linear regression: the simplest form, used when the relation between the x & y variables is approximated by a straight line.
Linear regression gives the equation of the straight line that describes the relation and predicts the change in one variable (dependent) due to a change in the other variable (independent).
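The straight line can be fitted by ordinary least squares, which is assumed here as the fitting method since the slides do not name one. The age and blood pressure values are hypothetical and `fit_line` is an illustrative name.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: age in years (independent) and systolic blood pressure (dependent).
age = [30, 40, 50, 60, 70]
systolic_bp = [118, 124, 131, 137, 145]

slope, intercept = fit_line(age, systolic_bp)
predicted_at_55 = intercept + slope * 55
print("slope =", round(slope, 2), "intercept =", round(intercept, 1))
print("predicted value at age 55 =", round(predicted_at_55, 1))
```

The slope is the predicted change in the dependent variable for a one-unit change in the independent variable.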
121. The t-test is used to assess the level of significance.
Multiple regression: used to assess the dependency of a dependent variable on several independent variables.
The F-test (ANOVA) is the test of significance.
e.g. vit D level (age, amount of Ca intake, duration of exposure to sun, ...)