Dispersion: a statistical term that describes the spread of the values of a variable. It can be measured by several different statistics, such as the range, variance, and standard deviation.
Measures of dispersion: a measure of dispersion indicates the scattering of the data. It describes how far the values lie from one another, giving a precise view of their distribution.
Methods of Dispersion:
1. Relative dispersion
a. Coefficient of mean deviation
b. Coefficient of quartile deviation
c. Coefficient of range
d. Coefficient of variation
2. Absolute dispersion
a. Range
b. Quartile range
c. Standard deviation
d. Mean deviation
a. Range: the difference between the smallest and largest values in the dataset. The corresponding relative measure is known as the coefficient of range.
Advantages and disadvantages of the range.
Calculation of the range by different methods.
b. Quartile range: the interquartile range of a group of observations is the interval between the upper quartile and the lower quartile of that group.
Advantages and disadvantages of the quartile range.
Calculation of the quartile range by different methods.
c. Standard deviation: measures the absolute dispersion (variability) of a distribution. A small standard deviation means a high degree of uniformity of the observations, as well as homogeneity in the series.
Advantages and disadvantages of the standard deviation.
Calculation of the standard deviation using:
i) Direct method
ii) Short-cut method
iii) Step deviation method
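As a small illustration of these measures, here is a minimal Python sketch (standard library only; the data values are made up) that computes the range, the coefficient of range, the interquartile range, and the standard deviation by the direct method:

```python
import statistics

data = [63, 69, 74, 78, 81, 90]  # made-up observations

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Coefficient of range: (L - S) / (L + S)
coeff_range = (max(data) - min(data)) / (max(data) + min(data))

# Quartiles: statistics.quantiles with n=4 returns Q1, Q2 (median), Q3
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile (quartile) range

# Standard deviation, direct method: sqrt of the mean squared deviation
mean = statistics.fmean(data)
sd = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5

print(f"range={data_range}, coeff of range={coeff_range:.3f}, "
      f"IQR={iqr}, SD={sd:.2f}")
```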
Correlation: if two variables are interrelated in such a manner that a change in one variable brings about a change in the other variable, this type of relationship between the variables is known as correlation.
Types of Correlation:
1. Based on the direction of change of the variables
a. Positive correlation
b. Negative correlation
2. Based on the number of variables studied
a. Simple correlation
b. Partial correlation
c. Multiple correlation
3. Based on the constancy of the ratio of change between the variables
a. Linear correlation
b. Non-linear correlation
Methods of Studying Correlation:
1) Graphic methods
a) Scatter diagram
b) Correlation graph
2) Algebraic methods
a) Karl Pearson's coefficient of correlation
b) Rank correlation method
c) Concurrent deviation method
Uses of Correlation.
Merits of Correlation.
Demerits of Correlation.
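As a sketch of the algebraic methods, the following Python snippet (SciPy assumed; the paired values are made up) computes Karl Pearson's coefficient of correlation and Spearman's rank correlation:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]   # made-up values of the first variable
y = [2, 4, 5, 4, 5, 7]   # made-up values of the second variable

# Karl Pearson's coefficient of correlation (linear correlation)
r, p_r = stats.pearsonr(x, y)

# Rank correlation method (Spearman's rho, based on ranks)
rho, p_rho = stats.spearmanr(x, y)

print(f"Pearson r = {r:.3f} (p = {p_r:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3f})")
```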
Univariate and Bivariate Analysis in SPSS, by Subodh Khanal
These slides show how to perform various tests in SPSS for univariate and bivariate analysis, along with how to enter and analyze multiple responses.
2. Probability
• Definition: the chance of any event occurring.
• Probability density function (PDF): the chance of occurrence of a single random value within a range of continuous values.
• Cumulative distribution function (CDF): the chance of a single random value being less than a certain value within a given sample space. Here, the CDF is for the value of the t-statistic.
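To make the PDF/CDF distinction concrete, a minimal sketch using scipy.stats (the t-value and degrees of freedom below are arbitrary illustrative numbers):

```python
from scipy import stats

t_value, df = 2.05, 29  # illustrative t-statistic and degrees of freedom

# PDF: the density of the t-distribution at this t-value
pdf = stats.t.pdf(t_value, df)

# CDF: P(T < t_value) for the given degrees of freedom
cdf = stats.t.cdf(t_value, df)

# A two-tailed p-value follows directly from the CDF
p_two_tailed = 2 * (1 - cdf)
print(f"PDF = {pdf:.4f}, CDF = {cdf:.4f}, two-tailed p = {p_two_tailed:.4f}")
```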
3. Normal Distribution
• Standard normal distribution: based on infinite samples; unimodal with symmetrical tails; mean = 0, SD = 1. It is hypothetical (it does not exist in practice) and is used as the PDF for the Z-test.
• Normal distribution: unimodal with symmetrical tails, with mean = median = mode, skewness between -3 and +3, and kurtosis between -1 and +1. It can be converted to the standard normal distribution by computing Z-scores and plotting the Z-score frequency distribution: Z = (x - µ)/SD.
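A small sketch of that Z-score conversion, Z = (x - µ)/SD, using the standard library and made-up values:

```python
from statistics import fmean, pstdev

values = [160, 165, 170, 175, 180]  # made-up observations
mu, sd = fmean(values), pstdev(values)

# Z = (x - mu) / SD puts each value on the standard normal scale
z_scores = [(x - mu) / sd for x in values]
print([round(z, 2) for z in z_scores])  # mean 0, SD 1 after conversion
```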
4. Properties of the Normal Distribution
• 68% of samples will lie within 1 SD of the mean; 95% will lie within 2 SD of the mean; 99.7% will lie within 3 SD of the mean.
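This 68-95-99.7 rule can be checked directly from the standard normal CDF (SciPy assumed):

```python
from scipy import stats

# P(|Z| < k) for k = 1, 2, 3 standard deviations from the mean
for k in (1, 2, 3):
    prob = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within {k} SD: {prob:.1%}")  # ~68.3%, ~95.4%, ~99.7%
```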
5. T-test
• Single-sample t-test: testing a sample mean against a known value.
• Independent-samples t-test: testing the mean of sample 1 against the mean of sample 2.
• Paired t-test: testing the mean of sample 1 before against the mean of sample 1 after.
• Based on the t-distribution, which is similar to the normal distribution but with a lower peak and fatter tails.
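The three variants map onto three SciPy calls; a sketch with invented data:

```python
from scipy import stats

before = [72, 75, 68, 80, 77, 74]  # invented measurements
after_ = [70, 73, 69, 76, 75, 71]  # same subjects, after treatment
other  = [65, 70, 66, 72, 68, 67]  # an independent second group

# Single-sample: sample mean vs a known value (here, 65)
t1, p1 = stats.ttest_1samp(before, popmean=65)

# Independent samples: mean of group 1 vs mean of group 2
t2, p2 = stats.ttest_ind(before, other)

# Paired: the same sample before vs after
t3, p3 = stats.ttest_rel(before, after_)

print(f"one-sample p={p1:.4f}, independent p={p2:.4f}, paired p={p3:.4f}")
```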
6. Hypothesis Testing Using the Normal/t-Distributions
• Using confidence intervals: qualitative.
• Using the CDF: gives the actual probability; quantitative.
7. Hypothesis Testing Using the CDF
• First, we summarize the "effect": the actual effect in our sample relative to the random error that might have crept in. Here, that summary is the t-statistic.
• The CDF for the t-test tells us the probability of the t-statistic of our study being less than a certain value of t, given a specific number of degrees of freedom.
• For the t-distribution, the PDF changes with increasing sample size (increased degrees of freedom), and thus the CDF also changes.
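A worked sketch of this idea: compute the t-statistic by hand as the effect (mean difference) relative to the random error (SE), then read the two-tailed p-value off the t-distribution's CDF. Data invented; SciPy assumed:

```python
import math
from scipy import stats

sample = [75, 78, 72, 80, 74, 77, 73, 79]  # invented observations
hypothesized_mean = 70

n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
se = sd / math.sqrt(n)                     # the random error of the mean
t_stat = (mean - hypothesized_mean) / se   # effect relative to error

# sf(t) = 1 - CDF(t); doubling gives the two-tailed p-value for n-1 df
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(f"t = {t_stat:.3f}, p = {p_two_tailed:.4f}")
```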
8. Central Limit Theorem
• The central limit theorem states that, as the sample size increases, the shape of the sampling distribution approaches the normal shape. For n = 30, the shape of that distribution is 'almost' normal.
• Some researchers say parametric methods may be used even on non-normal data if the sample size is large enough.
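A quick simulation of the theorem (NumPy assumed): even when the population is a skewed exponential distribution, the means of repeated samples of n = 30 pile up symmetrically around the population mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# A clearly non-normal (right-skewed) population
population = rng.exponential(scale=2.0, size=100_000)

# Sampling distribution of the mean for n = 30
sample_means = [rng.choice(population, size=30).mean() for _ in range(2_000)]

# The mean of the sample means sits near the population mean (~2.0),
# and their distribution is approximately normal in shape
print(np.mean(sample_means), np.std(sample_means))
```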
9. T-test
Assumptions:
1. Data are on a numerical scale.
2. The distribution of the underlying population is normal (checked with the Shapiro-Wilk or Kolmogorov-Smirnov test).
3. The samples have the same variance ('homogeneity of variances'), checked with Levene's test. If the variances are not similar, Welch's t-test is used to accommodate this (see the sketch after this list).
4. Observations within a group are independent.
5. The samples are randomly drawn from the population.
• Null hypothesis: there is no difference between the two means.
• Developed by W. S. Gosset, and published under the pseudonym Student.
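A sketch of these checks in SciPy (invented data): Shapiro-Wilk for normality within each group, Levene's test for homogeneity of variances, and a fallback to Welch's t-test when the variances differ:

```python
from scipy import stats

group1 = [72, 75, 68, 80, 77, 74, 71, 76]  # invented
group2 = [65, 70, 66, 72, 68, 67, 69, 64]  # invented

# Normality within each group (p > 0.05 suggests normality)
print("Shapiro-Wilk p:", stats.shapiro(group1).pvalue,
      stats.shapiro(group2).pvalue)

# Homogeneity of variances (p > 0.05 suggests similar variances)
levene_p = stats.levene(group1, group2).pvalue

# equal_var=False runs Welch's t-test when variances are not similar
t, p = stats.ttest_ind(group1, group2, equal_var=(levene_p > 0.05))
print(f"t = {t:.3f}, p = {p:.4f}")
```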
11. T-test
• Involves calculation of the t-statistic from the difference of means and the SE; it basically encapsulates the difference relative to the SE.
• Look up the t-statistic in a probability distribution table based on the degrees of freedom (the sample space for the CDF).
• Basically, it looks at the probability of one sample mean belonging to the population of the other mean.
• The fatter tails of the t-distribution at lower df/sample sizes increase the distance of the rejection area from the sample mean, and thus make the probability testing stricter, to account for larger SDs in smaller samples.
12. Parametric and Non-parametric Tests
• Parametric tests are based on the assumption of an almost normal distribution of the data within the groups; the probability distribution tables used to estimate p-values rest on this assumption.
• Parametric tests estimate their statistics from the actual values of the variables: mean, SD.
• Thus, if the data are not normally distributed, erroneous p-values may be computed.
• Non-parametric tests are based on the ranks of the data within the set, and hence are not affected by extreme values, non-normality of the data distribution, or ordinal-scale data.
• Parametric tests are usually more powerful than non-parametric tests when the normality assumption holds, in the sense that the beta error is low.
• If normality does not hold, non-parametric tests become more powerful.
14. Independent-Samples T-test on SPSS
Necessities:
1. Your grouping variable should be coded numerically (1/0, 1/2, etc.). You may label the values appropriately in the Variable View.
2. Your dependent variable of interest should be in a separate column.
15. Checking for Normality
• Qualitative: histogram; Q-Q plot.
• Quantitative: Shapiro-Wilk test.
• Here you need to see normality within each group, so you need to conduct separate tests of normality for each group simultaneously; that means splitting the file.
• Go to Data → Split File.
• Put the grouping variable in "Organize output by groups" and click OK.
16. • Next, go to Analyze → Descriptive Statistics → Explore.
• Put the variables of interest in the "Dependent List", click the "Plots" tab, check "Normality plots with tests" and "Histogram", then Continue → OK.
17. Normality Output
• The output first reports any missing cases.
• The normality assumption should be satisfied in both groups.
• Statistical test of normality: if the p-value is > 0.05 (or 0.01), the variable is normal; otherwise it is not normal.
• In this example: Weight is normal; Height is not normal.
18. Independent-Samples T-test
• Although the normality assumption was violated, just as an example we'll conduct both parametric and non-parametric tests on these data.
• First, unsplit the file: go to Data → Split File → check "Analyze all cases...".
• Go to Analyze → Compare Means → Independent-Samples T Test.
• Select the variables of interest and transfer them to the "Test Variables" window. Transfer the grouping variable and specify the groups (here, 1/2). Click Continue → OK.
19. Output
• Descriptives: self-explanatory.
• If Levene's test gives p > 0.05, use the "equal variances assumed" row; otherwise use "equal variances not assumed".
• The output also reports the t-statistic, the degrees of freedom, and the p-value of the t-test.
20. Mann-Whitney U Test
• If you want to run the non-parametric test instead:
• Go to Analyze → Nonparametric Tests → Independent Samples.
• Same procedure as the t-test: place the test variable and the grouping variable.
P.S. It is also called the Wilcoxon rank-sum test.
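Outside SPSS, the same test is a single SciPy call (invented data for illustration):

```python
from scipy import stats

group1 = [72, 75, 68, 80, 77, 74]  # invented
group2 = [65, 70, 66, 72, 68, 67]  # invented

# Mann-Whitney U (Wilcoxon rank-sum): compares ranks rather than raw
# values, so it tolerates non-normal and ordinal data
u_stat, p = stats.mannwhitneyu(group1, group2, alternative="two-sided")
print(f"U = {u_stat}, p = {p:.4f}")
```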
21. Output
• The ranks table just reports the mean rank and the sum of ranks; it is not important for us.
• The test table gives the p-value for the difference between the groups.
• Note that this output does not provide the descriptives. Take the descriptives using the procedure described before; the median and interquartile range are the important ones for non-parametric tests.
22. T-test on GraphPad
• Keep the data ready in Excel; it needs to be copy-pasted into GraphPad.
• Open GraphPad.
• Select "Columns" from the tabs on the left.
• Click the "Enter replicate values..." option as shown in the picture → Create.
23. • Create separate columns for the group variable as shown in the picture and paste the values from Excel.
• Click the Analyze button.
• Click "Column statistics".
• Select the two columns for comparison and click OK.
24. • Select all the descriptives you want.
• Select the Shapiro-Wilk test.
• Click OK.
25. (Output screenshot.)
26. • Now we know the normality of the group variables, and the descriptives.
• Click the Analyze button.
• Click "t-tests (and...)".
• Select the two columns for comparison and click OK.
27. • Click the appropriate test: parametric or non-parametric.
• If using the t-test, it is better to go for Welch's correction.
• Click OK.
(Output screenshot, with callouts for the t-test, the mean-difference statistics, and Levene's test.)
28. Paired T-test on SPSS
• Used to test the difference of means for a variable in matched groups, or in the same samples at different time points.
• The data for the variable should be in 2 columns.
• The normality assumption has to be satisfied for both variables. Since the same samples are being used, no splitting is required; run Shapiro-Wilk directly on the two variables.
• Take out the descriptives of the two variables as described before.
• Then Analyze → Compare Means → Paired-Samples T Test.
29. • Insert the before and after variables as pairs, as shown → click OK.
(Output screenshot: the mean and SD of the difference, and the p-value.)
30. Wilcoxon Signed-Rank Test
• Non-parametric equivalent of the paired t-test.
• Analyze → Nonparametric Tests → 2 Related Samples.
• Fill in the test pairs the same as for the paired t-test → OK.
31. Output
• The ranks table just reports the mean rank and the sum of ranks; it is not important for us.
• The test table gives the p-value for the difference between the variables.
• Note that this output does not provide the descriptives. Take the descriptives using the procedure described before; the median and interquartile range are the important ones for non-parametric tests.
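The SciPy equivalent of this test, with invented before/after data:

```python
from scipy import stats

before = [72, 75, 68, 80, 77, 74]  # invented
after_ = [70, 73, 69, 76, 75, 71]  # same subjects, after

# Wilcoxon signed-rank: the non-parametric counterpart of the paired t-test
w_stat, p = stats.wilcoxon(before, after_)
print(f"W = {w_stat}, p = {p:.4f}")
```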
32. On GraphPad
• Enter the column data as previously described.
• Analyze → t-tests → check the required variables → OK → click "Paired" and parametric/non-parametric as required.
(Output screenshot: paired t-test results, the mean-difference statistics, and the correlation statistics.)
33. Single-Sample T-test on SPSS
• Used to test the difference of a sample mean from another known mean. In Data View, the variable goes in a single column.
• Test for normality, to decide between parametric and non-parametric.
• For parametric, go to Analyze → Compare Means → One-Sample T Test.
• Suppose we want to see whether the sample mean differs from the population average of 65 kg: insert the test variable, enter the "Test Value" as 65 → OK.
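The same one-sample comparison against 65 kg can be scripted (SciPy assumed; the weights below are invented and are not the data shown in the output slide):

```python
from scipy import stats

weights = [63, 70, 74, 78, 81, 90, 73, 69, 75, 77]  # invented kg values

# Parametric: one-sample t-test against a theoretical mean of 65
t, p_t = stats.ttest_1samp(weights, popmean=65)

# Non-parametric: one-sample Wilcoxon signed-rank against a median of 65,
# run on the differences from the hypothesized value
w, p_w = stats.wilcoxon([x - 65 for x in weights])
print(f"t-test p = {p_t:.4f}, Wilcoxon p = {p_w:.4f}")
```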
35. On GraphPad
• Create a separate column for the variable by pasting the values from Excel.
• Click the Analyze button.
• Click "Column statistics".
• Select the column for comparison and click OK.
• Click the required descriptives, the normality test, and both one-sample tests under "Inferences".
• Enter the hypothetical value and click OK.
36. Output
Descriptives:
Number of values: 30
Minimum: 63.00
25% percentile: 69.50
Median: 73.50
75% percentile: 81.50
Maximum: 90.00
Mean: 75.23
Std. deviation: 7.899
Std. error of mean: 1.442
Lower 95% CI of mean: 72.28
Upper 95% CI of mean: 78.18
Sum: 2257

Shapiro-Wilk normality test:
W: 0.9550
P value: 0.2290
Passed normality test (alpha = 0.05)? Yes
P value summary: ns

One-sample t-test:
Theoretical mean: 65.00
Actual mean: 75.23
Discrepancy: -10.23
95% CI of discrepancy: 7.284 to 13.18
t, df: t = 7.096, df = 29
P value (two-tailed): < 0.0001
Significant (alpha = 0.05)? Yes

One-sample Wilcoxon signed-rank test:
Theoretical median: 65.00
Actual median: 73.50
Discrepancy: -8.500
Sum of signed ranks (W): 421.0
Sum of positive ranks: 428.0
Sum of negative ranks: -7.000
P value (two-tailed): < 0.0001
Exact or estimate? Exact
Significant (alpha = 0.05)? Yes