This document discusses various parametric tests used for hypothesis testing with quantitative data, including:
- One-sample t-test to compare a sample mean to a predefined value
- Two-sample t-test to compare means of two independent groups
- Paired t-test to compare means of two related/matched groups
- ANOVA tests to compare means of three or more groups, including one-way and two-way ANOVA
- Assumptions of parametric tests like normal distribution and additive effects are also outlined.
2. Hypothesis Testing for Quantitative Data
Quantitative measurements such as haemoglobin level, serum zinc level and blood pressure are generally summarized in terms of means.
Several situations call for the comparison of means in hypothesis testing. One is the comparison of a computed mean, for example a mean haemoglobin level, with a pre-specified standard value. Another is the comparison of means in two independent groups, such as comparing mean haemoglobin levels in random samples of well-nourished and under-nourished children.
3. Inference on Quantitative Data for Comparison of Two Means
Does the data follow a Gaussian (normal) distribution?
- Yes, unpaired data: Student's t-test for unpaired data
- Yes, paired data: Student's t-test for paired data
- No, unpaired data: Wilcoxon rank sum test
- No, paired data: Wilcoxon signed rank test
4. Comparison of a mean with a pre-specified value (one-sample t-test):
Consider a hypothetical example of haemoglobin levels in 15 HIV-positive neonates. The sample (n = 15) Hb levels in g/dl are 12.6, 15.4, 11.5, 12.4, 13.2, 13.8, 12.8, 14.4, 16.2, 14.8, 15.1, 13.5, 12.9, 16.0, 14.9. The mean and SD of this series are 13.9 and 1.41 respectively.
The null hypothesis under test is that the average Hb level of HIV-positive neonates is the same as the 15.9 g/dl of normal neonates. The alternative hypothesis is that the Hb level is less than 15.9 g/dl, since the value is expected to be lower in HIV-positive neonates. The value of t is computed as:
t = (x̄ − µ) / SE(x̄), where SE(x̄) = SD/√n
t = (13.9 − 15.9) / (1.41/√15) = −5.49, or 5.49 after ignoring the sign.
The critical value of t at 14 degrees of freedom is 2.62 at the 1% level for a one-tailed test. The calculated value of t exceeds the critical value, so the null hypothesis is rejected with P < 0.01. This is interpreted as: the mean Hb level in HIV-positive neonates is significantly (P < 0.01) lower than the level in normal neonates.
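The one-sample t computation above can be sketched with the Python standard library. The Hb value printed as 3.5 in some copies of this slide is taken as 13.5, which is consistent with the stated mean of 13.9:

```python
import math
from statistics import mean, stdev

# Haemoglobin levels (g/dl) of the 15 HIV-positive neonates from the slide
hb = [12.6, 15.4, 11.5, 12.4, 13.2, 13.8, 12.8, 14.4,
      16.2, 14.8, 15.1, 13.5, 12.9, 16.0, 14.9]

mu0 = 15.9                        # pre-specified mean for normal neonates
n = len(hb)
x_bar = mean(hb)                  # sample mean
sd = stdev(hb)                    # sample SD (n - 1 divisor)
t = (x_bar - mu0) / (sd / math.sqrt(n))

print(f"mean = {x_bar:.2f}, SD = {sd:.2f}, t = {t:.2f}")
```

With the full-precision mean (≈ 13.97), t comes out near −5.3; rounding the mean to 13.9 first, as the slide does, gives −5.49. Either way |t| is well above the one-tailed critical value of 2.62 at 14 df.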
5. Comparison of two independent means (two-sample independent t-test):
To compare mean systolic blood pressure (BP) levels in type 1 diabetic children with those of a control group; the sample size in each group is 25.
Null hypothesis H0: There is no difference in mean systolic BP levels between the case and control groups.
The observed mean and SD in these groups are as follows:

| Systolic BP (x) | Diabetes group (n1 = 25) | Control group (n2 = 25) |
|---|---|---|
| Mean | 112.66 | 116.26 |
| SD | 9.69 | 7.04 |
The t-test criterion to test whether the means differ can be calculated by using the following formula:
t = (x̄1 − x̄2) / SE(x̄1 − x̄2), where SE(x̄1 − x̄2) = Sx1x2 × √(1/n1 + 1/n2)
6. Here x̄1, x̄2 are the mean systolic BP values and Sx1, Sx2 are the standard deviations of the observations in the case and control groups respectively. Sx1x2 is the pooled standard deviation. The degrees of freedom are n1 + n2 − 2.
Student's t value from the above formula is
t = (112.656 − 116.264) / (8.467 × √(1/25 + 1/25)), where Sx1x2 = 8.467
= −3.608/2.395
t = −1.51, or 1.51 after ignoring the sign.
Compare this with the critical value of 2.01 from the t-tables (table-23) at 48 df. The calculated value is less than the table value, so the null hypothesis of equality of means is not rejected. The result is not significant (P > 0.05) at the 5% level.
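The pooled two-sample t statistic can be sketched with the standard library, using the summary statistics from the slide's own calculation (means 112.656 and 116.264, SDs 9.69 and 7.04):

```python
import math

# Summary statistics from the slide (hypothetical systolic BP study)
n1, mean1, sd1 = 25, 112.656, 9.69   # diabetes group
n2, mean2, sd2 = 25, 116.264, 7.04   # control group

# Pooled standard deviation over n1 + n2 - 2 degrees of freedom
sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
se = sp * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference
t = (mean1 - mean2) / se
df = n1 + n2 - 2

print(f"pooled SD = {sp:.3f}, t = {t:.3f}, df = {df}")
```

|t| ≈ 1.51 is below the two-tailed critical value of 2.01 at 48 df, matching the slide's conclusion of no significant difference.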
7. Comparison of means in a paired setup (paired t-test):
Consider a clinical trial on asthmatic children where the interest is to compare forced expiratory volume in one second (FEV1), which measures pulmonary impairment, before and after an intervention. The values of FEV1 on 15 patients before and after the intervention, given for the purpose of illustration, are:
Before intervention: 40.5, 82.4, 90.3, 82.4, 86.9, 75.8, 45.0, 76.1, 91.2, 88.3, 120.2, 72.5, 79.3, 75.3, 84.5
After intervention: 45.1, 83.5, 92.4, 83.1, 88.3, 78.2, 55.8, 75.2, 90.9, 88.9, 110.3, 73.7, 75.9, 71.2, 90.5
Difference: −4.6, −1.1, −2.1, −0.7, −1.4, −2.4, −10.8, 0.9, 0.3, −0.6, 9.9, −1.2, 3.4, 4.1, −6.0
The null hypothesis:
H0: There is no significant difference between the values of FEV1 before and after the intervention.
8. The test of significance in this case reduces to a one-sample t-test, applied to the mean of the differences.
Mean of differences = −0.82
SD of differences = 4.67
Then t = −0.82 / (4.67/√15) = −0.679, or 0.679 after ignoring the sign.
From Student's t-tables, the critical value at 14 df is 2.145 for a two-tailed test.
The calculated value (0.679) is much less than the critical value, so the null hypothesis is not rejected.
This is interpreted as: mean FEV1 before the intervention is not significantly (P > 0.05) different from mean FEV1 after the intervention.
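The paired test above is just a one-sample t-test on the differences; a standard-library sketch follows (the seventh difference is −10.8 as computed from the raw values, rather than the rounded −11 printed on some copies of the slide):

```python
import math
from statistics import mean, stdev

# FEV1 values before and after intervention (15 asthmatic children, from the slide)
before = [40.5, 82.4, 90.3, 82.4, 86.9, 75.8, 45.0, 76.1,
          91.2, 88.3, 120.2, 72.5, 79.3, 75.3, 84.5]
after  = [45.1, 83.5, 92.4, 83.1, 88.3, 78.2, 55.8, 75.2,
          90.9, 88.9, 110.3, 73.7, 75.9, 71.2, 90.5]

d = [b - a for b, a in zip(before, after)]  # paired differences
n = len(d)
d_bar = mean(d)                             # mean difference
sd_d = stdev(d)                             # SD of the differences
t = d_bar / (sd_d / math.sqrt(n))           # one-sample t on the differences

print(f"mean diff = {d_bar:.2f}, SD = {sd_d:.2f}, t = {t:.3f}")
```

|t| ≈ 0.679 is well below the two-tailed critical value of 2.145 at 14 df, so the null hypothesis is not rejected.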
9. ANOVA (Analysis of Variance)
• Analysis of Variance (ANOVA) is a collection of statistical models used to analyse differences between group means.
• Compares multiple groups at one time.
• Developed by R. A. Fisher.
11. One-way ANOVA
Compares two or more unmatched groups when the data are categorized by one factor.
Ex:
1. Comparing a control group with three different doses of aspirin.
2. Comparing the productivity of three or more employees based on working hours in a company.
12. Two-way ANOVA
• Used to determine the effect of two nominal predictor variables on a continuous outcome variable.
• It analyses the effect of each independent variable on the expected outcome along with any interaction between them.
Ex: Comparing employee productivity based on working hours and working conditions.
13. Assumptions of ANOVA:
• The samples are independent and selected randomly.
• The parent population from which the samples are taken has a normal distribution.
• The various treatment and environmental effects are additive in nature.
• The experimental errors are distributed normally with mean zero and variance σ².
14. • The exact procedure depends on the experimental design.
• Null hypothesis H0: all population means are the same.
• If the computed Fc is greater than the F critical value, the null hypothesis is rejected.
• If the computed Fc is less than the F critical value, the null hypothesis is not rejected.
15. ANOVA Table

| Source of variation | Sum of squares (SS) | Degrees of freedom (d.f.) | Mean square (MS = SS/d.f.) | F-ratio |
|---|---|---|---|---|
| Between samples or groups (treatments) | Treatment sum of squares (TrSS) | k − 1 | TrMS = TrSS/(k − 1) | TrMS/EMS |
| Within samples or groups (error) | Error sum of squares (ESS) | n − k | EMS = ESS/(n − k) | |
| Total | Total sum of squares (TSS) | n − 1 | | |
16.

| S.No. | Type of group | Parametric test |
|---|---|---|
| 1 | Comparison of two paired groups | Paired t-test |
| 2 | Comparison of two unpaired groups | Unpaired two-sample t-test |
| 3 | Comparison of a sample with the population it is drawn from | One-sample t-test |
| 4 | Comparison of three or more matched groups varied in two factors | Two-way ANOVA |
| 5 | Comparison of three or more matched groups varied in one factor | One-way ANOVA |
| 6 | Correlation between two variables | Pearson correlation |
17. ANOVA F-test (one-way analysis):
This method compares means in three or more groups. The total variance in all groups combined is partitioned into between-group variation and within-group variation. A test criterion based on the ratio of these two components of variation is used to determine whether the group means differ. The procedure is mathematically involved, and statistical packages can be used for the computations.
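The partition described above can be sketched in a few lines of Python. The three groups and all their values below are purely illustrative, not from the slides:

```python
from statistics import mean

# Purely illustrative data: three hypothetical groups of measurements
groups = [
    [12.1, 13.4, 11.8, 12.9, 13.0],
    [14.2, 15.1, 13.8, 14.9, 14.5],
    [12.8, 13.1, 12.5, 13.6, 12.9],
]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total number of observations
grand = mean(x for g in groups for x in g)
group_means = [mean(g) for g in groups]

# Partition of the total sum of squares: TSS = TrSS + ESS
trss = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
ess = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

trms = trss / (k - 1)                    # between-group (treatment) mean square
ems = ess / (n - k)                      # within-group (error) mean square
f = trms / ems                           # F-ratio on (k - 1, n - k) df

print(f"F = {f:.2f} on ({k - 1}, {n - k}) df")
```

Comparing F with the critical value on (k − 1, n − k) df decides whether to reject the null hypothesis; packages such as R's `aov` or SciPy's `f_oneway` perform the same computation.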