
# Lecture notes on R - comparison

## by Laszlo Baranyai, Associate Professor at Corvinus University of Budapest on Mar 11, 2012

Applied mathematical statistics in agricultural engineering with R. Topic: comparison. New functions: multiple range test, errorbar.

## Lecture notes on R - comparison: Presentation Transcript

• Lecture notes on R

Applied mathematical statistics in agricultural engineering with R: comparison

László Baranyai, PhD, Corvinus University of Budapest, Department of Physics and Control, 2012
• Introduction of the data table

Morphological data (sepal and petal size in centimeters) of Iris flowers¹ were measured and collected into one table:

```r
> iris[1:2,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
> summary(iris$Species)
    setosa versicolor  virginica 
        50         50         50 
```

One row corresponds to one flower.

¹ Anderson, E. (1935) The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5.
• Summary of data columns

The `summary()` function can be used to make a basic comparison with descriptive statistics:

```r
> summary(iris$Sepal.Length[iris$Species=="setosa"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.300   4.800   5.000   5.006   5.200   5.800 
> summary(iris$Sepal.Length[iris$Species=="versicolor"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.900   5.600   5.900   5.936   6.300   7.000 
> summary(iris$Sepal.Length[iris$Species=="virginica"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.900   6.225   6.500   6.588   6.900   7.900 
```

The 25%, 50% and 75% quantiles (1st Qu., Median, 3rd Qu.) partially describe the shape of each distribution.
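The three per-species calls above can also be collapsed into single calls. A minimal sketch using base R's `tapply()` and `quantile()`; the object names `by_species` and `quartiles` are only illustrative.

```r
# summary() of sepal length for every species at once:
# tapply() splits the vector by the Species factor and applies summary() to each part
by_species <- tapply(iris$Sepal.Length, iris$Species, summary)
print(by_species)

# Only the quartiles, as a matrix with one row per species
# (quantile() uses the same default method as summary())
quartiles <- t(sapply(split(iris$Sepal.Length, iris$Species),
                      quantile, probs = c(0.25, 0.5, 0.75)))
print(quartiles)
```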
• Boxplot

The figure is created by the following command. Compare the box lines with the previous results provided by `summary()`.

```r
> boxplot(Sepal.Length ~ Species,data=iris,col="lavender")
```
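The numbers drawn by the boxplot can be inspected directly, which makes the comparison with `summary()` easier. A small sketch, assuming base R graphics: with `plot = FALSE` the `boxplot()` call only returns its statistics. The box edges are Tukey's hinges (as computed by `fivenum()`), so they may differ slightly from the 1st and 3rd quartiles reported by `summary()`.

```r
# Compute the boxplot statistics without drawing the figure
bp <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# bp$stats has one column per group; the rows are the lower whisker,
# lower hinge, median, upper hinge and upper whisker
stats <- bp$stats
colnames(stats) <- bp$names
print(stats)

# Values beyond the whiskers (drawn as separate points) and their groups
print(data.frame(group = bp$names[bp$group], value = bp$out))
```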
• One-sample t-test

The one-sample t-test may be used to calculate the confidence interval for the mean.

```r
> t.test(iris$Sepal.Length)

        One Sample t-test

data:  iris$Sepal.Length
t = 86.4254, df = 149, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 5.709732 5.976934
sample estimates:
mean of x 
 5.843333 

> t.test(iris$Sepal.Length)$conf.int
[1] 5.709732 5.976934
attr(,"conf.level")
[1] 0.95
```
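The interval reported by `t.test()` is the sample mean plus or minus the t critical value (df = n - 1) times the standard error s/√n. The sketch below recomputes it by hand for the 95% level; it only restates the textbook formula, and the variable names are illustrative.

```r
x <- iris$Sepal.Length
n <- length(x)

se <- sd(x) / sqrt(n)            # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value
ci <- mean(x) + c(-1, 1) * t_crit * se
print(ci)

# t statistic for H0: true mean = 0, as printed by t.test()
t_stat <- mean(x) / se
print(t_stat)
```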
• Multiple range test

The following function creates a summary table with the mean, standard deviation, sample size and 95% confidence interval of the mean for each group:

```r
range.test <- function(Data, Class) {
  Name <- sort(unique(Class))       # group labels
  Mean <- rep(0, length(Name))
  N <- Mean
  SD <- Mean
  CI95.min <- Mean
  CI95.max <- Mean
  for (i in seq_along(Name)) {
    tmp <- Data[Class == Name[i]]   # observations of the i-th group
    Mean[i] <- mean(tmp)
    N[i] <- length(tmp)
    SD[i] <- sd(tmp)
    ci <- t.test(tmp)$conf.int      # 95% CI of the mean, computed once
    CI95.min[i] <- ci[1]
    CI95.max[i] <- ci[2]
  }
  return(data.frame(Name, Mean, SD, N, CI95.min, CI95.max))
}
```
• Multiple range test

The function `range.test()` computes basic statistical parameters typically used for comparison. Usage:

```r
> range.test(iris$Sepal.Length,iris$Species)
        Name  Mean        SD  N CI95.min CI95.max
1     setosa 5.006 0.3524897 50 4.905824 5.106176
2 versicolor 5.936 0.5161711 50 5.789306 6.082694
3  virginica 6.588 0.6358796 50 6.407285 6.768715
```

The decision on similarity is based on:
- the overlap of the confidence intervals,
- the least significant difference derived from the mean and SD,
- the mean and its ±2×SD or ±3×SD range.
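The slide does not spell out how the least significant difference is obtained. One common choice is Fisher's LSD, sketched below under that assumption: it replaces the individual SDs with the pooled within-group variance (the ANOVA residual mean square, MSE), so it assumes equal variances, and it declares two means different when their absolute difference exceeds LSD = t(0.975, df) × sqrt(MSE × (1/n_i + 1/n_j)). No correction for multiple comparisons is applied here.

```r
# Fisher's least significant difference for sepal length (assumed method)
fit <- aov(Sepal.Length ~ Species, data = iris)
df_e <- df.residual(fit)           # residual degrees of freedom
mse <- deviance(fit) / df_e        # residual mean square (pooled variance)

means <- tapply(iris$Sepal.Length, iris$Species, mean)
ns <- tapply(iris$Sepal.Length, iris$Species, length)

# All pairs of groups, one pair per column
pairs <- combn(names(means), 2)
diff_ij <- abs(means[pairs[1, ]] - means[pairs[2, ]])
lsd_ij <- qt(0.975, df_e) * sqrt(mse * (1 / ns[pairs[1, ]] + 1 / ns[pairs[2, ]]))

lsd_table <- data.frame(group1 = pairs[1, ], group2 = pairs[2, ],
                        diff = as.numeric(diff_ij), lsd = as.numeric(lsd_ij))
lsd_table$different <- lsd_table$diff > lsd_table$lsd
print(lsd_table, row.names = FALSE)
```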
• Two-sample t-test

The t-test can be used to compare two groups, provided that they follow a normal distribution:

```r
> a <- iris$Sepal.Length[iris$Species=="setosa"]
> b <- iris$Sepal.Length[iris$Species=="versicolor"]
> t.test(a,b)

        Welch Two Sample t-test

data:  a and b
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.1057074 -0.7542926
sample estimates:
mean of x mean of y 
    5.006     5.936 
```

By default `t.test()` performs the Welch test, which does not assume equal variances; `var.equal = TRUE` gives the classical pooled-variance test.
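To see what the Welch test computes, the statistic and its Welch-Satterthwaite degrees of freedom can be recomputed by hand. A sketch under the default settings of `t.test()` (two-sided, unequal variances); the Shapiro-Wilk check at the end is an added, rough way to look at the normality assumption.

```r
a <- iris$Sepal.Length[iris$Species == "setosa"]
b <- iris$Sepal.Length[iris$Species == "versicolor"]

va <- var(a) / length(a)   # squared standard error of mean(a)
vb <- var(b) / length(b)   # squared standard error of mean(b)

t_welch <- (mean(a) - mean(b)) / sqrt(va + vb)
# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
p_value <- 2 * pt(-abs(t_welch), df_welch)   # two-sided p-value
print(c(t = t_welch, df = df_welch, p = p_value))

# Shapiro-Wilk p-values of each sample (small values suggest non-normality)
sapply(list(setosa = a, versicolor = b), function(s) shapiro.test(s)$p.value)
```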
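• Kolmogorov-Smirnov test

The non-parametric (distribution-free) version of the test of similarity is the two-sample Kolmogorov-Smirnov test. It does not assume normality and compares the whole empirical distributions rather than only the means:

```r
> ks.test(a,b)

        Two-sample Kolmogorov-Smirnov test

data:  a and b
D = 0.78, p-value = 1.230e-13
alternative hypothesis: two-sided

Warning message:
In ks.test(a, b) : cannot compute correct p-values with ties
```

The test statistic D = 0.78 is significant, since its p-value is far below 0.05. Because the data contain tied values, the reported p-value is only approximate.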
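The statistic D is the largest vertical distance between the two empirical cumulative distribution functions. A short sketch with base R's `ecdf()`: because both step functions jump only at observed values, evaluating them on the pooled sample values is enough to find the maximum.

```r
a <- iris$Sepal.Length[iris$Species == "setosa"]
b <- iris$Sepal.Length[iris$Species == "versicolor"]

Fa <- ecdf(a)   # empirical CDF of the first sample
Fb <- ecdf(b)   # empirical CDF of the second sample

# Largest absolute difference, checked at every distinct observed value
pooled <- sort(unique(c(a, b)))
D <- max(abs(Fa(pooled) - Fb(pooled)))
print(D)
```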
• Analysis of variance

The analysis of variance (ANOVA) compares the between-group and within-group variances to test whether at least one group mean differs significantly from the others.

```r
> summary(aov(Sepal.Length ~ Species,data=iris))
             Df Sum Sq Mean Sq F value Pr(>F)    
Species       2 63.212  31.606  119.26 < 2.2e-16 ***
Residuals   147 38.956   0.265                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

According to the above result, species has a significant effect on sepal length.
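The F value in the table is the ratio of the between-group and within-group mean squares. The sketch below rebuilds these entries by hand for the same data; it assumes a single grouping factor, as in the example, and the variable names are illustrative.

```r
y <- iris$Sepal.Length
g <- iris$Species

grand_mean <- mean(y)
group_means <- tapply(y, g, mean)
n_i <- tapply(y, g, length)

# Between-group sum of squares: spread of group means around the grand mean
ss_between <- sum(n_i * (group_means - grand_mean)^2)
# Within-group sum of squares: spread of observations around their group mean
# (indexing by the factor uses its integer codes, matching tapply's level order)
ss_within <- sum((y - as.numeric(group_means)[g])^2)

df_between <- nlevels(g) - 1
df_within <- length(y) - nlevels(g)

F_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(F_value, df_between, df_within, lower.tail = FALSE)
print(c(F = F_value, p = p_value))
```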
• Errorbar

The output of the function `range.test()` can be used to make a graphical comparison and plot error bars:

```r
errorbar <- function(mrt) {
  x <- seq_along(mrt$Name)             # one x position per group
  y <- c(mrt$CI95.min, mrt$CI95.max)   # y range covering all intervals
  plot(range(x), range(y), type="n", xaxt="n",
       xlab="Classes", ylab="Data")
  axis(1, at=x, labels=mrt$Name)       # group names on the x axis
  points(x, mrt$Mean, col="blue")      # group means
  lines(x, mrt$Mean, col="blue", lty="dashed")
  arrows(x, mrt$CI95.min, x, mrt$CI95.max,   # 95% CI drawn as error bars
         code=3, angle=90, length=0.1, col="blue")
}
```
• Errorbar

The usage is simple:

```r
> md <- range.test(iris$Sepal.Length,iris$Species)
> errorbar(md)
```
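The error bar plot is judged by eye; the same overlap of 95% confidence intervals can also be checked numerically. A sketch, assuming per-group intervals from `t.test()` as in `range.test()`; the names `ci` and `overlap` are illustrative. The rule is conservative: non-overlapping 95% intervals generally indicate a significant difference, but overlapping intervals do not prove that the means are equal.

```r
# 95% CI of the mean for each species, one row per group
groups <- split(iris$Sepal.Length, iris$Species)
ci <- t(sapply(groups, function(s) t.test(s)$conf.int))
colnames(ci) <- c("CI95.min", "CI95.max")

# Check every pair of groups for overlapping intervals
pairs <- combn(rownames(ci), 2)
overlap <- apply(pairs, 2, function(p) {
  ci[p[1], "CI95.max"] >= ci[p[2], "CI95.min"] &&
    ci[p[2], "CI95.max"] >= ci[p[1], "CI95.min"]
})
print(data.frame(group1 = pairs[1, ], group2 = pairs[2, ], overlap = overlap))
```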
• Summary

Experimental results may be compared using the reports of
- the two-sample t-test (assumes normally distributed data),
- the Kolmogorov-Smirnov test (non-parametric),
- analysis of variance,
- the multiple range test and its derivatives.

Graphical tools for comparison were introduced:
- boxplot (it shows information about the distribution),
- errorbar.