Statistics
The Dataminingtools.net Team
Statistics
R can be used to provide a comprehensive set of statistical tables. For each distribution, functions are provided to evaluate the cumulative distribution function P(X ≤ x), the probability density function, and the quantile function (given q, the smallest x such that P(X ≤ x) ≥ q), and to simulate from the distribution.
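In R these four kinds of function are named by putting the prefix p, d, q or r in front of the distribution's R name. A minimal sketch for the standard normal distribution (the evaluation points and the seed are arbitrary choices made for this example):

```r
# Standard normal distribution: the four kinds of function R provides.
p <- pnorm(1.96)    # cumulative distribution function, P(X <= 1.96)
d <- dnorm(0)       # probability density function evaluated at x = 0
q <- qnorm(0.975)   # quantile function: the x with P(X <= x) = 0.975
set.seed(1)         # arbitrary seed so the simulated values are repeatable
x <- rnorm(5)       # simulate five draws from the distribution

print(round(p, 3))  # 0.975
print(round(d, 4))  # 0.3989
print(round(q, 2))  # 1.96
print(length(x))    # 5
```

Note that pnorm() and qnorm() are inverses of each other here, which is why the 0.975 quantile comes back as roughly 1.96.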
Distributions
Distribution    R name    Additional arguments
beta            beta      shape1, shape2, ncp
binomial        binom     size, prob
Cauchy          cauchy    location, scale
chi-squared     chisq     df, ncp
exponential     exp       rate
F               f         df1, df2, ncp
gamma           gamma     shape, scale
geometric       geom      prob
Distributions
Distribution         R name    Additional arguments
hypergeometric       hyper     m, n, k
log-normal           lnorm     meanlog, sdlog
logistic             logis     location, scale
negative binomial    nbinom    size, prob
normal               norm      mean, sd
Poisson              pois      lambda
Student's t          t         df, ncp
uniform              unif      min, max
Weibull              weibull   shape, scale
Wilcoxon             wilcox    m, n
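The R names and additional arguments in the table combine with the p, d, q and r prefixes to give the actual function calls. The following is a small sketch of that pattern; the parameter values are arbitrary and chosen only for illustration:

```r
# Prefix + R name gives the function; the extra arguments come from the table.
p_binom <- pbinom(3, size = 10, prob = 0.5)    # P(X <= 3) for Binomial(10, 0.5)
q_binom <- qbinom(0.5, size = 10, prob = 0.5)  # smallest x with P(X <= x) >= 0.5
q_t     <- qt(0.975, df = 13)                  # 97.5% point of Student's t, 13 df
d_pois  <- dpois(0, lambda = 2)                # P(X = 0) for Poisson(2), i.e. exp(-2)
set.seed(42)                                   # arbitrary seed for repeatability
r_unif  <- runif(3, min = 0, max = 10)         # three draws from Uniform(0, 10)

print(p_binom)  # 0.171875
print(q_binom)  # 5
```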
Barplots
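The original figures for this slide are not available in the text, so the following is only a minimal sketch of drawing a bar plot with barplot(). The category names and counts are hypothetical and invented for this example:

```r
# Hypothetical category counts, invented purely for illustration.
counts <- c(apples = 5, pears = 3, plums = 8)

# barplot() draws one bar per element and uses the names as bar labels.
# It invisibly returns the x positions of the bar midpoints.
mids <- barplot(counts, main = "Counts per category", ylab = "Count")
print(length(mids))  # 3
```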
Pie Charts
Functions
Summary statistics can be computed using the functions median(), mean(), var(), sd(), fivenum(), and summary().
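As a short illustration, these functions can be applied to the salary values used on the Categorization of Data slide (repeated here so that the block runs on its own):

```r
# Salary values copied from the "Categorization of Data" slide.
salary <- c(23, 2, 34, 1, 32, 22, 45, 3, 4, 5, 12)

med  <- median(salary)   # middle value of the sorted data
avg  <- mean(salary)     # arithmetic mean
v    <- var(salary)      # sample variance (divides by n - 1)
s    <- sd(salary)       # sample standard deviation, sqrt(var)
five <- fivenum(salary)  # Tukey's five numbers: min, lower hinge, median, upper hinge, max

print(med)               # 12
print(five)              # 1.0  3.5 12.0 27.5 45.0
print(summary(salary))   # min, quartiles, mean and max in one call
```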
Stem-and-leaf plots
stem(x) produces a stem-and-leaf plot of the values in x. The parameter scale can be used to expand the scale of the plot: a value of scale=2 will cause the plot to be roughly twice as long as the default.
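The effect of scale can be seen by calling stem() twice on the same data. This sketch uses the age values from the stem-and-leaf slide, typed in directly with c() instead of scan() so that it runs non-interactively:

```r
# The 31 age values from the slides.
age <- c(23, 4, 12, 23, 43, 23, 23, 44, 56, 54, 32, 12,
         11, 32, 65, 34, 23, 12, 65, 12, 33, 12, 54, 4,
         7, 6, 56, 45, 34, 11, 23)

stem(age)             # default scale
stem(age, scale = 2)  # longer plot: the stems are split into finer groups

# Count the printed lines of each plot to compare their lengths.
n_default <- length(capture.output(stem(age)))
n_scaled  <- length(capture.output(stem(age, scale = 2)))
print(n_scaled > n_default)
```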
Stem and Leaf charts
> age = scan()
1: 23 4 12 23 43 23
7: 23 44 56 54 32 12
13: 11 32 65 34 23 12
19: 65 12 33 12 54 4
25: 7 6 56 45 34 11 23
32:
Read 31 items
> table(age)
age
 4  6  7 11 12 23 32 33 34 43 44 45 54 56 65
 2  1  1  2  5  6  2  1  2  1  1  1  2  2  2
> stem(age)

  The decimal point is 1 digit(s) to the right of the |

  0 | 4467
  1 | 1122222
  2 | 333333
  3 | 22344
  4 | 345
  5 | 4466
  6 | 55
Categorization of Data
> salary <- c(23,2,34,1,32,22,45,3,4,5,12)
> bins <- cut(salary, breaks=c(0,10,20,30,40,max(salary)))
> bins
 [1] (20,30] (0,10]  (30,40] (0,10]  (30,40] (20,30]
 [7] (40,45] (0,10]  (0,10]  (0,10]  (10,20]
Levels: (0,10] (10,20] (20,30] (30,40] (40,45]
> table(bins)
bins
 (0,10] (10,20] (20,30] (30,40] (40,45]
      5       1       2       2       1
> pie(table(bins))
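By default cut() names each bin after its interval, which is open on the left and closed on the right. As a possible variation on the session above, the sketch below supplies its own bin labels and converts the counts to proportions. The label strings are hypothetical and chosen only for this example:

```r
salary <- c(23, 2, 34, 1, 32, 22, 45, 3, 4, 5, 12)
breaks <- c(0, 10, 20, 30, 40, max(salary))

# Hypothetical labels, one per interval between consecutive breaks.
bins <- cut(salary, breaks = breaks,
            labels = c("0-10", "10-20", "20-30", "30-40", "40+"))

counts <- table(bins)                 # number of salaries in each bin
props  <- prop.table(counts)          # share of salaries in each bin
print(counts)
print(round(props, 2))
```

A value equal to the lowest break (0 here) would fall outside the first interval and become NA unless include.lowest = TRUE is passed to cut().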
Histograms
> data
 [1] 2 3 1 3 2 4 1 2 4 1 3 4 1 2 1 2 3 2 5 4 1
[22] 2 3 4 3 2 4 3
> hist(data)
> rug(jitter(data))
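Because these values are all small integers, the automatically chosen bin edges may coincide with the data values themselves. One option, sketched here as an assumption rather than as the slide author's method, is to centre one bin on each integer and inspect the counts without drawing anything:

```r
# The 28 values printed in the session above (named `values` in this sketch).
values <- c(2, 3, 1, 3, 2, 4, 1, 2, 4, 1, 3, 4, 1, 2,
            1, 2, 3, 2, 5, 4, 1, 2, 3, 4, 3, 2, 4, 3)

# Bin edges at 0.5, 1.5, ..., 5.5 so each integer sits in the middle of a bin.
edges <- seq(0.5, 5.5, by = 1)

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(values, breaks = edges, plot = FALSE)
print(h$counts)  # 6 8 7 6 1
```

Calling plot(h) afterwards draws the histogram from the stored object.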
Density
> data(faithful)
> eruptions <- faithful$eruptions
> hist(eruptions, 15, prob=TRUE)
> lines(density(eruptions))
> lines(density(eruptions, bw="SJ"), col="red")
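The two density curves differ only in how the smoothing bandwidth is chosen. A minimal sketch that compares the bandwidths actually selected, without any plotting:

```r
data(faithful)
eruptions <- faithful$eruptions      # eruption durations from the built-in dataset

d_default <- density(eruptions)              # default bandwidth rule
d_sj      <- density(eruptions, bw = "SJ")   # Sheather-Jones bandwidth selector

# The chosen bandwidth is stored in the bw component of each result.
print(d_default$bw)
print(d_sj$bw)
```

On this dataset the SJ bandwidth comes out smaller than the default, so the red curve follows the two peaks of the histogram more closely.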
