# Descriptive Statistics with R

## on Sep 28, 2012

## Descriptive Statistics with R: Presentation Transcript

• Descriptive Statistics with R. 2012-10-12 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student
• Group Website is at: http://rpubs.com/kaz_yos/useR_at_HSPH
• Previously in this group: Introduction to R; Reading Data into R (1); Reading Data into R (2)
• Menu: mean and sd; median, quantiles, IQR, max, min, and range; skewness and kurtosis; smarter ways of doing these
• Ingredients. Statistics: summary statistics for continuous data; normal data; non-normal data; normality check. Programming: vector and data frame; DATA\$VAR extraction; indexing by [row,col]; various functions; skewness(), kurtosis(); summary(); describe(), describeBy()
• Descriptive Statistics http://www.ehow.com/info_8650637_descriptive-statistical-methods.html
• Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data http://en.wikipedia.org/wiki/Descriptive_statistics
• Open RStudio
• Read in BONEDEN.DAT.txt. Name it bone.
• Accessing a single variable in a data set: DATA\$VAR (dataset name, then variable name), e.g., mean(bone\$age)
• vector
• ? http://healthy-india.org/enviromentalhealth/direct_indirect2.html
• DATA\$VAR is a vector: 1 2 3 4 5 6 7 8, or “A” “B” “C” “D” “E” “F” “G” “H”. Like strings with values attached.
• DATA is a data frame: multiple vectors of the same length tied together (e.g., 1 2 3 4 5 6 7 8 and “A” “B” “C” “D” “E” “F” “G” “H”).
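The "vectors tied together" picture can be sketched in a few lines. This is only an illustration: `toy`, `id`, and `letter` are made-up names and values, not part of the BONEDEN data set.

```r
# Hypothetical toy data (NOT the BONEDEN data set)
id     <- 1:8
letter <- c("A", "B", "C", "D", "E", "F", "G", "H")

# Tie two vectors of the same length together into a data frame
toy <- data.frame(id = id, letter = letter, stringsAsFactors = FALSE)

# DATA$VAR pulls one column back out as a plain vector
ids <- toy$id
is.vector(ids)   # TRUE
mean(toy$id)     # 4.5
```

With the real data, the same pattern would read mean(bone$age).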
• Indexing: extraction of data from a data frame. bone[1:15 , 1:12] extracts the 1st to 15th rows and the 1st to 12th columns. Put a colon in between the first and last index. Don’t forget the comma.
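A small sketch of [row, col] indexing, using a made-up data frame because the layout of bone is not shown here. The name `toy` and its columns are assumptions for illustration only.

```r
# Hypothetical 20-row, 3-column data frame standing in for bone
toy <- data.frame(id = 1:20, age = 41:60, score = seq(0.5, 10, by = 0.5))

# Rows 1 to 15, columns 1 to 2: rows before the comma, columns after
first_rows <- toy[1:15, 1:2]
dim(first_rows)   # 15 2

# Leaving the column slot empty keeps all columns
all_cols <- toy[1:15, ]
dim(all_cols)     # 15 3

# Forgetting the comma changes the meaning: this selects COLUMNS 1 to 2
no_comma <- toy[1:2]
dim(no_comma)     # 20 2
```

The last call shows why the comma matters: without it, a data frame treats the index as a column selection.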
• age vector within bone data frame
• bone\$age: extracted as a vector
• mean: mean(x, trim = 0, na.rm = FALSE)
• sd: sd(x, na.rm = FALSE)
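A quick sketch of how the na.rm and trim arguments behave. The vector `x` below is invented for illustration; with the real data one would pass bone$age instead.

```r
# Hypothetical values with one missing observation
x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)

mean(x)                  # NA, because na.rm defaults to FALSE
m  <- mean(x, na.rm = TRUE)   # 5
s  <- sd(x, na.rm = TRUE)     # sample SD (n - 1 denominator), sqrt(32/7)

# trim = 0.25 drops the lowest and highest 25% of values before averaging
mt <- mean(x, trim = 0.25, na.rm = TRUE)   # 4.5
```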
• median: median(x, na.rm = FALSE)
• quantile: quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7). Gives the 0th, 25th, 50th, 75th, and 100th percentiles by default.
• IQR: IQR(x, na.rm = FALSE, type = 7). 75th percentile minus 25th percentile.
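The relationship between these three functions can be checked on a small made-up vector (the values in `x` are an assumption, chosen to include one outlier).

```r
# Hypothetical skewed values with one large outlier
x <- c(1, 3, 5, 7, 9, 11, 13, 100)

med <- median(x)      # 8, barely affected by the outlier
q   <- quantile(x)    # 0%, 25%, 50%, 75%, 100% by default
q
iqr <- IQR(x)         # 7

# IQR is the 75th percentile minus the 25th percentile
unname(q["75%"] - q["25%"])   # 7

# Other percentiles via probs
p90 <- quantile(x, probs = 0.9)
```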
• max: max(..., na.rm = FALSE)
• min: min(..., na.rm = FALSE)
• range: range(..., na.rm = FALSE)
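As a brief illustration, range() returns the minimum and maximum together as a two-element vector. The values below are invented.

```r
# Hypothetical ages with one missing value
x <- c(41, 35, 58, NA, 49)

max(x)                        # NA unless missing values are removed
hi <- max(x, na.rm = TRUE)    # 58
lo <- min(x, na.rm = TRUE)    # 35
r  <- range(x, na.rm = TRUE)  # 35 58

# Width of the range as a single number
width <- diff(r)              # 23
```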
• We now resort to external packages
• Install and Load e1071, psych
• To load a package by command: library(package), with the package name inside the parentheses. The double quotes “” can be omitted.
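A minimal sketch of the loading syntax. It uses stats, which ships with R, as a stand-in so that it runs without installing anything. The commented lines show what the same calls would look like for e1071 and psych, assuming an internet connection for the one-time install.

```r
# Both forms load the same package: the quotes are optional in library()
library(stats)
library("stats")

# One-time installation (quotes ARE required here), then load each session:
# install.packages(c("e1071", "psych"))
# library(e1071)
# library(psych)

# Confirm the package is attached
"package:stats" %in% search()   # TRUE
```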
• Assessment of normality
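To make the two shape measures concrete, here is a hand-rolled sketch using simple moment formulas in base R. The helper names `skew_moment` and `kurt_excess_moment` are hypothetical. The e1071 package provides its own skewness() and kurtosis(), whose default small-sample formulas may give slightly different numbers from this sketch.

```r
# Moment-based skewness: third central moment / (second central moment)^1.5
skew_moment <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: fourth moment / (second moment)^2, minus 3
kurt_excess_moment <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

symmetric    <- c(1, 2, 3, 4, 5)    # hypothetical symmetric values
right_skewed <- c(1, 1, 1, 2, 10)   # hypothetical right-skewed values

skew_moment(symmetric)          # 0: no asymmetry
kurt_excess_moment(symmetric)   # -1.3: flatter than a normal distribution
skew_moment(right_skewed)       # positive: long right tail
```

For normally distributed data both measures should be close to 0.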