Data Exploration in R

Published in: Technology

  1. Data Exploration in R. Code is shown in blue. For feedback, mail me: sharmakarishma91@gmail.com
  2. Descriptive Statistics with R
     • Descriptive statistics present quantitative descriptions in a manageable form.
     • A research study may record many measures, or measure a large number of people on any one measure.
     • Descriptive statistics help us simplify large amounts of data in a sensible way.
     • Each descriptive statistic reduces lots of data to a simpler summary.
  3. Statistical Functions in R
     • To see all the statistical functions available in R's stats package:
       > help(package=stats)
       This command lists the functions in alphabetical order.
  4. • mean(): a generic function that computes the arithmetic mean in R.
       mean(x, na.rm = FALSE)
       > x <- c(10, 25, 5, 63, 45)
       > mean(x)
       [1] 29.6
     • If any value in the data is missing, mean() returns NA rather than raising an error. The na.rm argument is a logical value indicating whether NA values should be stripped before the computation proceeds; setting it to TRUE gives the mean of the remaining values.
       > x <- c(10, 25, NA, 5, 63, NA, 45)
       > mean(x)
       [1] NA
       > mean(x, na.rm = TRUE)
       [1] 29.6
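The effect of na.rm can also be checked by hand. The following is a minimal sketch, not part of the original slides; the vector name scores and the intermediate variable names are made up for illustration.

```r
# Hypothetical data: five scores plus two missing values
scores <- c(10, 25, NA, 5, 63, NA, 45)

# Count the missing values before deciding how to handle them
n_missing <- sum(is.na(scores))

# mean() returns NA (it does not stop with an error) when any value is missing
mean_with_na <- mean(scores)

# na.rm = TRUE drops the NA values before averaging
mean_clean <- mean(scores, na.rm = TRUE)

# The same result computed by hand over the non-missing values only
mean_manual <- sum(scores, na.rm = TRUE) / sum(!is.na(scores))
```

Counting the NA values first is a cheap way to see how much data the na.rm = TRUE result is actually based on.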
  5. • median(): computes the sample median. The median is the middle point of the ordered data: if the number of observations is odd, the middle observation is the median; if it is even, the median is the average of the two middle observations.
       > x <- c(155, 160, 171, 182, 162, 153, 190, 167, 168, 165, 191)
       > median(x)
       [1] 167
       > x <- c(155, 160, 171, 182, 162, 153, 190, 167, 168, 165, 191, 175)
       > median(x)
       [1] 167.5
  6. Contd..
       > x <- c(155, 160, 171, 182, 162, 153, 190, 167, 168, 165, NA, 191)
       > median(x)
       [1] NA
       > median(x, na.rm = TRUE)
       [1] 167
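The odd/even rule for the median can be written out directly. This is an illustrative sketch only: manual_median, heights_odd and heights_even are hypothetical names, and in practice the built-in median() should be used.

```r
# Hypothetical helper: median computed from the sorted values
manual_median <- function(x) {
  x <- sort(x)   # sort() drops NA values by default
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                 # odd count: the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])   # even count: average of the two middle values
  }
}

heights_odd  <- c(155, 160, 171, 182, 162, 153, 190, 167, 168, 165, 191)
heights_even <- c(heights_odd, 175)

median_odd  <- manual_median(heights_odd)    # 11 observations
median_even <- manual_median(heights_even)   # 12 observations
```

Because sort() discards missing values, this helper behaves like median(x, na.rm = TRUE) rather than like the default median(x).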
  7. • sd(): computes the standard deviation of the values in x.
       sd(x, na.rm = FALSE)
     • If na.rm is TRUE, missing values are removed before the computation proceeds.
       > x <- c(10, 25, NA, 5, 63, NA, 45)
       > sd(x)
       [1] NA
       > sd(x, na.rm = TRUE)
       [1] 24.30638
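For reference, sd() uses the sample formula with an n - 1 denominator. The sketch below recomputes the value above by hand; the variable names x_obs, sd_manual and sd_builtin are illustrative.

```r
x <- c(10, 25, NA, 5, 63, NA, 45)

# Keep only the observed (non-missing) values
x_obs <- x[!is.na(x)]
n <- length(x_obs)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1
sd_manual <- sqrt(sum((x_obs - mean(x_obs))^2) / (n - 1))

# Built-in equivalent
sd_builtin <- sd(x, na.rm = TRUE)
```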
  8. Exploring Data
     • summary(file_name)
     • # get means for variables in data frame mydata, excluding missing values
       sapply(mydata, mean, na.rm=TRUE)
     • library(Hmisc)
       # n, nmiss, unique, mean, 5,10,25,50,75,90,95th percentiles, 5 lowest and 5 highest scores
       describe(file_name)
     • library(pastecs)
       stat.desc(file_name)
  9. Contd..
     • library(psych)
       # item name, item number, nvalid, mean, sd, median, mad, min, max, skew, kurtosis, se
       describe(file_name)
     • Find a frequency distribution
       table(file_name)
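The base R calls from these two slides can be tried on a small data frame without installing any packages. This is a sketch under assumptions: the data frame and its column names (age, score, gender) are invented for illustration. Note that calling mean on a non-numeric column returns NA with a warning, so the sketch restricts sapply() to the numeric columns.

```r
# Hypothetical data frame; the column names are made up for illustration
mydata <- data.frame(
  age    = c(23, 35, NA, 41, 29),
  score  = c(88, 92, 75, NA, 81),
  gender = c("F", "M", "F", "F", "M")
)

# Overall summary of every column
overview <- summary(mydata)

# Means of the numeric columns only, ignoring missing values
numeric_cols <- sapply(mydata, is.numeric)
col_means <- sapply(mydata[numeric_cols], mean, na.rm = TRUE)

# Frequency distribution of a single categorical column
gender_counts <- table(mydata$gender)
```

Applying table() to one column at a time gives a simple frequency table; applied to a whole data frame it cross-tabulates every column, which is rarely what is wanted.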
  10. Exploring the Workspace
      • objects()           # Lists the objects in the workspace
      • ls()                # Same as objects()
      • remove()            # Removes objects from the workspace
      • rm(list=ls())       # Clears the workspace by removing all objects
      • detach(package:ABC) # Detaches a package when it is no longer needed
      • search()            # Shows the attached packages
      • library()           # Shows the installed packages
      • dir()               # Shows the files in the working directory
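Since rm(list=ls()) wipes the whole workspace, it can be safer to experiment in a separate environment first. The following sketch assumes a scratch environment created with new.env(); the names scratch, a and b are hypothetical.

```r
# Work in a separate environment so the real workspace is left untouched
scratch <- new.env()
assign("a", 1:3, envir = scratch)
assign("b", "text", envir = scratch)

# List the objects, as ls() would for the workspace
before <- ls(envir = scratch)

# Remove a single object by name
rm("a", envir = scratch)
after_one <- ls(envir = scratch)

# Remove everything that is left, the analogue of rm(list = ls())
rm(list = ls(envir = scratch), envir = scratch)
after_all <- ls(envir = scratch)
```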
  11. Sub-setting Observations
      # Selecting the first 30 observations
      mydata2 <- mydata2[1:30,]
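Row selection with [rows, ] works with both index ranges and logical conditions. A minimal sketch, assuming a made-up 50-row data frame with columns id and value:

```r
# Hypothetical data frame with 50 rows
mydata <- data.frame(id = 1:50, value = seq(0.5, 25, by = 0.5))

# First 30 observations by row index (the empty slot after the comma keeps all columns)
first30 <- mydata[1:30, ]

# head() selects the same rows
first30_head <- head(mydata, 30)

# Rows selected by a logical condition instead of by position
large <- mydata[mydata$value > 20, ]
```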
  12. Merge Two Files
      • Read the two data frames into R
        mydata1 = read.csv(path1, header=TRUE)
        mydata2 = read.csv(path2, header=TRUE)
      • Then merge them; by default merge() joins on the columns the two data frames have in common
        myfulldata = merge(mydata1, mydata2)
      • To merge by a single variable, name it in the by argument
        > merged.data <- merge(dataset1, dataset2, by="countryID")
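By default merge() keeps only the rows whose key appears in both data frames; the all, all.x and all.y arguments change that. The sketch below uses two small invented data frames (the gdp and population columns are hypothetical) keyed on countryID.

```r
# Hypothetical data frames sharing the key column countryID
dataset1 <- data.frame(countryID = c(1, 2, 3), gdp = c(100, 200, 300))
dataset2 <- data.frame(countryID = c(2, 3, 4), population = c(20, 30, 40))

# Default: only IDs present in both data frames are kept
inner <- merge(dataset1, dataset2, by = "countryID")

# Keep every row of dataset1; unmatched rows get NA for population
left <- merge(dataset1, dataset2, by = "countryID", all.x = TRUE)

# Keep every row from both data frames
full <- merge(dataset1, dataset2, by = "countryID", all = TRUE)
```

Comparing nrow() before and after a merge is a quick check that no observations were dropped unintentionally.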
  13. Interactive Data Reading
      • Choosing a tab-separated file interactively
        mydata <- read.table(file.choose(), header=TRUE, sep="\t", na.strings = "-9")
      • Reading .csv data
        mydata <- read.csv("http://dss.princeton.edu/training/students.csv", header=TRUE)
      • Writing delimited data (here tab-separated; change sep for space or comma)
        write.table(mydata, file = "test.txt", sep = "\t")
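A write/read round trip shows how sep and na.strings work together. This is a sketch under assumptions: the data frame is invented, -9 is assumed to be the missing-value code, and a temporary file stands in for test.txt so nothing is written to the working directory.

```r
# Hypothetical data in which -9 is used as the missing-value code
mydata <- data.frame(id = 1:3, score = c(90, -9, 75))

# Write tab-separated text to a temporary file
path <- tempfile(fileext = ".txt")
write.table(mydata, file = path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back, converting the -9 code to NA
reread <- read.table(path, header = TRUE, sep = "\t", na.strings = "-9")

# Clean up the temporary file
unlink(path)
```

Setting row.names = FALSE keeps write.table() from adding an extra column of row labels to the file.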
  14. THANK YOU. Reference: http://cran.r-project.org/
