- 1. Data Exploration in R. Code is shown in blue. For feedback, mail me: sharmakarishma91@gmail.com
- 2. Descriptive Statistics with R
  • Descriptive statistics present quantitative descriptions in a manageable form.
  • A research study may involve many measures, or one measure taken on a large number of people.
  • Descriptive statistics help us simplify large amounts of data in a sensible way.
  • Each descriptive statistic reduces a large amount of data to a simpler summary.
- 3. Statistical Functions in R
  • To see all the statistical functions available in R:
  > help(package = stats)
  This command lists the functions of the stats package in alphabetical order.
- 4. • mean(): a generic function that computes the arithmetic mean in R.
  mean(x, na.rm = FALSE)
  > x <- c(10, 25, 5, 63, 45)
  > mean(x)
  [1] 29.6
  • If the data contain missing values, mean() does not raise an error; it returns NA. The na.rm argument is a logical value indicating whether NA values should be stripped before the computation proceeds. Setting it to TRUE gives the mean of the remaining values.
  > x <- c(10, 25, NA, 5, 63, NA, 45)
  > mean(x)
  [1] NA
  > mean(x, na.rm = TRUE)
  [1] 29.6
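
The effect of na.rm can be checked by hand. This is a minimal sketch using the same vector as the slide; the names n_missing and manual_mean are introduced here only for illustration.

```r
x <- c(10, 25, NA, 5, 63, NA, 45)

# is.na() flags the missing entries that make mean(x) return NA
n_missing <- sum(is.na(x))

# Manual equivalent of mean(x, na.rm = TRUE):
# add up the non-missing values and divide by how many there are
manual_mean <- sum(x, na.rm = TRUE) / sum(!is.na(x))

print(n_missing)
print(manual_mean)
print(mean(x, na.rm = TRUE))
```

Both printed means should agree, since na.rm = TRUE simply drops the NA entries before averaging.
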
- 5. • median(): computes the median, the middle point of the ordered data. If the number of observations is odd, the median is the middle observation; if it is even, the median is the average of the two middle observations.
  > x <- c(155, 160, 171, 182, 162, 153, 190, 167, 168, 165, 191)
  > median(x)
  [1] 167
  > x <- c(155, 160, 171, 182, 162, 153, 190, 167, 168, 165, 191, 175)
  > median(x)
  [1] 167.5
- 6. Contd.: as with mean(), a missing value makes median() return NA unless na.rm is TRUE.
  > x <- c(155, 160, 171, 182, 162, 153, 190, 167, 168, 165, NA, 191)
  > median(x)
  [1] NA
  > median(x, na.rm = TRUE)
  [1] 167
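
The odd/even rule for the median can be written out directly. The helper manual_median() below is hypothetical and exists only to illustrate the rule; in practice median() is the function to use.

```r
# manual_median() is an illustrative helper, not part of base R
manual_median <- function(x) {
  x <- sort(x)                      # sort() drops NA values by default
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # odd count: the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])    # even count: average of the two middle values
  }
}

odd_x  <- c(155, 160, 171, 182, 162, 153, 190, 167, 168, 165, 191)
even_x <- c(odd_x, 175)

print(manual_median(odd_x))
print(manual_median(even_x))
```
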
- 7. • sd(): computes the standard deviation of the values in x.
  sd(x, na.rm = FALSE)
  • If na.rm is TRUE, missing values are removed before the computation proceeds.
  > x <- c(10, 25, NA, 5, 63, NA, 45)
  > sd(x)
  [1] NA
  > sd(x, na.rm = TRUE)
  [1] 24.30638
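
sd() uses the sample formula with n - 1 in the denominator. A short sketch that reproduces the slide's result by hand, using the non-missing values from the slide (manual_sd is a name chosen here for illustration):

```r
x <- c(10, 25, 5, 63, 45)   # the slide's values with the NAs removed
n <- length(x)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

print(manual_sd)
print(sd(x))
```
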
- 8. Exploring Data (file_name and mydata below stand for the name of a data frame)
  • summary(file_name)
  # get means for variables in data frame mydata, excluding missing values
  sapply(mydata, mean, na.rm=TRUE)
  • library(Hmisc)
  # n, nmiss, unique, mean, 5th, 10th, 25th, 50th, 75th, 90th, 95th percentiles, 5 lowest and 5 highest scores
  describe(file_name)
  • library(pastecs)
  stat.desc(file_name)
- 9. Contd.
  • library(psych)
  # item name, item number, nvalid, mean, sd, median, mad, min, max, skew, kurtosis, se
  describe(file_name)
  • Find a frequency distribution:
  table(file_name)
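
The base R calls from these slides can be tried without installing any packages. The sketch below uses a small made-up data frame; the column names and values are assumptions for illustration only. It applies table() to a single column, which is the usual way to get a frequency distribution of one variable.

```r
# Made-up data for illustration only
mydata <- data.frame(
  height = c(155, 160, NA, 182),
  weight = c(50, 62, 70, NA),
  group  = c("a", "b", "a", "a")
)

# Per-column summary: quartiles and NA counts for the numeric columns
print(summary(mydata))

# mean() only makes sense for numeric columns, so select those first
num_cols  <- sapply(mydata, is.numeric)
col_means <- sapply(mydata[num_cols], mean, na.rm = TRUE)
print(col_means)

# Frequency distribution of one variable
group_counts <- table(mydata$group)
print(group_counts)
```
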
- 10. Exploring the Workspace
  • objects() # Lists the objects in the workspace
  • ls() # Same as objects()
  • remove() # Removes objects from the workspace
  • rm(list=ls()) # Removes all objects from the workspace
  • detach(package:ABC) # Detaches a package when it is no longer needed
  • search() # Shows the attached packages
  • library() # Shows the installed packages
  • dir() # Shows the files in the working directory
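
ls() and rm() can be tried safely on a separate environment instead of the real workspace. In this sketch the environment ws and the objects a and b are made up for illustration; the envir argument is what redirects ls() and rm() away from the global workspace.

```r
# A separate environment stands in for the workspace, so nothing
# in a real session is deleted
ws <- new.env()
assign("a", 1:3, envir = ws)
assign("b", "text", envir = ws)

before <- ls(envir = ws)                 # names of the objects in ws

rm("a", envir = ws)                      # remove a single object
after_one <- ls(envir = ws)

rm(list = ls(envir = ws), envir = ws)    # remove everything that is left
after_all <- ls(envir = ws)

print(before)
print(after_one)
print(after_all)
```

Dropping the envir argument gives the forms shown on the slide, which act on the current workspace.
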
- 11. Sub-setting Observations
  # Selecting the first 30 observations
  mydata2 <- mydata2[1:30,]
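
Row selection uses the form data[rows, columns], with the column position left empty to keep every column. A small sketch on made-up data (the data frame and its columns are assumptions for illustration):

```r
# Made-up data: 50 rows
mydata <- data.frame(id = 1:50, value = (1:50)^2)

# First 30 observations, all columns
first30 <- mydata[1:30, ]

# head() gives the same rows
same <- head(mydata, 30)

# Rows can also be selected by a logical condition
big <- mydata[mydata$value > 2000, ]

print(nrow(first30))
print(nrow(big))
```
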
- 12. Merge Two Files
  • Read the two data frames into R:
  mydata1 = read.csv(path1, header=TRUE)
  mydata2 = read.csv(path2, header=TRUE)
  • Then merge them. By default, merge() joins on the columns the two data frames have in common:
  myfulldata = merge(mydata1, mydata2)
  • To merge by a single variable, name it in the by argument:
  > merged.data <- merge(dataset1, dataset2, by="countryID")
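
By default merge() keeps only the rows whose key appears in both data frames; all = TRUE keeps every row and fills the gaps with NA. A minimal sketch with made-up data (the gdp and pop columns and all values are assumptions for illustration):

```r
# Made-up data sharing the key column countryID
dataset1 <- data.frame(countryID = c(1, 2, 3), gdp = c(10, 20, 30))
dataset2 <- data.frame(countryID = c(2, 3, 4), pop = c(5, 6, 7))

# Only countryID 2 and 3 appear in both inputs
inner <- merge(dataset1, dataset2, by = "countryID")

# all = TRUE keeps countryID 1 and 4 as well, with NA where data is missing
outer <- merge(dataset1, dataset2, by = "countryID", all = TRUE)

print(inner)
print(outer)
```
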
- 13. Interactive Data Reading
  • Choosing a tab-separated file interactively:
  mydata <- read.table(file.choose(), header=TRUE, sep="\t", na.strings = "-9")
  • Reading .csv data:
  mydata <- read.csv("http://dss.princeton.edu/training/students.csv", header=TRUE)
  • Writing tab-separated data (change sep for space- or comma-separated output):
  write.table(mydata, file = "test.txt", sep = "\t")
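
A write-then-read round trip shows how sep and na.strings work together. This sketch assumes -9 is the code used for missing values and writes to a temporary file; the data frame is made up for illustration.

```r
# Made-up data in which -9 stands for "missing"
mydata <- data.frame(id = 1:3, score = c(10, -9, 30))

# Write a tab-separated file to a temporary location
path <- tempfile(fileext = ".txt")
write.table(mydata, file = path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back, converting the -9 code to NA
back <- read.table(path, header = TRUE, sep = "\t", na.strings = "-9")
unlink(path)   # delete the temporary file

print(back)
```
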
- 14. THANK YOU. Reference: http://cran.r-project.org/
