Day 2b i/o.pptx

I/O
Day 2 - Introduction to R for Life Sciences

Before input and output: folders
Find out where you are: getwd()
Go elsewhere: setwd("S:/SeqData/Illumina/14apr2014")
Convenience: choose.dir() (Windows only) and file.choose()
Make sure your scripts and ‘source data’ are backed up
Derived data should not be backed up
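A minimal sketch of moving between folders and building paths. The folder (tempdir()) and the relative path "data/SGD.txt" are stand-ins chosen for illustration, not locations from the course.

```r
# Remember where we started so we can go back afterwards
old.dir <- getwd()

# tempdir() stands in for a real project folder (assumption)
setwd(tempdir())

# file.path() joins path components with "/" on every platform
in.file <- file.path("data", "SGD.txt")   # hypothetical relative path
file.exists(in.file)                      # check before trying to read it

# Return to the original folder
setwd(old.dir)
```

Building paths relative to one project folder keeps scripts usable on another machine.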

Input - formats
.RData - data in binary form, as produced by
save.image(file='Foxo.rda') # 'workspace'
Similar:
save(table1, table2, pvalues, file="mytables.rda") ↔ load("mytables.rda")
application-specific data: special libraries
(e.g.: XML, JSON, .bam, .bed, .gff, .bw. Also Excel)
tab-delimited data
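The save() / load() round trip can be sketched as follows. The object contents and the file location in tempdir() are made up for illustration.

```r
# Two small example objects (made-up values)
table1  <- data.frame(gene = c("geneA", "geneB"), score = c(1.5, -0.3))
pvalues <- c(0.01, 0.20)

# Save selected objects in binary form
rda.file <- file.path(tempdir(), "mytables.rda")
save(table1, pvalues, file = rda.file)

# Remove them, then restore them under their original names
rm(table1, pvalues)
loaded <- load(rda.file)   # load() returns the names of the restored objects
loaded
```

Note that load() silently overwrites existing objects with the same names.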

Tab-delimited input
function read.table()
→ read.delim(), read.delim2(), read.csv(), read.csv2()
Have different defaults but all return a data.frame
common arguments: file, header, sep, quote, row/col.names, stringsAsFactors
file can be a URL!
sep="\t" means: separated by tabs

Tab-delimited input
> SGD <- read.table("SGD.txt", sep="\t", header=TRUE, row.names=1)
> SGD <- read.delim("SGD.txt", row.names=1) ## does the same thing!
Put the following in "C:/Users/YourName/Documents/.Rprofile":
options(stringsAsFactors = FALSE) # already the default since R 4.0.0
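A small self-contained sketch of the two calls above. It writes a tiny tab-delimited file to tempdir() first; the file name and its contents are invented for the example.

```r
# Create a small tab-delimited file to read back (made-up content)
tsv.file <- file.path(tempdir(), "example.txt")
writeLines(c("id\tname\tlength",
             "g1\tgeneA\t1200",
             "g2\tgeneB\t850"), tsv.file)

# read.table() needs the separator and header spelled out
a <- read.table(tsv.file, sep = "\t", header = TRUE, row.names = 1)

# read.delim() has sep = "\t" and header = TRUE as defaults
b <- read.delim(tsv.file, row.names = 1)

identical(a, b)   # compare the two results
str(a)            # check the column types
```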

Data types
Sometimes the data type is wrong:
> mean( c("-0.82", "1.12", "-0.39") ) # note the quotes
[1] NA
Warning message:
In mean.default(c("-0.82", "1.12", "-0.39")) :
argument is not numeric or logical: returning NA
Sometimes this doesn’t matter:
> paste(1,2,3, sep=",")
[1] "1,2,3"

Type conversion
Automatic conversion ('coercion'):
sum( c(TRUE, FALSE, TRUE) ) => 2
Explicit conversion:
as.numeric(), as.logical(), as.character(), as.matrix(), as.factor(), …
Checking the type:
is.numeric(), is.logical(), is.character(), is.matrix(), is.factor(), …
Special cases:
is.null()
is.na() # Example: x[ is.na(x) ] <- 0 # or x <- x[ ! is.na(x) ]
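Checking, converting and handling missing values can be sketched together. The input vector is made up; the "n/a" entry is there only to show what happens to text that is not a number.

```r
# Numbers that arrived as text, with one unparseable entry (made-up data)
raw <- c("-0.82", "1.12", "n/a", "-0.39")
is.character(raw)

# Explicit conversion: "n/a" becomes NA (R warns about this, silenced here)
x <- suppressWarnings(as.numeric(raw))
is.numeric(x)

sum(is.na(x))           # how many values are missing
mean(x, na.rm = TRUE)   # mean of the values that could be converted

# Two ways of dealing with the NAs
x0 <- x
x0[is.na(x0)] <- 0      # replace missing values by 0
x.clean <- x[!is.na(x)] # or drop them
```

Whether replacing by 0 or dropping is appropriate depends on the analysis.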

Selecting data from data.frame
Index can be vector of numbers, logicals, names
Notation: some.frame[myrows, mycolumns] # as for matrix
But also: some.frame$geneName # for a particular column
some.frame[ , my.col ] # if the column(s) varies
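The indexing styles above, sketched on a small made-up frame (column names and values are illustrative only):

```r
some.frame <- data.frame(geneName = c("geneA", "geneB", "geneC"),
                         length   = c(1200, 850, 430),
                         pvalue   = c(0.001, 0.2, 0.04),
                         row.names = c("g1", "g2", "g3"))

some.frame[c(1, 3), ]                      # rows by number
some.frame[some.frame$pvalue < 0.05, ]     # rows by logical vector
some.frame["g2", c("geneName", "length")]  # rows and columns by name

some.frame$geneName                        # one column, name fixed in the script

my.col <- "length"                         # column name held in a variable
some.frame[, my.col]                       # a single column comes back as a vector
```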

Checking data.frames
Overview:
str(fr) # pay attention to the types!
Size:
dim(fr) # rows, then columns (as for matrices)
Distinct values:
unique(fr$type) # also consider length(unique(fr$type))
Arithmetic:
max(fr$length) # also: min, mean, sd, var, median, sum
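A quick sketch of these checks on a made-up frame called fr. table() is not mentioned above; it is added here as one more way to look at distinct values.

```r
fr <- data.frame(type   = c("ORF", "tRNA", "ORF", "ORF"),
                 length = c(1200, 73, 850, 430))

str(fr)                   # overview: column names, types, first values
dim(fr)                   # number of rows, then columns
unique(fr$type)           # distinct values
length(unique(fr$type))   # how many distinct values
table(fr$type)            # how often each value occurs
max(fr$length)            # largest value in a numeric column
```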

Creating and extending data.frames
New frame:
f <- data.frame(gene.names, p.values)
Adding columns to frame:
f$status <- new.status
Adding rows to frame:
f <- rbind(f, list(genes2, pval2))
f <- rbind(f, another.data.frame)
You cannot "delete" rows or columns: select what you want to keep and reassign.
Names and types must match!
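Building a frame, extending it and then dropping parts of it by selection, as a sketch with made-up values:

```r
gene.names <- c("geneA", "geneB")
p.values   <- c(0.01, 0.2)

# New frame: column names are taken from the variable names
f <- data.frame(gene.names, p.values)

# Add a column
f$status <- c("significant", "not significant")

# Add rows from another frame with the same column names and types
another <- data.frame(gene.names = "geneC", p.values = 0.5,
                      status = "not significant")
f <- rbind(f, another)

# "Removing" a row or column: keep the rest and reassign
f <- f[f$gene.names != "geneB", ]
f <- f[, c("gene.names", "p.values")]
f
```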

I/O Caveats
Single or double quotes as part of strings
Comment-characters as part of strings
Spaces instead of tabs
Carriage-returns (Mac/Windows/Linux)
Duplicates in row or column names
Always check the number and names of rows and columns and their types!
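One way to guard against the quote and comment-character caveats is to switch both off when reading, then check the result. This is a sketch on an invented file; with the defaults of read.table() the apostrophe would be treated as a quote character and the # as the start of a comment.

```r
# A file with an apostrophe and a '#' inside the text fields (made-up content)
tricky.file <- file.path(tempdir(), "tricky.txt")
writeLines(c("id\tdescription",
             "g1\t5' UTR variant",
             "g2\tsample #2",
             "g3\tplain text"), tricky.file)

# Disable quote and comment handling so the text is read literally
safe <- read.table(tricky.file, sep = "\t", header = TRUE,
                   quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

dim(safe)                 # expected size: 3 rows, 2 columns
str(safe)                 # check the types
anyDuplicated(safe$id)    # 0 means no duplicated ids
```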

Duplicate values
> v <- c("a", "b", "c", "d", "d", "e", "f", "a", "g", "a")
> duplicated(v)
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
> v[ duplicated(v) ]
[1] "d" "a" "a"
# sum(duplicated(v)) → 3
> v[ ! duplicated(v) ]
[1] "a" "b" "c" "d" "e" "f" "g" # same as unique(v)
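A few related helpers, sketched on the same vector. anyDuplicated(), table() and make.unique() are not on the slide; they are shown as possible ways to find and repair duplicates, for instance before using values as row names.

```r
v <- c("a", "b", "c", "d", "d", "e", "f", "a", "g", "a")

# Position of the first duplicate (0 if there is none)
first.dup <- anyDuplicated(v)

# Which values occur more than once?
counts <- table(v)
repeated <- names(counts)[counts > 1]

# Make every element unique by appending .1, .2, ... to repeats
u <- make.unique(v)
u
```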

Tab-delimited output
write.table() with arguments similar to read.table().
To get an empty top-left cell, use col.names=NA
Again, check the results.
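A sketch of writing a frame with row names and reading it back to check the result. The frame and the output file in tempdir() are made up for the example.

```r
fr <- data.frame(length = c(1200, 850),
                 type   = c("ORF", "tRNA"),
                 row.names = c("g1", "g2"))

out.file <- file.path(tempdir(), "out.txt")

# col.names = NA writes an empty header cell above the row names
write.table(fr, file = out.file, sep = "\t", quote = FALSE, col.names = NA)

readLines(out.file)   # look at the raw lines that were written

# Read the file back and compare with the original
back <- read.delim(out.file, row.names = 1)
dim(back)
rownames(back)
```

Without col.names=NA the header line has one field fewer than the data lines, which many other programs read as shifted columns.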
