2. Contents
1. What is R?
2. An Introductory Example
3. Types and Data Structures (in C and R)
4. Functional Programming (apply() function)
5. R Graphics
6. Bioinformatics (RNA-seq)
4. Computer Language Popularity
The TIOBE index is the weighted mean of the following form:
(hits(PL,SE1)/hits(SE1) + ... + hits(PL,SEn)/hits(SEn))/n
where PL is searched with a query of the following pattern:
+"<language> programming"
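As a rough illustration of the formula above, the following R sketch averages the per-engine hit ratios. The hit counts and the engine labels SE1 to SE3 are made-up values for illustration only; they are not real TIOBE data.

```r
# Hypothetical hit counts for one language (PL) on three search engines.
# These numbers are invented for illustration; they are not real TIOBE data.
hits_pl  <- c(SE1 = 1200,  SE2 = 300,  SE3 = 4500)    # hits(PL, SEi)
hits_all <- c(SE1 = 20000, SE2 = 6000, SE3 = 90000)   # hits(SEi)

# Mean of the per-engine ratios, following the formula on the slide
rating <- sum(hits_pl / hits_all) / length(hits_all)
rating
```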
7. Classification of Computer Languages
by abstraction levels
Assembly Languages
High Level Languages
C, C++, Java, …
Very High Level Languages (VHLL)
Scripting languages: Perl, Python, Ruby, …
Domain Specific Language
R : statistics
Matlab, …
A higher-level language is closer to natural language.
9. Simple Example (1)
histogram
> x<-rnorm(100000000)
> head(x)
[1] 0.4667083 0.8907642 0.8147121 0.4839252 0.5811472 0.4941122
> hist(x)
> system.time(x<-rnorm(100000000))
user system elapsed
8.771 0.249 9.020
10. Simple Example (2) t-test
> group1 <- c(0.7,-1.6,-0.2,-1.2,-0.1,3.4,3.7,0.8,0.0,2.0)
> group2 <- c(1.9, 0.8, 1.1, 0.1,-0.1,4.4,5.5,1.6,4.6,3.4)
> group1
[1] 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0.0 2.0
> group2
[1] 1.9 0.8 1.1 0.1 -0.1 4.4 5.5 1.6 4.6 3.4
> boxplot(group1, group2)
> t.test(group1, group2, var.equal=T)
Two Sample t-test
data: group1 and group2
t = -1.8608, df = 18, p-value = 0.07919
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.363874 0.203874
sample estimates:
mean of x mean of y
0.75 2.33
http://cse.naro.affrc.go.jp/takezawa/r-tips/r/65.html
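To see where the numbers in the t.test() output come from, the statistic can be recomputed by hand. This is a minimal sketch assuming the equal-variance (pooled) form that var.equal=T selects; the names sp2, t_stat and p_value are chosen here for illustration.

```r
group1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
group2 <- c(1.9,  0.8,  1.1,  0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)
n1 <- length(group1)
n2 <- length(group2)

# Pooled variance under the equal-variance assumption
sp2 <- ((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) / (n1 + n2 - 2)

# t statistic, degrees of freedom, and two-sided p-value
t_stat  <- (mean(group1) - mean(group2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df      <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)

c(t = t_stat, df = df, p = p_value)
```

The values should agree with the t, df and p-value lines printed by t.test() on the slide.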
11. Getting Help in R
Display the contents of the R manual (if you know the name of the function)
?rnorm
help("rnorm")
Search functions by keywords
??"normal distribution"
help.search("normal distribution")
Search functions by (partial) matching of function names
find("rnorm")
apropos("rnorm")
14. Probability Distributions
dnorm() : Density function
pnorm() : (cumulative) probability distribution function
qnorm() : Quantile
rnorm() : Random number generation
“Quick-R” site
http://www.statmethods.net/advgraphs/probability.html
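The four functions are related to each other, which a short sketch can make concrete. It assumes the standard normal distribution (mean 0, sd 1, the defaults); the value 1.96 and the seed are arbitrary choices for illustration.

```r
x <- 1.96

d0 <- dnorm(0)     # density at 0, equal to 1/sqrt(2*pi)
p  <- pnorm(x)     # P(X <= 1.96), about 0.975
q  <- qnorm(p)     # quantile: the inverse of pnorm(), gives back 1.96

set.seed(1)        # arbitrary seed, only for reproducibility
r  <- rnorm(5)     # five random draws from N(0, 1)
```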
15. Plotting the density function (1/2)
> x<-seq(-4,4,length=100)
> x
[1] -4.00000000 -3.91919192 -3.83838384 -3.75757576 -3.67676768 -3.59595960
[7] -3.51515152 -3.43434343 -3.35353535 -3.27272727 -3.19191919 -3.11111111
[13] -3.03030303 -2.94949495 -2.86868687 -2.78787879 -2.70707071 -2.62626263
… omitted
> dx<-dnorm(x)
17. Plotting the probability distribution function
> x<-seq(-4,4,length=100)
> px<-pnorm(x)
> plot(x,px,type="l",xlab="x",ylab="y",main="The normal distribution")
18. Quantile (1/5)
plot(x,dnorm(x), type="n", ylim=c(0,1))
http://cse.niaes.affrc.go.jp/minaka/R/R-normal.html
Copyright (c) 2004 by MINAKA Nobuhiro. All rights reserved.
19. Quantile (2/5)
plot(x,dnorm(x), type="n", ylim=c(0,1))
curve(dnorm(x), type="l", add=T)
20. Quantile (3/5)
plot(x,dnorm(x), type="n", ylim=c(0,1))
curve(dnorm(x), type="l", add=T)
curve(pnorm(x), type="l", lty=3, add=T)
21. Quantile (4/5)
plot(x,dnorm(x), type="n", ylim=c(0,1))
curve(dnorm(x), type="l", add=T)
curve(pnorm(x), type="l", lty=3, add=T)
abline(h=0.05)
abline(h=0.95)
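The horizontal lines at 0.05 and 0.95 cross the dotted pnorm() curve at the corresponding quantiles. A small sketch, assuming the standard normal distribution used on these slides:

```r
q <- qnorm(c(0.05, 0.95))   # x positions where pnorm(x) equals 0.05 and 0.95
q
back <- pnorm(q)            # recovers 0.05 and 0.95

# To mark these positions on the existing plot, one could add:
# abline(v = q, lty = 2)
```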
23. Calculation of the p-value of a numeric vector x.
http://d.hatena.ne.jp/hoxo_m/20130213/p1
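norm.dist.p <- function(x) {
  n <- length(x)
  m <- mean(x)              # sample mean
  se <- sd(x) / sqrt(n)     # estimated standard error of the mean
  # two-sided p-value under a normal distribution with mean 0
  p <- pnorm(-abs(m), mean=0, sd=se) * 2
  p
}
x <- rnorm(10, mean=0)
p <- norm.dist.p(x)
cat("p =", p, "\n")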
24. Bias in small samples
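alpha <- 0.05
# repeat the test on 10000 samples of size 10 drawn from N(0, 1)
ps <- sapply(1:10000, function(i) {
  x <- rnorm(10)
  norm.dist.p(x)
})
# fraction of samples wrongly declared significant
fp <- sum(ps < alpha) / length(ps)
cat("alpha error rate =", fp, "\n")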
alpha error rate = 0.0812
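The rate above is higher than alpha because norm.dist.p() plugs an estimated standard deviation into a normal distribution. A hedged sketch of the comparison, assuming the same setup (samples of size 10 from N(0, 1)): using the t distribution via t.test() should bring the rate back close to 0.05. The seed and the names rates, normal and t are arbitrary choices for illustration.

```r
set.seed(123)   # arbitrary seed, only for reproducibility
alpha <- 0.05

# p-value from the normal approximation, as in norm.dist.p on the earlier slide
norm.dist.p <- function(x) {
  se <- sd(x) / sqrt(length(x))
  pnorm(-abs(mean(x)), mean = 0, sd = se) * 2
}

# For each simulated sample, compute both p-values
ps <- t(sapply(1:10000, function(i) {
  x <- rnorm(10)
  c(normal = norm.dist.p(x), t = t.test(x, mu = 0)$p.value)
}))

# Fraction of samples with p < alpha, per method
rates <- colMeans(ps < alpha)
rates
```

For the same sample the normal-based p-value is never larger than the t-based one, so the normal-based error rate cannot be lower.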
26. Types in C (partial)
Integer Types
Floating-Point Types
27. Memory Layout of C Programs
1. Text segment (Code segment)
2. Initialized data segment (initialized global variables and static variables)
3. Uninitialized data segment
4. Stack (automatic variables)
5. Heap (for dynamic memory allocation by malloc(), free(), …)
http://www.geeksforgeeks.org/memory-layout-of-c-program/
28. Stack frame and function call
int a();   /* declared before use; b() and c() are not shown on this slide */
int main() {
    int x = 0;
    a();
    return 0;
}
int a() {
    int x = 1;
    b();
    c();
    return 0;
}
http://www.tenouk.com/ModuleZ.html
29. Recursion in C
#include <stdio.h>
int Fact(int f) {
    if (f <= 1) return 1;   /* base case; also stops for f = 0 */
    return (f * Fact(f - 1)); /* Fact calls itself only once per invocation */
}
int main() {
    int fact;
    fact = Fact(5);
    printf("Factorial is %d\n", fact);
    return 0;
}
http://www.programmingspark.com/2013/03/Working-of-Recursion-in-detail-using-Stack.html
49. bodymap_count_table.txt
Tab delimited format
The first line shows a list of sample identifiers (19 human organs).
The first column is a list of gene identifiers (Ensembl genes).
56. Distribution of the data
> hist(data$adipose)
> hist(log10(data$adipose))
> summary(log10(data$adipose))
Min. 1st Qu. Median Mean 3rd Qu. Max.
-Inf -Inf -Inf -Inf -Inf 6
> summary(log10(data$adipose[data$adipose>0]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.462 2.382 2.287 3.109 6.200
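The -Inf values appear because log10(0) is -Inf, so any gene with a zero count dominates the summary. A minimal sketch with made-up counts (the real values would come from data$adipose):

```r
# Hypothetical read counts, invented for illustration
counts <- c(0, 0, 12, 150, 0, 3400)

logs <- log10(counts)                   # zeros become -Inf
n_inf <- sum(is.infinite(logs))         # number of -Inf values

positive <- counts[counts > 0]          # drop zeros before taking the log
summary(log10(positive))

zero_fraction <- mean(counts == 0)      # share of zero counts
```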
58. Environment (1/2)
Environment basics : http://adv-r.had.co.nz/Environments.html
The job of an environment is to associate, or bind, a set of
names to a set of values.
You can think of an environment as a bag of names:
• If an object has no names pointing to it, it gets
automatically deleted by the garbage collector.
• Every object in an environment has a unique name.
• The objects in an environment are not ordered (i.e., it
doesn’t make sense to ask what the first object in an
environment is).
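The points above can be tried directly with a freshly created environment. This is a small sketch; the environment e and the names a and b are arbitrary.

```r
e <- new.env()                        # an empty "bag of names"
assign("a", c(1, 2, 3), envir = e)    # bind the name a to a vector
e$b <- "text"                         # the $ form binds b in the same way

names_in_e <- sort(ls(e))             # ls() lists the names; there is no "first" object
e$a <- 10                             # a name is unique: rebinding replaces the old value
value_a <- get("a", envir = e)
```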
59. Environment (2/2)
Most environments are created as a consequence of using functions.
An environment has a parent environment.
http://adv-r.had.co.nz/Environments.html
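A closure is a convenient way to observe both statements. In this sketch (make_counter and counter are illustrative names), each call to make_counter() creates a new environment, and the parent of that environment is the one in which make_counter was defined.

```r
make_counter <- function() {
  count <- 0                  # lives in the environment created by this call
  function() {
    count <<- count + 1       # modifies count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
second <- counter()           # the environment persists between calls

env <- environment(counter)   # the environment created when make_counter() ran
has_count   <- exists("count", envir = env, inherits = FALSE)
same_parent <- identical(parent.env(env), environment(make_counter))
```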
## 20. Quantile (3/5)
Similarly, another curve is plotted over the previous one,
with a different line type.
## 21. Quantile (4/5)
The abline() function draws a horizontal or a vertical line on the previous output.
## 23.
Next, I’d like to explain how to define your own function.
I’d also like to demonstrate the meaning of the t-test
by using a computer simulation.
norm.dist.p is the name of the function you are going to create.
When the random variable x follows a normal distribution of this form,
that is, when a series of samples is taken from a normal distribution of this form,
the mean of the sample follows a normal distribution of this form.
The function norm.dist.p returns the p-value of the sample mean
for the two-sided test.
The content of this function should be straightforward.
* sd() standard deviation
* sqrt() square root
<<SKIP>>
## 29. Recursion in C
When a C function is called, the arguments and the variables defined in the called function
are stored on the stack.
This figure shows the transition of the state of main memory.
Each time the Fact function is called, memory for that call is allocated.
In this figure, the value 4 and the value Fact(3) are shown.
In this case, Fact(3) means the return value of the call Fact(3).
That is, at this moment, two integer variables are allocated here;
one value is 4 and the other is not determined yet.
But because the program knows the size of the return value,
it can allocate the memory for the return value.
This example is often used to explain the concepts of type and stack
in the memory model of C.
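The growth of the call stack can also be traced in code. The sketch below is written in R rather than C, to stay with the language used in the rest of these slides, and the depth argument is an addition for illustration only: it makes each nested call report how deep it is.

```r
fact <- function(f, depth = 1) {
  # one line per call, indented by the current depth of the recursion
  cat(strrep("  ", depth - 1), "Fact(", f, ") at depth ", depth, "\n", sep = "")
  if (f <= 1) return(1)          # base case: no further call is made
  f * fact(f - 1, depth + 1)     # this call waits until the inner call returns
}

result <- fact(5)
result
```

Each indented line corresponds to one call that is still waiting for its inner call to return.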
## 36.
You can access the elements of a vector by using square brackets.
When you assign to a position larger than the length of the vector, like this,
the vector is automatically extended if needed.
This example shows that all the elements of a vector must have the same type,
so the type of the vector elements is converted automatically in this example.
This example generates a logical vector.
Logical vectors can be used to filter the elements and remove the portions not relevant to our use.
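The behaviours described in these notes can be checked with a few lines. The vectors v and mixed are illustrative and are not the ones shown on the slide.

```r
v <- c(10, 20, 30)
second <- v[2]              # access by position

v[5] <- 50                  # assigning past the end extends the vector
len <- length(v)            # now 5; the skipped position v[4] is NA

mixed <- c(1, "a", TRUE)    # mixed types are all converted to character
mixed_class <- class(mixed)

keep <- !is.na(v) & v > 15  # a logical vector, one TRUE/FALSE per element
big <- v[keep]              # keeps only the elements where keep is TRUE
```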
## 41. R list
An R list is a vector of pointers that point to other objects.
Therefore, an R list can hold vectors of different types.
So,
this output means that the first element of the list
refers to the vector object 1,2,3,
and so on.
## 42. w1[1]
You can access the elements of a list by using single square brackets or double square brackets.
These two syntaxes mean different things.
In brief, a variable with single square brackets returns a sublist.
On the other hand, when you use double square brackets, R returns the vector that the list element points to.
* bracket
* square bracket
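A short check of the difference, assuming a list w1 whose first element is the vector 1,2,3 as in the notes; the other elements are made up for illustration.

```r
w1 <- list(c(1, 2, 3), c("a", "b"), TRUE)   # contents after the first element are hypothetical

sub <- w1[1]      # single brackets: a list of length 1
vec <- w1[[1]]    # double brackets: the numeric vector itself

class(sub)
class(vec)
```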
## 43.
You can get a sublist by giving a vector in the brackets. => try it out
In this case, a new object is created, as shown in this slide.
You can remove an object by calling the remove() function.
This can be confirmed by this code.
After removing w1, of course, w1 becomes inaccessible.
But even after the removal,
you can still access w2, which was the sublist of w1.
This is because when you take a sublist of w1,
a new object is created, as shown in this slide.
You can no longer access the w1 object.
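The sequence described above can be reproduced as follows. The contents of w1 are again hypothetical; only the names w1 and w2 come from the notes.

```r
w1 <- list(c(1, 2, 3), c("a", "b"), TRUE)   # hypothetical contents
w2 <- w1[c(1, 3)]       # sublist made of the first and third elements

remove(w1)              # same as rm(w1)
w1_exists <- exists("w1")

w2                      # still accessible after w1 is gone
```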
## 45.
## 54.
Because the sample IDs are not convenient for further analysis,
let’s replace the sample IDs with the tissue type descriptions
provided in the phenotype data file.
You can assign the tissue name vector to colnames().
Of course, you can use the attr() function for the same purpose.
This is what the resulting table looks like.
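A minimal sketch of the renaming, using a tiny made-up table. The sample IDs, gene names and tissue names below are placeholders, not the contents of the real count table or phenotype file.

```r
# Hypothetical count table: columns are sample IDs, rows are genes
data <- data.frame(ID_A = c(10, 0), ID_B = c(5, 7),
                   row.names = c("gene1", "gene2"))

# In the real analysis this vector would come from the phenotype data file
tissue <- c("adipose", "adrenal")

colnames(data) <- tissue           # replace the sample IDs with tissue names

# The same result through attr(): a data frame keeps its column names
# in the "names" attribute
data2 <- data.frame(ID_A = c(10, 0), ID_B = c(5, 7))
attr(data2, "names") <- tissue
```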
Because there are so many parameters,
please consult a good textbook.
Paul Murrell’s book is one of the best-known and most comprehensive guides
to R graphics.