SlideShare a Scribd company logo
1 of 38
Introduction to R
     Basic Teaching module
 EMBL International PhD Program
           13-10-2010
Sander Timmer & Myrto Kostadima
Overview

What is R

Quick overview datatypes, input/output and
plots

Some biological examples

I’m not a particular good teacher, so please
ask when you’re lost!
What is this R thing?

R is a powerful, general purpose language
and software environment for statistical
computing and graphics

Runs on Linux, OS X and for the unlucky few
also on Windows

R is open source and free!
Start your R interface
Variables


x <- 2

x <- x^2

x

[1] 4
Vectors
Many ways of generating a vector with a range of numbers:

   x <- 1:10

   assign(“x”, 1:10)

   x <- c(1,2,3,4,5,6,7,8,9,10)

   x <- seq(1,10, by=1)

   x <- seq(length = 10, from=1,by=1)

x
[1] 1 2 3 4 5 6 7 8 9 10
Vectors

Common way to store multiple values

x <- c(1,2,4,5,10,12,15)

length(x)

mean(x)

summary(x)
Vectors

Vectors are indexed

x[5] + x[10]
[1] 15

x[-c(5,10)]
[1] 1 2 3 4 6 7 8 9
Matrices

Common form of storing 2 dimensional data

  Think about having an Excel sheet

m = matrix(1:10,2,5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   3    5    7    9
[2,] 2      4    6    8 10

summary(m)
Factors
Factors are vectors with a discrete number of
levels:

x <- factor(c(“Cancer”, “Cancer”, “Normal”,
“Normal”))

levels(x)
[1] “Cancer” “Normal”

table(x)
Cancer Normal
      2     2
Lists

A list can contain “anything”

Useful for storing several vectors

list(gene=”gene 1”, expression=c(5,2,3))
$gene
[1] “gene 1”
$expression
[1] 5, 2, 4
If-else statements

Essential for any programming language

if state then do x else do y

if(p < 0.01){
    print(“Significant gene”)
}else{
    print(“Insignificant gene”)
}
Repetition
You want to apply 1 function to every
element of a list

for(element in list){ ....do something.... }

For loops are easy though tend to be slow

Apply is the fast way of getting things done
in R:

apply(List,1,mean)
Data input


R has countless ways of importing data:

  CSV

  Excel

  Flat text file
Data input
Most simple, the CSV file:

  read.csv(“mydata.csv”,
  row.names=T,col.names=T)

Load a tab separated file

  read.table(“mytable.txt”, sep=”t”)

Load Rdata file

  load(“mydata.Rdata”)
Data input
Also for more specific data sources:

Excel

Database connections

            Mysql -> Ensembl e.g.

Affy

       Affymetrix chips data

HapMap

.........
Data output
Most simple, the CSV file:

  write.csv(x, file=”myx.csv”)

Save Rdata file:

  save(x, file=”myx.Rdata”)

Save whole R session:

  save(file=”mysession.Rdata”)
Graphics


Quick way to study your data is plotting it

The function “plot” in R can plot almost
anything out of the box (even if this doesn’t
make sense!)
plot(1:5,5:1)
plot(1:5,5:1, col=”red”, type=”l”)
plot(1:5,5:1, col=”red”, type=”l”,
    main="Title of this plot",
  xlab="x axis", ylab="y axis")
Basic graphics

With R you can plot almost any object

  Multidimensional variables like matrixes
  can be plotted with matplot()

Other often used plot functions are:

  boxplot(), hist(), levelplot(), heatmap()
Advanced plotting
Advanced plotting
Advanced plotting
Before the example
Help page for functions in R can be called:

  ?plot, ?hist, ?vector

Examples for most functions can be runned:

  example(plot)

Text search for functions can be done by
performing:

  ??plot
Example

Some example Affymetrix dataset to play
with

  Checking distribution of data

  Plotting data

  Clustering data

  Correlate data
Read file


library(affy)

library(affydata)

data(Dilution)

print(Dilution)
Read file


dil = pm(Dilution)[1:2000,]

dil.ex = exprs(Dilution)[1:2000,]

rownames(dil.ex) =
row.names(probes(Dilution))[1:2000]
Summary
Checking what we got

summary(dil)

mva.pairs(dil)

Or:

boxplot(log(dil.ex))

Or:

hist(dil.ex, xlim=c(0,500), breaks=1000)
We need to normalise
       first
For almost all experiments you have to apply
some sort of normalisation

dil.norm = maffy.normalize(dil,
subset=1:nrow(dil))

colnames(dil.norm) = colnames(dil)

mva.pairs(dil.norm)
Most equal samples

Applying euclidian distance to detect most
equal samples

dil.norm.dist = dist(t(dil.norm))

dil.norm.dist.hc = hclust(dil.norm.dist)

plot(dil.norm.dist.hc)

Do the same for the non normalised dataset
Checking expression

Heatmap representation of expression levels
for different probes

heatmap(dil.ex.norm[1:50,])

You could apply a T-test for example to rank
to only plot the most significant probes
Checking expression

Heatmap representation of expression levels
for different probes

heatmap(dil.ex.norm[1:50,])

You could apply a T-test for example to rank
to only plot the most significant probes
Checking expression
You could apply a T-test for example to rank
to only plot the most significant probes

library(genefilter)

f = factor(c(1,1,2,2))

dil.exp.norm.t = rowttests(dil.exp.norm, fac=f)

heatmap(dil.exp.norm[order(dil.exp.norm.t
$dm)[1:10],])
Want to know more?
Using R will benefit all PhD’s in this room

Learning by doing

Loads of basic examples at:

  http://addictedtor.free.fr/graphiques/

  http://www.mayin.org/ajayshah/KB/R/
  index.html

  http://www.r-project.org/
Just keep in mind......
Questions?


Contact me:

swtimmer@ebi.ac.uk

http://www.ebi.ac.uk/~swtimmer/ for slides
or http://www.slideshare.net/swtimmer

More Related Content

What's hot

RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programmingYanchang Zhao
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Guy Lebanon
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiUnmesh Baile
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programmingAlberto Labarga
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factorskrishna singh
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with RShareThis
 
R basics
R basicsR basics
R basicsFAO
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply FunctionSakthi Dasans
 
R-programming-training-in-mumbai
R-programming-training-in-mumbaiR-programming-training-in-mumbai
R-programming-training-in-mumbaiUnmesh Baile
 
Data Analysis and Programming in R
Data Analysis and Programming in RData Analysis and Programming in R
Data Analysis and Programming in REshwar Sai
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in RFlorian Uhlitz
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In RRsquared Academy
 
5 R Tutorial Data Visualization
5 R Tutorial Data Visualization5 R Tutorial Data Visualization
5 R Tutorial Data VisualizationSakthi Dasans
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classificationYanchang Zhao
 
R programming groundup-basic-section-i
R programming groundup-basic-section-iR programming groundup-basic-section-i
R programming groundup-basic-section-iDr. Awase Khirni Syed
 

What's hot (20)

Programming in R
Programming in RProgramming in R
Programming in R
 
R programming language
R programming languageR programming language
R programming language
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)
 
R language introduction
R language introductionR language introduction
R language introduction
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with R
 
R basics
R basicsR basics
R basics
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function
 
R-programming-training-in-mumbai
R-programming-training-in-mumbaiR-programming-training-in-mumbai
R-programming-training-in-mumbai
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
Data Analysis and Programming in R
Data Analysis and Programming in RData Analysis and Programming in R
Data Analysis and Programming in R
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
 
5 R Tutorial Data Visualization
5 R Tutorial Data Visualization5 R Tutorial Data Visualization
5 R Tutorial Data Visualization
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
R programming groundup-basic-section-i
R programming groundup-basic-section-iR programming groundup-basic-section-i
R programming groundup-basic-section-i
 

Viewers also liked

Viewers also liked (20)

R language tutorial
R language tutorialR language tutorial
R language tutorial
 
Introduction To R
Introduction To RIntroduction To R
Introduction To R
 
Introduction to the R Statistical Computing Environment
Introduction to the R Statistical Computing EnvironmentIntroduction to the R Statistical Computing Environment
Introduction to the R Statistical Computing Environment
 
Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013
 
Getting Started with R
Getting Started with RGetting Started with R
Getting Started with R
 
R presentation
R presentationR presentation
R presentation
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
 
Big data analytics using R
Big data analytics using RBig data analytics using R
Big data analytics using R
 
Rtutorial
RtutorialRtutorial
Rtutorial
 
Moving Data to and From R
Moving Data to and From RMoving Data to and From R
Moving Data to and From R
 
2 R Tutorial Programming
2 R Tutorial Programming2 R Tutorial Programming
2 R Tutorial Programming
 
R Introduction
R IntroductionR Introduction
R Introduction
 
1 R Tutorial Introduction
1 R Tutorial Introduction1 R Tutorial Introduction
1 R Tutorial Introduction
 
R tutorial
R tutorialR tutorial
R tutorial
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
 
R for data analytics
R for data analyticsR for data analytics
R for data analytics
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
Intro to RStudio
Intro to RStudioIntro to RStudio
Intro to RStudio
 
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
 
R- Introduction
R- IntroductionR- Introduction
R- Introduction
 

Similar to Presentation R basic teaching module

Basic and logical implementation of r language
Basic and logical implementation of r language Basic and logical implementation of r language
Basic and logical implementation of r language Md. Mahedi Mahfuj
 
Ejercicios de estilo en la programación
Ejercicios de estilo en la programaciónEjercicios de estilo en la programación
Ejercicios de estilo en la programaciónSoftware Guru
 
PPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini RatrePPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini RatreRaginiRatre
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavVyacheslav Arbuzov
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data ManipulationChu An
 
R Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementR Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementDr. Volkan OBAN
 
R Programming Reference Card
R Programming Reference CardR Programming Reference Card
R Programming Reference CardMaurice Dawson
 
20130215 Reading data into R
20130215 Reading data into R20130215 Reading data into R
20130215 Reading data into RKazuki Yoshida
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePeter Solymos
 
Short Reference Card for R users.
Short Reference Card for R users.Short Reference Card for R users.
Short Reference Card for R users.Dr. Volkan OBAN
 

Similar to Presentation R basic teaching module (20)

R workshop
R workshopR workshop
R workshop
 
Basic and logical implementation of r language
Basic and logical implementation of r language Basic and logical implementation of r language
Basic and logical implementation of r language
 
Ejercicios de estilo en la programación
Ejercicios de estilo en la programaciónEjercicios de estilo en la programación
Ejercicios de estilo en la programación
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
 
R교육1
R교육1R교육1
R교육1
 
PPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini RatrePPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini Ratre
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
R Basics
R BasicsR Basics
R Basics
 
R Basics
R BasicsR Basics
R Basics
 
20170509 rand db_lesugent
20170509 rand db_lesugent20170509 rand db_lesugent
20170509 rand db_lesugent
 
Ggplot2 v3
Ggplot2 v3Ggplot2 v3
Ggplot2 v3
 
R Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementR Cheat Sheet – Data Management
R Cheat Sheet – Data Management
 
R Programming Reference Card
R Programming Reference CardR Programming Reference Card
R Programming Reference Card
 
20130215 Reading data into R
20130215 Reading data into R20130215 Reading data into R
20130215 Reading data into R
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the code
 
Lrz kurse: r visualisation
Lrz kurse: r visualisationLrz kurse: r visualisation
Lrz kurse: r visualisation
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Short Reference Card for R users.
Short Reference Card for R users.Short Reference Card for R users.
Short Reference Card for R users.
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Presentation R basic teaching module

  • 1. Introduction to R Basic Teaching module EMBL International PhD Program 13-10-2010 Sander Timmer & Myrto Kostadima
  • 2. Overview What is R Quick overview datatypes, input/output and plots Some biological examples I’m not a particular good teacher, so please ask when you’re lost!
  • 3. What is this R thing? R is a powerful, general purpose language and software environment for statistical computing and graphics Runs on Linux, OS X and for the unlucky few also on Windows R is open source and free!
  • 4. Start your R interface
  • 5. Variables x <- 2 x <- x^2 x [1] 4
  • 6. Vectors Many ways of generating a vector with a range of numbers: x <- 1:10 assign(“x”, 1:10) x <- c(1,2,3,4,5,6,7,8,9,10) x <- seq(1,10, by=1) x <- seq(length = 10, from=1,by=1) x [1] 1 2 3 4 5 6 7 8 9 10
  • 7. Vectors Common way to store multiple values x <- c(1,2,4,5,10,12,15) length(x) mean(x) summary(x)
  • 8. Vectors Vectors are indexed x[5] + x[10] [1] 15 x[-c(5,10)] [1] 1 2 3 4 6 7 8 9
  • 9. Matrices Common form of storing 2 dimensional data Think about having an Excel sheet m = matrix(1:10,2,5) [,1] [,2] [,3] [,4] [,5] [1,] 1 3 5 7 9 [2,] 2 4 6 8 10 summary(m)
  • 10. Factors Factors are vectors with a discrete number of levels: x <- factor(c(“Cancer”, “Cancer”, “Normal”, “Normal”)) levels(x) [1] “Cancer” “Normal” table(x) Cancer Normal 2 2
  • 11. Lists A list can contain “anything” Useful for storing several vectors list(gene=”gene 1”, expression=c(5,2,3)) $gene [1] “gene 1” $expression [1] 5, 2, 4
  • 12. If-else statements Essential for any programming language if state then do x else do y if(p < 0.01){ print(“Significant gene”) }else{ print(“Insignificant gene”) }
  • 13. Repetition You want to apply 1 function to every element of a list for(element in list){ ....do something.... } For loops are easy though tend to be slow Apply is the fast way of getting things done in R: apply(List,1,mean)
  • 14. Data input R has countless ways of importing data: CSV Excel Flat text file
  • 15. Data input Most simple, the CSV file: read.csv(“mydata.csv”, row.names=T,col.names=T) Load a tab separated file read.table(“mytable.txt”, sep=”t”) Load Rdata file load(“mydata.Rdata”)
  • 16. Data input Also for more specific data sources: Excel Database connections Mysql -> Ensembl e.g. Affy Affymetrix chips data HapMap .........
  • 17. Data output Most simple, the CSV file: write.csv(x, file=”myx.csv”) Save Rdata file: save(x, file=”myx.Rdata”) Save whole R session: save(file=”mysession.Rdata”)
  • 18. Graphics Quick way to study your data is plotting it The function “plot” in R can plot almost anything out of the box (even if this doesn’t make sense!)
  • 21. plot(1:5,5:1, col=”red”, type=”l”, main="Title of this plot", xlab="x axis", ylab="y axis")
  • 22. Basic graphics With R you can plot almost any object Multidimensional variables like matrixes can be plotted with matplot() Other often used plot functions are: boxplot(), hist(), levelplot(), heatmap()
  • 26. Before the example Help page for functions in R can be called: ?plot, ?hist, ?vector Examples for most functions can be runned: example(plot) Text search for functions can be done by performing: ??plot
  • 27. Example Some example Affymetrix dataset to play with Checking distribution of data Plotting data Clustering data Correlate data
  • 29. Read file dil = pm(Dilution)[1:2000,] dil.ex = exprs(Dilution)[1:2000,] rownames(dil.ex) = row.names(probes(Dilution))[1:2000]
  • 30. Summary Checking what we got summary(dil) mva.pairs(dil) Or: boxplot(log(dil.ex)) Or: hist(dil.ex, xlim=c(0,500), breaks=1000)
  • 31. We need to normalise first For almost all experiments you have to apply some sort of normalisation dil.norm = maffy.normalize(dil, subset=1:nrow(dil)) colnames(dil.norm) = colnames(dil) mva.pairs(dil.norm)
  • 32. Most equal samples Applying euclidian distance to detect most equal samples dil.norm.dist = dist(t(dil.norm)) dil.norm.dist.hc = hclust(dil.norm.dist) plot(dil.norm.dist.hc) Do the same for the non normalised dataset
  • 33. Checking expression Heatmap representation of expression levels for different probes heatmap(dil.ex.norm[1:50,]) You could apply a T-test for example to rank to only plot the most significant probes
  • 34. Checking expression Heatmap representation of expression levels for different probes heatmap(dil.ex.norm[1:50,]) You could apply a T-test for example to rank to only plot the most significant probes
  • 35. Checking expression You could apply a T-test for example to rank to only plot the most significant probes library(genefilter) f = factor(c(1,1,2,2)) dil.exp.norm.t = rowttests(dil.exp.norm, fac=f) heatmap(dil.exp.norm[order(dil.exp.norm.t $dm)[1:10],])
  • 36. Want to know more? Using R will benefit all PhD’s in this room Learning by doing Loads of basic examples at: http://addictedtor.free.fr/graphiques/ http://www.mayin.org/ajayshah/KB/R/ index.html http://www.r-project.org/
  • 37. Just keep in mind......