genomation
package

genomation

Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries

a toolkit to summarize, annotate and visualize genomic intervals

Using
genomation

Altuna Akalın1

More
information

February 24, 2014
1*

presented by. Package developed by Altuna Akalın and Vedran Franke
Quick introduction
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

The genomation is an R package that expedites genomic
interval summary and annotation. It has the following features
1

Annotation of genomic intervals: e.g. see what % of your
intervals overlap with exon/intron/promoters
Quick introduction
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

The genomation is an R package that expedites genomic
interval summary and annotation. It has the following features
1

2

Annotation of genomic intervals: e.g. see what % of your
intervals overlap with exon/intron/promoters
Summary of genomic scores or read coverages over pre-defined
regions
e.g. extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)
Quick introduction
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation

The genomation is an R package that expedites genomic
interval summary and annotation. It has the following features
1

2

More
information

Annotation of genomic intervals: e.g. see what % of your
intervals overlap with exon/intron/promoters
Summary of genomic scores or read coverages over pre-defined
regions
e.g. extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)

3

Visualize genomic interval summaries as meta-region plots or
heatmaps.
Quick introduction
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation

The genomation is an R package that expedites genomic
interval summary and annotation. It has the following features
1

2

More
information

Annotation of genomic intervals: e.g. see what % of your
intervals overlap with exon/intron/promoters
Summary of genomic scores or read coverages over pre-defined
regions
e.g. extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)

3

4

Visualize genomic interval summaries as meta-region plots or
heatmaps.
Work with multiple file formats
e.g. BAM, BED, bigWig, GFF and generic tabular text files
containing chromosome location information.
Quick introduction
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation

The genomation is an R package that expedites genomic
interval summary and annotation. It has the following features
1

2

More
information

Annotation of genomic intervals: e.g. see what % of your
intervals overlap with exon/intron/promoters
Summary of genomic scores or read coverages over pre-defined
regions
e.g. extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)

3

4

Visualize genomic interval summaries as meta-region plots or
heatmaps.
Work with multiple file formats
e.g. BAM, BED, bigWig, GFF and generic tabular text files
containing chromosome location information.

5

do all these in R :)
Genomic interval summaries are widely used
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

Summaries of genomic intervals are one of the useful ways to
communicate high-dimensional data
Traditionally, regions of interest are picked and distribution of
genomic intervals are summarized on those regions
Genomic interval summaries are widely used:
Examples from literature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

Figure : Erkek, S., et al. (2013). Molecular determinants of nucleosome
retention at CpG-rich sequences in mouse spermatozoa. Nature Structural
& Molecular Biology
Genomic interval summaries are widely used:
Examples from literature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

Figure : Stadler, M., Murr, R., Burger, L., et al. (2011). DNA-binding
factors shape the mouse methylome at distal regulatory regions. Nature
Utility and futility of average profiles
genomation
package

average profile around anchor

4.5

average score

More
information

5.0

Using
genomation

4.0

Usage and
ubiquity of
genomic
interval
summaries

Does this mean all of the windows (viewpoints) have a similar
enrichment profile?

3.5

Altuna Akalın

−100

0
bases

50

100
Utility and futility of average profiles
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries

Only 1/3 of windows have such enrichment. Be careful when you are
interpreting the average profiles.

21

Using
genomation

0

5.2

10

16

More
information

−100

0

50 100
Genomic interval summaries are widely used:
Examples from literature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

Figure : Lister, R., et al. (2009). Human DNA methylomes at base
resolution show widespread epigenomic differences. Nature
Genomic interval summaries are widely used:
Examples from literature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

Figure : Feng, S. et al. (2010). Conservation and divergence of
methylation patterning in plants and animals. PNAS
Issues to keep in mind when developing summary
methods
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

Genomic data comes in many formats, we need a method that is
able to work with multiple flat file formats
We need a method that is not specialized on one type of data
set such as read counts, it should also work on other scoring
schemes(e.g. conservation scores) easily.
Regions of interest are not always equi-width, you should be able
to normalize for length differences by binning.
Multiple visualization options and fast heatmap generation
should be available
Clustering of regions based on multiple summaries (e.g.
binding for different TFs on the same set of regions) on the
heatmap
Ease of use, it should not take hours of coding to generate and
visualize summaries.
Overview of genomation features
genomation
package
Altuna Akalın

region 2
region 3
region 4
...
region m

1.1

1.0
0.8

0.86

TF4
TF3

0.6

0.6

TF2
0.34

TF1

0.072

.
.
.
.
. . . . . .
.

TF1

0

0.2

region 1

TF3
TF2

0.4

read per million

1 2 3 4 ... n

0

500

1000

base-pairs around anchor

heatmaps for genomic interval sets
TF 4

TF 3

TF 2

TF 1

Visualize

0 0.5 1 1.5 2 2.5

0
100

0

0

0

0 0.5 1 1.5 2 2.5

500

500

100

0

0
100

0

0 0.5 1 1.5 2

0

Annotate
500

Annotation
BED
GFF
Tab txt
GRanges

500

1000

base-pairs around anchor

500

More
information

Genomic Intervals
BAM
BigWig
BED
GFF
Summarize
Tab txt
GRanges

meta-region heatmaps

TF4

100

Using
genomation

meta-region plots

ScoreMatrix/ScoreMatrixList object
Base-pairs/ bins

0.0

Usage and
ubiquity of
genomic
interval
summaries

0 0.5 1 1.5 2 2.5

Piecharts for annotation

25.7

21.8
11.6

40.9

Intergenic
Intron
Exon
Promoter
installation of the package and the example data
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

We can install the package and the data using install_github()
function from the devtools package.
#install dependencies
install.packages( c("data.table","plyr","reshape2","ggplot2",
"gridBase","devtools"))
source("http://bioconductor.org/biocLite.R")
biocLite(c("GenomicRanges","rtracklayer","impute","Rsamtools"))
# install the packages
library(devtools)
install_github("genomation", username = "al2na")
# install the data package
# needed for examples
install_github("genomationData", username = "al2na")
Data import
genomation
package
Altuna Akalın

Various file formats can be used in genomation. You can read in
annotation or your genomic intervals of interest.

Usage and
ubiquity of
genomic
interval
summaries

library(genomation)
tab.file1 <- system.file("extdata/tab1.bed", package = "genomation")
readGeneric(tab.file1)

Using
genomation
More
information

## GRanges with 6
##
seqnames
##
<Rle>
##
[1]
chr21
##
[2]
chr21
##
[3]
chr21
##
[4]
chr21
##
[5]
chr21
##
[6]
chr21
##
--##
seqlengths:
##
chr21
##
NA

ranges and 0 metadata columns:
ranges strand
<IRanges> <Rle>
[9437272, 9439473]
*
[9483485, 9484663]
*
[9647866, 9648116]
*
[9708935, 9709231]
*
[9825442, 9826296]
*
[9909011, 9909218]
*
Extraction of data over pre-defined genomic regions
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries

ScoreMatrix() and ScoreMatrixBin() are functions used to extract
data over predefined windows.
ScoreMatrix is used when all of the windows have the same
width (e.g. region around TSS)

Using
genomation

ScoreMatrixBin is designed for use with windows of unequal
width (e.g. enrichment of methylation over exons).

More
information

data(cage)
data(promoters)
sm <- ScoreMatrix(target = cage, windows = promoters)
sm
##

scoreMatrix with dims:

1055 2001
Visualizing ScoreMatrix: summary of genomic
invervals over pre-defined regions
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation

plotMeta(),heatMeta(), heatMatrix() and multiHeatMatrix()
are the visualization functions.
oldmar <- par()$mar
par(oma = c(0, 0, 0, 0))
heatMatrix(sm, xcoords = c(-1000, 1000))
plotMeta(sm, xcoords = c(-1000, 1000),line.col="blue")
par(oma = oldmar)

0.15

average score

0.10
0.05
0.00

0

0.75

1.5

2.2

3

0.20

0.25

More
information

−1000

−500

0

500

1000

−1000

−500

0
bases

500

1000
Working with BAM files
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

BAM files can also be used in ScoreMatrix() and ScoreMatrixBin()
functions
bam.file = system.file('tests/test.bam', package='genomation')
windows = GRanges(rep(c(1,2),each=2),
IRanges(rep(c(1,2), times=2), width=5))
scores3 = ScoreMatrix(target=bam.file,windows=windows, type='bam')
Working with bigWig files

DHS around TSS

14

More
information

8 10

Using
genomation

my.bed12.file=system.file("extdata/chr21.refseq.hg19.bed",
package = "genomation")
feats=readTranscriptFeatures(my.bed12.file,up.flank=500,down.flank=500)
sm=ScoreMatrix(target="wgEncodeUwDnaseA549RawRep1.bw",
windows=feats$promoters,type='bigWig',strand.aware=TRUE)
plotMeta(sm,xcoords=c(-500,500),main="DHS around TSS",line.col="blue")

average score

Usage and
ubiquity of
genomic
interval
summaries

6

Altuna Akalın

ScoreMatrix() and ScoreMatrixBin() are functions can handle
bigWig files. Here we use ENCODE DHS scores, downloaded from
http://goo.gl/fEVu0g

4

genomation
package

−400

0

200
Multiple profiles

0

0

0

50

25

0
25

−2
50

50
−5 0
00

0

0

25

−2
50

50
−5 0
00

0

0

25

0

P300 Suz12 Rad21 Znf143

−2
50

0

0

−5
00

CTCF

50
−5 0
00

More
information

25

Using
genomation

ctcf.peaks=readRDS("ctcf.peaks.rds")
dataPath = system.file("extdata", package = "genomationData")
bam.files = list.files(dataPath, full= T,pattern = "bam$")[c(1:4,6)]
sml = ScoreMatrixList(bam.files, ctcf.peaks, bin.num = 50,type = "bam")
names(sml)=c("CTCF","P300","Suz12","Rad21","Znf143")
multiHeatMatrix(sml, xcoords = c(-500, 500),cex.axis=0.35,common.scale = T,
col = c("lightgray", "blue"),winsorize=c(0,95))

−2
50

Usage and
ubiquity of
genomic
interval
summaries

50
−5 0
00

Altuna Akalın

Multiple heatmap profiles can be plotted using multiHeatMatrix()
which takes in a ScoreMatrixList object. Here we used CTCF , P300
, Suz12 ,Rad21, Znf143 BAM files from genomationData package.

−2
50

genomation
package

02468 02468 02468 02468 02468
Multiple profiles
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation

multiHeatMatrix() can also apply K-means clustering. Extreme
values are trimmed using with “winsorize” argument
multiHeatMatrix(sml, xcoords = c(-500, 500),kmeans=TRUE,k=3,common.scale = T,
cex.axis=0.4,col = c("lightgray", "blue"),winsorize=c(0,95))
CTCF

P300 Suz12 Rad21 Znf143

1

More
information
2

0

0

0

50

25

0

25
0
−500
50
0
−2
50

0

25
0
−500
50
0
−2
50

0

25
0
−500
50
0
−2
50

0

50

0
−500
50
0
−2
50

25

−2

−5

00

3

02468 02468 02468 02468 02468
Multiple profiles
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation

Multiple average profiles can be visualized with heatMeta(). Here,
we also apply a scaling function to all the matrices.
# take log2 of all matrices
sml2=scaleScoreMatrixList(sml,scalefun=function(x) log2(x+1))
heatMeta(sml2,legend.name="average profiles",xcoords=c(-500, 500),
xlab="bp around peaks")

More
information

1.8

CTCF

average profiles
0.61
1
1.4

P300

Suz12

0.21

Rad21

Znf143

−400

−200

0
bp around peaks

200

400
Multiple profiles
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries

Multiple average profiles can also be visualized with plotMeta()
plotMeta(sml2,profile.names=names(sml2),
xcoords=c(-500, 500),
main="mult. profiles")

Using
genomation

mult. profiles

More
information

1.0
0.5

average score

1.5

CTCF
P300
Suz12
Rad21
Znf143

−400

−200

0
bases

200

400
Future work...
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

Explore overlap statistics between two genomic data sets: Does
TF1 binding site locations overlap with TF2 sites more than
expected?
This is previously explored with GenometriCorr package. These
functionality can be included in the form of a dependency.
Performance improvement on certain functions, faster is always
better...
Further information
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

The genomation package is available at
http:/al2na.github.io/genomation. You can find the link
to the vignette on the webpage as well.
Code that generated this presentation is available at
http://github.com/al2na/genomation_presentation
Questions and bug reports
You can view/open issues in github
https://github.com/al2na/genomation/issues?state=open
You can ask questions by sending an e-mail to
genomation@googlegroups.com or using the web interface to
google groups

Developed by Altuna Akalın and Vedran Franke
Session Info
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information

sessionInfo()
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] C
attached base packages:
[1] methods
grid
stats
[8] base

graphics

grDevices utils

other attached packages:
[1] genomation_0.99.0.2 knitr_1.5
loaded via a namespace (and not attached):
[1] BSgenome_1.30.0
BiocGenerics_0.8.0
[4] GenomicRanges_1.14.3 IRanges_1.20.5
[7] RColorBrewer_1.0-5
RCurl_1.95-4.1
[10] XML_3.95-0.2
XVector_0.2.0
[13] colorspace_1.2-4
data.table_1.8.10
[16] digest_0.6.3
evaluate_0.5.1
[19] ggplot2_0.9.3.1
gridBase_0.4-6

Biostrings_2.30.0
MASS_7.3-29
Rsamtools_1.14.1
bitops_1.0-6
dichromat_2.0-0
formatR_0.10
gtable_0.1.2

datas

genomation: summary of genomic intervals

  • 1.
    genomation package genomation Altuna Akalın Usage and ubiquityof genomic interval summaries a toolkit to summarize, annotate and visualize genomic intervals Using genomation Altuna Akalın1 More information February 24, 2014 1* presented by. Package developed by Altuna Akalın and Vedran Franke
  • 2.
    Quick introduction genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation More information The genomation is an R package that expedites genomic interval summary and annotation. It has the following features 1 Annotation of genomic intervals: e.g. see what % of your intervals overlap with exon/intron/promoters
  • 3.
    Quick introduction genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation More information The genomation is an R package that expedites genomic interval summary and annotation. It has the following features 1 2 Annotation of genomic intervals: e.g. see what % of your intervals overlap with exon/intron/promoters Summary of genomic scores or read coverages over pre-defined regions e.g. extract the conservation profile over ChIP-seq binding sites (equi-width regions) or CpG islands (nonequi-width regions)
  • 4.
    Quick introduction genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation The genomation is an R package that expedites genomic interval summary and annotation. It has the following features 1 2 More information Annotation of genomic intervals: e.g. see what % of your intervals overlap with exon/intron/promoters Summary of genomic scores or read coverages over pre-defined regions e.g. extract the conservation profile over ChIP-seq binding sites (equi-width regions) or CpG islands (nonequi-width regions) 3 Visualize genomic interval summaries as meta-region plots or heatmaps.
  • 5.
    Quick introduction genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation The genomation is an R package that expedites genomic interval summary and annotation. It has the following features 1 2 More information Annotation of genomic intervals: e.g. see what % of your intervals overlap with exon/intron/promoters Summary of genomic scores or read coverages over pre-defined regions e.g. extract the conservation profile over ChIP-seq binding sites (equi-width regions) or CpG islands (nonequi-width regions) 3 4 Visualize genomic interval summaries as meta-region plots or heatmaps. Work with multiple file formats e.g. BAM, BED, bigWig, GFF and generic tabular text files containing chromosome location information.
  • 6.
    Quick introduction genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation The genomation is an R package that expedites genomic interval summary and annotation. It has the following features 1 2 More information Annotation of genomic intervals: e.g. see what % of your intervals overlap with exon/intron/promoters Summary of genomic scores or read coverages over pre-defined regions e.g. extract the conservation profile over ChIP-seq binding sites (equi-width regions) or CpG islands (nonequi-width regions) 3 4 Visualize genomic interval summaries as meta-region plots or heatmaps. Work with multiple file formats e.g. BAM, BED, bigWig, GFF and generic tabular text files containing chromosome location information. 5 do all these in R :)
  • 7.
    Genomic interval summariesare widely used genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation More information Summaries of genomic intervals are one of the useful ways to communicate high-dimensional data Traditionally, regions of interest are picked and distribution of genomic intervals are summarized on those regions
  • 8.
    Genomic interval summariesare widely used: Examples from literature genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation More information Figure : Erkek, S., et al. (2013). Molecular determinants of nucleosome retention at CpG-rich sequences in mouse spermatozoa. Nature Structural & Molecular Biology
  • 9.
    Genomic interval summariesare widely used: Examples from literature genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation More information Figure : Stadler, M., Murr, R., Burger, L., et al. (2011). DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature
  • 10.
    Utility and futilityof average profiles genomation package average profile around anchor 4.5 average score More information 5.0 Using genomation 4.0 Usage and ubiquity of genomic interval summaries Does this mean all of the windows (viewpoints) have a similar enrichment profile? 3.5 Altuna Akalın −100 0 bases 50 100
  • 11.
    Utility and futilityof average profiles genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Only 1/3 of windows have such enrichment. Be careful when you are interpreting the average profiles. 21 Using genomation 0 5.2 10 16 More information −100 0 50 100
  • 12.
    Genomic interval summariesare widely used: Examples from literature genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation More information Figure : Lister, R., et al. (2009). Human DNA methylomes at base resolution show widespread epigenomic differences. Nature
  • 13.
    Genomic interval summariesare widely used: Examples from literature genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation More information Figure : Feng, S. et al. (2010). Conservation and divergence of methylation patterning in plants and animals. PNAS
  • 14.
    Issues to keepin mind when developing summary methods genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation More information Genomic data comes in many formats, we need a method that is able to work with multiple flat file formats We need a method that is not specialized on one type of data set such as read counts, it should also work on other scoring schemes(e.g. conservation scores) easily. Regions of interest are not always equi-width, you should be able to normalize for length differences by binning. Multiple visualization options and fast heatmap generation should be available Clustering of regions based on multiple summaries (e.g. binding for different TFs on the same set of regions) on the heatmap Ease of use, it should not take hours of coding to generate and visualize summaries.
  • 15.
    Overview of genomationfeatures genomation package Altuna Akalın region 2 region 3 region 4 ... region m 1.1 1.0 0.8 0.86 TF4 TF3 0.6 0.6 TF2 0.34 TF1 0.072 . . . . . . . . . . . TF1 0 0.2 region 1 TF3 TF2 0.4 read per million 1 2 3 4 ... n 0 500 1000 base-pairs around anchor heatmaps for genomic interval sets TF 4 TF 3 TF 2 TF 1 Visualize 0 0.5 1 1.5 2 2.5 0 100 0 0 0 0 0.5 1 1.5 2 2.5 500 500 100 0 0 100 0 0 0.5 1 1.5 2 0 Annotate 500 Annotation BED GFF Tab txt GRanges 500 1000 base-pairs around anchor 500 More information Genomic Intervals BAM BigWig BED GFF Summarize Tab txt GRanges meta-region heatmaps TF4 100 Using genomation meta-region plots ScoreMatrix/ScoreMatrixList object Base-pairs/ bins 0.0 Usage and ubiquity of genomic interval summaries 0 0.5 1 1.5 2 2.5 Piecharts for annotation 25.7 21.8 11.6 40.9 Intergenic Intron Exon Promoter
  • 16.
    installation of thepackage and the example data genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation More information We can install the package and the data using install_github() function from the devtools package. #install dependencies install.packages( c("data.table","plyr","reshape2","ggplot2", "gridBase","devtools")) source("http://bioconductor.org/biocLite.R") biocLite(c("GenomicRanges","rtracklayer","impute","Rsamtools")) # install the packages library(devtools) install_github("genomation", username = "al2na") # install the data package # needed for examples install_github("genomationData", username = "al2na")
  • 17.
    Data import genomation package Altuna Akalın Variousfile formats can be used in genomation. You can read in annotation or your genomic intervals of interest. Usage and ubiquity of genomic interval summaries library(genomation) tab.file1 <- system.file("extdata/tab1.bed", package = "genomation") readGeneric(tab.file1) Using genomation More information ## GRanges with 6 ## seqnames ## <Rle> ## [1] chr21 ## [2] chr21 ## [3] chr21 ## [4] chr21 ## [5] chr21 ## [6] chr21 ## --## seqlengths: ## chr21 ## NA ranges and 0 metadata columns: ranges strand <IRanges> <Rle> [9437272, 9439473] * [9483485, 9484663] * [9647866, 9648116] * [9708935, 9709231] * [9825442, 9826296] * [9909011, 9909218] *
  • 18.
    Extraction of dataover pre-defined genomic regions genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries ScoreMatrix() and ScoreMatrixBin() are functions used to extract data over predefined windows. ScoreMatrix is used when all of the windows have the same width (e.g. region around TSS) Using genomation ScoreMatrixBin is designed for use with windows of unequal width (e.g. enrichment of methylation over exons). More information data(cage) data(promoters) sm <- ScoreMatrix(target = cage, windows = promoters) sm ## scoreMatrix with dims: 1055 2001
  • 19.
    Visualizing ScoreMatrix: summaryof genomic invervals over pre-defined regions genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation plotMeta(),heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions. oldmar <- par()$mar par(oma = c(0, 0, 0, 0)) heatMatrix(sm, xcoords = c(-1000, 1000)) plotMeta(sm, xcoords = c(-1000, 1000),line.col="blue") par(oma = oldmar) 0.15 average score 0.10 0.05 0.00 0 0.75 1.5 2.2 3 0.20 0.25 More information −1000 −500 0 500 1000 −1000 −500 0 bases 500 1000
  • 20.
    Working with BAMfiles genomation package Altuna Akalın Usage and ubiquity of genomic interval summaries Using genomation More information BAM files can also be used in ScoreMatrix() and ScoreMatrixBin() functions bam.file = system.file('tests/test.bam', package='genomation') windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5)) scores3 = ScoreMatrix(target=bam.file,windows=windows, type='bam')
  • 21.
    Working with bigWigfiles DHS around TSS 14 More information 8 10 Using genomation my.bed12.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") feats=readTranscriptFeatures(my.bed12.file,up.flank=500,down.flank=500) sm=ScoreMatrix(target="wgEncodeUwDnaseA549RawRep1.bw", windows=feats$promoters,type='bigWig',strand.aware=TRUE) plotMeta(sm,xcoords=c(-500,500),main="DHS around TSS",line.col="blue") average score Usage and ubiquity of genomic interval summaries 6 Altuna Akalın ScoreMatrix() and ScoreMatrixBin() are functions can handle bigWig files. Here we use ENCODE DHS scores, downloaded from http://goo.gl/fEVu0g 4 genomation package −400 0 200
  • 22.
    Multiple profiles 0 0 0 50 25 0 25 −2 50 50 −5 0 00 0 0 25 −2 50 50 −50 00 0 0 25 0 P300 Suz12 Rad21 Znf143 −2 50 0 0 −5 00 CTCF 50 −5 0 00 More information 25 Using genomation ctcf.peaks=readRDS("ctcf.peaks.rds") dataPath = system.file("extdata", package = "genomationData") bam.files = list.files(dataPath, full= T,pattern = "bam$")[c(1:4,6)] sml = ScoreMatrixList(bam.files, ctcf.peaks, bin.num = 50,type = "bam") names(sml)=c("CTCF","P300","Suz12","Rad21","Znf143") multiHeatMatrix(sml, xcoords = c(-500, 500),cex.axis=0.35,common.scale = T, col = c("lightgray", "blue"),winsorize=c(0,95)) −2 50 Usage and ubiquity of genomic interval summaries 50 −5 0 00 Altuna Akalın Multiple heatmap profiles can be plotted using multiHeatMatrix() which takes in a ScoreMatrixList object. Here we used CTCF , P300 , Suz12 ,Rad21, Znf143 BAM files from genomationData package. −2 50 genomation package 02468 02468 02468 02468 02468
  • 23.
    Multiple profiles genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation multiHeatMatrix() can also apply K-means clustering. Extreme values are trimmed using with “winsorize” argument multiHeatMatrix(sml, xcoords = c(-500, 500),kmeans=TRUE,k=3,common.scale = T, cex.axis=0.4,col = c("lightgray", "blue"),winsorize=c(0,95)) CTCF P300 Suz12 Rad21 Znf143 1 More information 2 0 0 0 50 25 0 25 0 −500 50 0 −2 50 0 25 0 −500 50 0 −2 50 0 25 0 −500 50 0 −2 50 0 50 0 −500 50 0 −2 50 25 −2 −5 00 3 02468 02468 02468 02468 02468
  • 24.
    Multiple profiles genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation Multiple average profiles can be visualized with heatMeta(). Here, we also apply a scaling function to all the matrices. # take log2 of all matrices sml2=scaleScoreMatrixList(sml,scalefun=function(x) log2(x+1)) heatMeta(sml2,legend.name="average profiles",xcoords=c(-500, 500), xlab="bp around peaks") More information 1.8 CTCF average profiles 0.61 1 1.4 P300 Suz12 0.21 Rad21 Znf143 −400 −200 0 bp around peaks 200 400
  • 25.
    Multiple profiles genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Multiple average profiles can also be visualized with plotMeta() plotMeta(sml2,profile.names=names(sml2), xcoords=c(-500, 500), main="mult. profiles") Using genomation mult. profiles More information 1.0 0.5 average score 1.5 CTCF P300 Suz12 Rad21 Znf143 −400 −200 0 bases 200 400
  • 26.
    Future work... genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation More information Explore overlap statistics between two genomic data sets: Does TF1 binding site locations overlap with TF2 sites more than expected? This is previously explored with GenometriCorr package. These functionality can be included in the form of a dependency. Performance improvement on certain functions, faster is always better...
  • 27.
    Further information genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation More information The genomation package is available at http:/al2na.github.io/genomation. You can find the link to the vignette on the webpage as well. Code that generated this presentation is available at http://github.com/al2na/genomation_presentation Questions and bug reports You can view/open issues in github https://github.com/al2na/genomation/issues?state=open You can ask questions by sending an e-mail to genomation@googlegroups.com or using the web interface to google groups Developed by Altuna Akalın and Vedran Franke
  • 28.
    Session Info genomation package Altuna Akalın Usageand ubiquity of genomic interval summaries Using genomation More information sessionInfo() ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] C attached base packages: [1] methods grid stats [8] base graphics grDevices utils other attached packages: [1] genomation_0.99.0.2 knitr_1.5 loaded via a namespace (and not attached): [1] BSgenome_1.30.0 BiocGenerics_0.8.0 [4] GenomicRanges_1.14.3 IRanges_1.20.5 [7] RColorBrewer_1.0-5 RCurl_1.95-4.1 [10] XML_3.95-0.2 XVector_0.2.0 [13] colorspace_1.2-4 data.table_1.8.10 [16] digest_0.6.3 evaluate_0.5.1 [19] ggplot2_0.9.3.1 gridBase_0.4-6 Biostrings_2.30.0 MASS_7.3-29 Rsamtools_1.14.1 bitops_1.0-6 dichromat_2.0-0 formatR_0.10 gtable_0.1.2 datas