genomation: summary of genomic intervals

genomation is an R package that contains a collection of tools for visualizing and analyzing genome-wide data sets. The package works with a variety of genomic interval file types and enables easy summarization and annotation of high-throughput data sets with given genomic annotations. http://al2na.github.io/genomation/

  1. genomation package: a toolkit to summarize, annotate and visualize genomic intervals. Altuna Akalın, February 24, 2014. Presented by Altuna Akalın; package developed by Altuna Akalın and Vedran Franke. Sections: Usage and ubiquity of genomic interval summaries; Using genomation; More information.
  2. Quick introduction (slides 2 to 6, built up incrementally). genomation is an R package that expedites genomic interval summary and annotation. It has the following features: (1) Annotation of genomic intervals, e.g. see what % of your intervals overlap with exons/introns/promoters. (2) Summary of genomic scores or read coverage over pre-defined regions, e.g. extract the conservation profile over ChIP-seq binding sites (equi-width regions) or CpG islands (non-equi-width regions). (3) Visualization of genomic interval summaries as meta-region plots or heatmaps. (4) Support for multiple file formats, e.g. BAM, BED, bigWig, GFF and generic tabular text files containing chromosome location information. (5) Do all of this in R :)
  7. Genomic interval summaries are widely used. Summaries of genomic intervals are a useful way to communicate high-dimensional data. Traditionally, regions of interest are picked and the distribution of genomic intervals is summarized over those regions.
  8. Genomic interval summaries are widely used: Examples from literature. Figure: Erkek, S., et al. (2013). Molecular determinants of nucleosome retention at CpG-rich sequences in mouse spermatozoa. Nature Structural & Molecular Biology.
  9. Genomic interval summaries are widely used: Examples from literature. Figure: Stadler, M., Murr, R., Burger, L., et al. (2011). DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature.
  10. Utility and futility of average profiles. [Figure: average profile around anchor; x-axis: bases (−100 to 100), y-axis: average score.] Does this mean all of the windows (viewpoints) have a similar enrichment profile?
  11. Utility and futility of average profiles. [Figure; x-axis: −100 to 100.] Only 1/3 of windows have such enrichment. Be careful when you are interpreting the average profiles.
  12. Genomic interval summaries are widely used: Examples from literature. Figure: Lister, R., et al. (2009). Human DNA methylomes at base resolution show widespread epigenomic differences. Nature.
  13. Genomic interval summaries are widely used: Examples from literature. Figure: Feng, S., et al. (2010). Conservation and divergence of methylation patterning in plants and animals. PNAS.
  14. Issues to keep in mind when developing summary methods. Genomic data comes in many formats, so we need a method that works with multiple flat-file formats. The method should not be specialized for one type of data set such as read counts; it should work just as easily with other scoring schemes (e.g. conservation scores). Regions of interest are not always equi-width, so you should be able to normalize for length differences by binning. Multiple visualization options and fast heatmap generation should be available. Regions should be clusterable on the heatmap based on multiple summaries (e.g. binding of different TFs on the same set of regions). It should be easy to use: generating and visualizing summaries should not take hours of coding.
  15. Overview of genomation features. [Figure: workflow overview. Genomic intervals (BAM, BigWig, BED, GFF, tabular text, GRanges) are summarized into a ScoreMatrix/ScoreMatrixList object (regions 1 to m by base-pairs/bins 1 to n). These objects are visualized as meta-region plots, meta-region heatmaps and heatmaps for genomic interval sets (panels TF1 to TF4; x-axis: base-pairs around anchor, y-axis: read per million). Annotation (BED, GFF, tabular text, GRanges) is used to annotate intervals, shown as pie charts for annotation (Promoter, Exon, Intron, Intergenic).]
  16. Installation of the package and the example data. We can install the package and the data using the install_github() function from the devtools package.
        # install dependencies
        install.packages(c("data.table", "plyr", "reshape2", "ggplot2",
                           "gridBase", "devtools"))
        source("http://bioconductor.org/biocLite.R")
        biocLite(c("GenomicRanges", "rtracklayer", "impute", "Rsamtools"))
        # install the packages
        library(devtools)
        install_github("genomation", username = "al2na")
        # install the data package
        # needed for examples
        install_github("genomationData", username = "al2na")
  17. Data import. Various file formats can be used in genomation. You can read in annotation or your genomic intervals of interest.
        library(genomation)
        tab.file1 <- system.file("extdata/tab1.bed", package = "genomation")
        readGeneric(tab.file1)
        ## GRanges with 6 ranges and 0 metadata columns:
        ##       seqnames             ranges strand
        ##          <Rle>          <IRanges>  <Rle>
        ##   [1]    chr21 [9437272, 9439473]      *
        ##   [2]    chr21 [9483485, 9484663]      *
        ##   [3]    chr21 [9647866, 9648116]      *
        ##   [4]    chr21 [9708935, 9709231]      *
        ##   [5]    chr21 [9825442, 9826296]      *
        ##   [6]    chr21 [9909011, 9909218]      *
        ##   ---
        ##   seqlengths:
        ##    chr21
        ##       NA
  18. Extraction of data over pre-defined genomic regions. ScoreMatrix() and ScoreMatrixBin() are the functions used to extract data over predefined windows. ScoreMatrix() is used when all of the windows have the same width (e.g. regions around TSSs). ScoreMatrixBin() is designed for windows of unequal width (e.g. enrichment of methylation over exons).
        data(cage)
        data(promoters)
        sm <- ScoreMatrix(target = cage, windows = promoters)
        sm
        ## scoreMatrix with dims: 1055 2001
  19. Visualizing ScoreMatrix: summary of genomic intervals over pre-defined regions. plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
        # save the outer margins so they can be restored afterwards
        oldoma <- par()$oma
        par(oma = c(0, 0, 0, 0))
        heatMatrix(sm, xcoords = c(-1000, 1000))
        plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
        par(oma = oldoma)
      [Figure: heatmap of the score matrix and its average profile; x-axis: bases (−1000 to 1000), y-axis of the profile: average score.]
  20. Working with BAM files. BAM files can also be used in the ScoreMatrix() and ScoreMatrixBin() functions.
        bam.file = system.file('tests/test.bam', package = 'genomation')
        windows = GRanges(rep(c(1, 2), each = 2),
                          IRanges(rep(c(1, 2), times = 2), width = 5))
        scores3 = ScoreMatrix(target = bam.file, windows = windows, type = 'bam')
  21. Working with bigWig files. ScoreMatrix() and ScoreMatrixBin() can also handle bigWig files. Here we use ENCODE DHS scores, downloaded from http://goo.gl/fEVu0g
        my.bed12.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
        feats = readTranscriptFeatures(my.bed12.file, up.flank = 500, down.flank = 500)
        sm = ScoreMatrix(target = "wgEncodeUwDnaseA549RawRep1.bw",
                         windows = feats$promoters, type = 'bigWig', strand.aware = TRUE)
        plotMeta(sm, xcoords = c(-500, 500), main = "DHS around TSS", line.col = "blue")
      [Figure: "DHS around TSS"; y-axis: average score.]
  22. Multiple profiles. Multiple heatmap profiles can be plotted using multiHeatMatrix(), which takes a ScoreMatrixList object. Here we used the CTCF, P300, Suz12, Rad21 and Znf143 BAM files from the genomationData package.
        ctcf.peaks = readRDS("ctcf.peaks.rds")
        dataPath = system.file("extdata", package = "genomationData")
        bam.files = list.files(dataPath, full.names = TRUE, pattern = "bam$")[c(1:4, 6)]
        sml = ScoreMatrixList(bam.files, ctcf.peaks, bin.num = 50, type = "bam")
        names(sml) = c("CTCF", "P300", "Suz12", "Rad21", "Znf143")
        multiHeatMatrix(sml, xcoords = c(-500, 500), cex.axis = 0.35, common.scale = TRUE,
                        col = c("lightgray", "blue"), winsorize = c(0, 95))
      [Figure: five heatmaps side by side (CTCF, P300, Suz12, Rad21, Znf143); x-axis: −500 to 500.]
  23. Multiple profiles. multiHeatMatrix() can also apply k-means clustering. Extreme values are trimmed using the "winsorize" argument.
        multiHeatMatrix(sml, xcoords = c(-500, 500), kmeans = TRUE, k = 3, common.scale = TRUE,
                        cex.axis = 0.4, col = c("lightgray", "blue"), winsorize = c(0, 95))
      [Figure: five heatmaps (CTCF, P300, Suz12, Rad21, Znf143) with rows grouped into clusters 1 to 3; x-axis: −500 to 500.]
  24. Multiple profiles. Multiple average profiles can be visualized with heatMeta(). Here, we also apply a scaling function to all the matrices.
        # take log2 of all matrices
        sml2 = scaleScoreMatrixList(sml, scalefun = function(x) log2(x + 1))
        heatMeta(sml2, legend.name = "average profiles", xcoords = c(-500, 500),
                 xlab = "bp around peaks")
      [Figure: heatmap of average profiles, one row each for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: bp around peaks; legend: average profiles.]
  25. Multiple profiles. Multiple average profiles can also be visualized with plotMeta().
        plotMeta(sml2, profile.names = names(sml2), xcoords = c(-500, 500),
                 main = "mult. profiles")
      [Figure: "mult. profiles", one line each for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: bases, y-axis: average score.]
  26. Future work... Explore overlap statistics between two genomic data sets: do TF1 binding site locations overlap with TF2 sites more than expected? This was previously explored in the GenometriCorr package; this functionality can be included in the form of a dependency. Performance improvements on certain functions; faster is always better...
  27. Further information. The genomation package is available at http://al2na.github.io/genomation. You can find the link to the vignette on the webpage as well. The code that generated this presentation is available at http://github.com/al2na/genomation_presentation. Questions and bug reports: you can view or open issues on GitHub at https://github.com/al2na/genomation/issues?state=open. You can ask questions by sending an e-mail to genomation@googlegroups.com or by using the web interface to Google Groups. Developed by Altuna Akalın and Vedran Franke.
  28. Session Info.
        sessionInfo()
        ## R version 3.0.2 (2013-09-25)
        ## Platform: x86_64-apple-darwin10.8.0 (64-bit)
        ##
        ## locale:
        ## [1] C
        ##
        ## attached base packages:
        ## [1] methods   grid      stats     graphics  grDevices utils     datas...
        ## [8] base
        ##
        ## other attached packages:
        ## [1] genomation_0.99.0.2 knitr_1.5
        ##
        ## loaded via a namespace (and not attached):
        ##  [1] BSgenome_1.30.0      BiocGenerics_0.8.0   Biostrings_2.30.0
        ##  [4] GenomicRanges_1.14.3 IRanges_1.20.5       MASS_7.3-29
        ##  [7] RColorBrewer_1.0-5   RCurl_1.95-4.1       Rsamtools_1.14.1
        ## [10] XML_3.95-0.2         XVector_0.2.0        bitops_1.0-6
        ## [13] colorspace_1.2-4     data.table_1.8.10    dichromat_2.0-0
        ## [16] digest_0.6.3         evaluate_0.5.1       formatR_0.10
        ## [19] ggplot2_0.9.3.1      gridBase_0.4-6       gtable_0.1.2
      (output cut off on the slide)
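
The caution on slides 10 and 11 about average profiles can be illustrated with a small simulation. The sketch below uses base R only and made-up data; the matrix `scores`, the 1/3 split, the peak shape and the 1.5 threshold are all illustrative assumptions, not genomation output. A third of the simulated windows carry a peak at the anchor, yet the column-wise average alone would suggest enrichment across the whole set.

```r
# Simulated score matrix: rows are windows, columns are positions around an anchor.
# All values are made up for illustration; this is not genomation output.
set.seed(1)
n.windows <- 300
positions <- -100:100

# Background noise for every window
scores <- matrix(rnorm(n.windows * length(positions), mean = 4, sd = 0.5),
                 nrow = n.windows)

# Only the first third of the windows get a peak centred on the anchor
peak <- 3 * exp(-(positions / 30)^2)
enriched <- seq_len(n.windows / 3)
scores[enriched, ] <- sweep(scores[enriched, ], 2, peak, "+")

# The average profile is raised around position 0 ...
avg.profile <- colMeans(scores)

# ... but a per-window check (score at the anchor vs. score at the left edge,
# with an arbitrary threshold of 1.5) shows that most windows have no peak
center <- which(positions == 0)
edge <- which(positions == -100)
has.peak <- (scores[, center] - scores[, edge]) > 1.5
frac.with.peak <- mean(has.peak)

print(round(avg.profile[center] - avg.profile[edge], 2))
print(round(frac.with.peak, 2))
```

Calling `plot(positions, avg.profile, type = "l")` draws the averaged curve, and `image(t(scores))` gives a rough per-window view. The same comparison is what plotMeta() and heatMatrix() provide for a real ScoreMatrix.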
