Summarize, annotate and visualize genomic intervals 
with R/BioC package genomation
	
Genomic intervals
Genomic intervals are the basis of genome
annotation. Intervals can contain simply
location information (e.g. TFBS locations) or can
contain a variety of scores, with different
scales, such as percent methylation, ChIP-seq
enrichment or read coverage.
Windows/regions of interest over the genome
Summarizing genomic intervals over different
sub-genomic contexts or windows of interest
is the necessary first step in making inferences
about the biological importance of the data.
The windows can be of equal length (e.g. pre-defined
regions around TSSs) or not (e.g. exons,
transcripts or CpG islands).
Extract genomic intervals for windows of interest
Genomic intervals can be extracted for
windows of interest and stored in matrix
format. If the windows are not of equal
length, binning strategies can be used
to create an equal number of bins over those
windows, so that a matrix can still be used to
store the information.
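The binning idea can be sketched in base R without any package. This is only an illustration under stated assumptions: `bin_scores` is a hypothetical helper (not a genomation function), the per-base scores are made-up random numbers, and the mean is used as the per-bin summary.

```r
# Summarize per-base scores into a fixed number of bins (base R only).
# `bin_scores` is a hypothetical helper, not part of genomation.
bin_scores <- function(scores, n_bins = 10) {
  stopifnot(length(scores) > n_bins)
  # Assign every base position to one of n_bins equal-width bins.
  bin_id <- cut(seq_along(scores), breaks = n_bins, labels = FALSE)
  # Average the scores that fall into each bin.
  as.numeric(tapply(scores, bin_id, mean))
}

# Three windows of unequal length with made-up per-base scores.
set.seed(1)
windows <- list(runif(120), runif(45), runif(300))

# One row per window, one column per bin.
binned <- t(sapply(windows, bin_scores, n_bins = 10))
dim(binned)
```

Because every window is reduced to the same number of bins, windows of different lengths fit into one matrix.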
Visualize the summary of genomic intervals
The matrices described above can be
visualized using heatmaps or meta-region
plots. Meta-region plots show the average
value of the signal from genomic intervals
over the windows of interest; this can be a
line plot or a heatmap where colors indicate
average values. Another type of heatmap can
be used to show the values of matrices that
contain the signal value for genomic intervals
over the windows. In these heatmaps, every
row represents a window and every column
is a base position or a bin.
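The relationship between the matrix, the meta-region profile, and the heatmap can be illustrated with base R graphics. This is a sketch on a simulated matrix; the variable names and the simulated signal are assumptions for illustration and are unrelated to genomation's own plotting functions.

```r
# Toy score matrix: 50 windows (rows) x 101 base positions (columns).
set.seed(42)
positions <- -50:50
peak <- dnorm(positions, mean = 0, sd = 10)
score_mat <- t(replicate(50, peak * runif(1, 0.5, 2) + rnorm(101, sd = 0.002)))

# Meta-region profile: average signal at each position across all windows.
meta_profile <- colMeans(score_mat)

# Meta-region line plot.
plot(positions, meta_profile, type = "l",
     xlab = "position relative to window center", ylab = "average score")

# Heatmap of the full matrix: every row is a window, every column a position.
image(x = positions, y = seq_len(nrow(score_mat)), z = t(score_mat),
      xlab = "position relative to window center", ylab = "window")
```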
Read genomic intervals into R
Genomic intervals of any kind can be read into
R using the functions below, all of which return
GRanges or GRangesList objects:
readGeneric(file, ...) reads generic
text files with genomic interval information.
readBed(file, ...) reads BED files.
readTranscriptFeatures(file, ...) reads
BED12 files with exon/intron structure.
gffToGRanges() reads a GFF file.
readBroadPeak(), readNarrowPeak() and
readFeatureFlank() are other convenience
functions for reading BED-like files.
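As a small illustration of the reading functions listed above, the sketch below writes a made-up six-column BED file to a temporary location and reads it back with `readBed()` and `readGeneric()`. The file contents and the metadata column names passed to `meta.cols` are assumptions chosen for the example.

```r
library(genomation)

# Write a small made-up BED file (0-based starts, tab-separated).
bed_file <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200\tpeak1\t5\t+",
             "chr1\t500\t650\tpeak2\t8\t-",
             "chr2\t50\t150\tpeak3\t3\t+"), bed_file)

# readBed() returns a GRanges object.
peaks <- readBed(bed_file)

# readGeneric() reads any delimited text file; the column positions
# for chromosome, start, end and strand are given explicitly.
generic <- readGeneric(bed_file, chr = 1, start = 2, end = 3, strand = 6,
                       meta.cols = list(name = 4, score = 5),
                       zero.based = TRUE)
peaks
generic
```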
Extract genomic intervals for windows of interest in R
Genomic intervals can be extracted for
windows of interest using the
ScoreMatrix(target, windows) and
ScoreMatrixBin(target, windows)
functions. These functions can handle BAM and
BigWig files and GRanges objects as inputs.

patternMatrix() returns the relative
locations or scores of k-mers or motifs, which is
useful for analyzing ChIP-seq data.

All of these return ScoreMatrix or
ScoreMatrixList objects. These objects can
be manipulated further using the orderBy,
binMatrix, scaleScoreMatrix,
scaleScoreMatrixList,
intersectScoreMatrixList and []
functions.
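A minimal sketch of both extraction functions, assuming the example datasets `cage`, `promoters` and `cpgi` that ship with genomation are available. The choice of the `tpm` column as the score and of 10 bins is illustrative, not a recommendation.

```r
library(genomation)

# Example data shipped with genomation: CAGE tag clusters, promoter
# windows of equal width, and CpG islands of unequal width.
data(cage)
data(promoters)
data(cpgi)

# Equal-width windows: one column per base position.
sm <- ScoreMatrix(target = cage, windows = promoters,
                  strand.aware = TRUE, weight.col = "tpm")

# Unequal-width windows: each window is summarized into 10 bins.
smb <- ScoreMatrixBin(target = cage, windows = cpgi,
                      bin.num = 10, bin.op = "mean", weight.col = "tpm")

dim(sm)
dim(smb)
```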
Visualize the summary of genomic intervals in R
heatMatrix(scoreMatrixObj) and
multiHeatMatrix(scoreMatrixListObj)
make heatmaps for windows of
interest.
plotMeta() makes meta-region plots and
heatMeta() makes meta-region heatmaps.
These functions return values invisibly; see the
respective help pages for details.
[Figure: meta-region plots, meta-region heatmaps and heatmaps for genomic interval sets for TF1 to TF4 (x-axis: base-pairs around anchor, 0 to 1000; y-axis: read per million); pie charts for annotation with the categories Intergenic, Intron, Exon and Promoter]
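The plotting functions for score matrices (heatMatrix, multiHeatMatrix, plotMeta, heatMeta) can be tried on the example data shipped with genomation. This is a sketch under assumptions: the datasets `cage`, `cpgi` and `promoters` stand in for real experiments, and default plotting arguments are used throughout.

```r
library(genomation)

data(cage)
data(cpgi)
data(promoters)

# Score matrices for two interval sets over the same promoter windows.
sm_cage <- ScoreMatrix(target = cage, windows = promoters, strand.aware = TRUE)
sm_cpgi <- ScoreMatrix(target = cpgi, windows = promoters, strand.aware = TRUE)

# Combine them and keep only the windows present in both matrices.
sml <- ScoreMatrixList(list(cage = sm_cage, cpgi = sm_cpgi))
sml <- intersectScoreMatrixList(sml)

# Heatmap of one matrix: every row is a window, every column a base.
heatMatrix(sm_cage)

# Side-by-side heatmaps for all matrices in the list.
multiHeatMatrix(sml)

# Meta-region line plot and meta-region heatmap.
plotMeta(sml)
heatMeta(sml)
```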
Annotation for genomic intervals in R
Annotation summaries for target genomic
intervals can be obtained with
annotateWithFeatures(). The resulting
object can be visualized with
plotTargetAnnotation() for a pie chart
and heatTargetAnnotation() for a
heatmap of annotation overlap
percentages.
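A hedged sketch of the annotation step, covering only the pie chart. It assumes the example datasets shipped with genomation and treats promoters and CpG islands as the annotation features; `granges()` is used to drop metadata columns so the two sets combine cleanly into one GRangesList.

```r
library(genomation)
library(GenomicRanges)

data(cage)
data(cpgi)
data(promoters)

# Named list of annotation features; metadata columns are dropped.
features <- GRangesList(promoter = granges(promoters),
                        cpgi = granges(cpgi))

# Overlap the target intervals (CAGE clusters here) with each feature set.
annot <- annotateWithFeatures(cage, features)
annot

# Pie chart of the annotation overlap percentages.
plotTargetAnnotation(annot)
```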
Contributors: Altuna Akalin [aut, cre], Vedran Franke [aut, cre], Katarzyna Wreczycka [aut],
Alexander Gosdschan [ctb], Liz Ing-Simmons [ctb]
Citation: Akalin A, Franke V, Vlahovicek K, Mason CE, Schubeler D. (2015). Bioinformatics. doi: 10.1093/bioinformatics/btu775
[Figure panels: Genomic intervals with different kinds of information; Extract subset of genomic intervals for windows of interest; Meta-region line plots; Meta-region heatmap; Heatmaps for genomic intervals; Annotation pie charts; Annotation heatmaps]
Annotation for genomic intervals
Genomic intervals such as ChIP-seq peaks or
differentially methylated regions need to be
annotated further with known genomic
annotations such as the promoter, intron and exon
structures of genes. This gives additional
information for functional characterization of
the genomic intervals. It is usually
represented as the percentage of genomic features
overlapping with the annotation.
URLs: http://bioinformatics.mdc-berlin.de/genomation
      http://www.bioconductor.org/release/genomation
Install genomation:

# On R >= 3.5, use BiocManager::install("genomation") instead of biocLite
source("https://bioconductor.org/biocLite.R"); biocLite("genomation")
OR
devtools::install_github("BIMSBbioinfo/genomation", build_vignettes = FALSE)
Visualize summaries and annotation for genomic intervals
Misc. functions
getRandomEnrichment() calculates the significance of
association between two sets of genomic intervals.
Valid for package version >= 1.5.6
