The document discusses using a grammar of graphics approach to create genomic visualizations. It introduces challenges in visualizing genomic data like its large size and wide scale. Existing tools are limited by only supporting one view type or not being integrated with analysis environments. The ggbio package extends the grammar of graphics to genomics by operating on standard genomic data structures and providing both high-level and basic plot types from an overview down to the sequence level.
1. A Grammar of Graphics for Genomics
The ggbio Package
Michael Lawrence
Genentech
August 29, 2012
Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 1 / 18
2. Outline
1 Motivation
2 High-level Plots
3 Grammar Components
4. Data on the Genome
• Comes in two flavors:
• Annotations (genes, TF binding sites, ...)
• Experimental measurements (sequence reads)
• Both types are tied to genomic coordinates, providing a common axis
that permits cross-dataset comparison and inference
• Typically stored as a table, with the range as a fundamental variable
type, plus metadata
seqnames  start      end        strand  exon_id  tx_id
10        120927215  120928045  -       129230   14886,14887
10        120928689  120928854  -       129229   14886,14887
10        120931894  120931997  -       129228   14886,14887
10        120933249  120933384  -       129227   14886,14887
10        120933963  120934069  -       129226   14886
10        120933963  120934104  -       119757   14887
10        120936533  120936665  -       129225   14887
10        120936552  120936665  -       129225   14886
10        120938267  120938345  -       129224   14886,14887
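As a rough illustration of "the range as a fundamental variable type, plus metadata", the sketch below models a few rows of the exon table in plain Python. The class name `GenomicRange` and its fields are hypothetical and chosen only for this example; this is not the Bioconductor data structure itself. Coordinates are assumed to be 1-based and inclusive, as is conventional for Bioconductor ranges.

```python
from dataclasses import dataclass, field


@dataclass
class GenomicRange:
    """Hypothetical stand-in for one row of the exon table."""
    seqname: str
    start: int          # 1-based, inclusive (assumption stated above)
    end: int            # inclusive
    strand: str
    meta: dict = field(default_factory=dict)  # free-form metadata columns

    @property
    def width(self) -> int:
        # Inclusive coordinates, so add 1.
        return self.end - self.start + 1

    def overlaps(self, other: "GenomicRange") -> bool:
        # Ranges on different sequences never overlap.
        return (self.seqname == other.seqname
                and self.start <= other.end
                and other.start <= self.end)


# Three rows copied from the table above.
exons = [
    GenomicRange("10", 120927215, 120928045, "-", {"exon_id": 129230}),
    GenomicRange("10", 120933963, 120934069, "-", {"exon_id": 129226}),
    GenomicRange("10", 120933963, 120934104, "-", {"exon_id": 119757}),
]

for exon in exons:
    print(exon.meta["exon_id"], exon.width)
```

Because every dataset carries the same coordinate columns, an operation such as `overlaps` works across annotations and measurements alike, which is the "common axis" the slide refers to.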
9. Challenges
Big data, wide spaces
• Need summaries that are efficiently computed, communicate more
with less and expose the most interesting aspects of the data
• Need different ways of viewing the data, depending on the density and
scale, from whole genome to single basepair
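One summary of this kind is read coverage. The sketch below, in plain Python, computes per-base depth with a difference array (one pass over the reads, one pass over the region) and then collapses it into coarser bins for a zoomed-out view. The function names `coverage` and `bin_max` are hypothetical, and this is only an illustration of the idea, not how ggbio or Bioconductor compute coverage.

```python
from itertools import accumulate


def coverage(reads, region_start, region_end):
    """Per-base depth over [region_start, region_end].

    `reads` is an iterable of (start, end) pairs, 1-based and inclusive.
    """
    n = region_end - region_start + 1
    diff = [0] * (n + 1)
    for start, end in reads:
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo > hi:
            continue  # read lies entirely outside the region
        diff[lo - region_start] += 1      # depth rises where the read begins
        diff[hi - region_start + 1] -= 1  # and falls just past its end
    # A running sum of the differences gives the depth at each base.
    return list(accumulate(diff[:-1]))


def bin_max(depth, bin_size):
    """Summarize depth at lower resolution: the maximum within each bin."""
    return [max(depth[i:i + bin_size]) for i in range(0, len(depth), bin_size)]


reads = [(1, 4), (3, 6), (5, 5), (9, 10)]
depth = coverage(reads, 1, 10)
print(depth)              # [1, 1, 2, 2, 2, 1, 0, 0, 1, 1]
print(bin_max(depth, 5))  # [2, 1]
```

Choosing a larger `bin_size` trades detail for a smaller summary, which is the kind of scale-dependent view the second bullet asks for.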
13. Existing Tools
UCSC, IGB, IGV, Circos, GViz
Limitations
• Limited to one type of view (linear or circular)
• Not tightly integrated with an analysis environment through
standard, abstract data structures (except GViz)
• No low-level toolkit for prototyping new types of graphics
20. Grammars of Graphics
• A grammar of graphics is a language for expressing plots
• Graphics are constructed through the combination of various types of primitives; like legos for graphics
• The most prominent grammar was introduced by Wilkinson’s book The Grammar of Graphics
• Wilkinson’s grammar was extended by Wickham and the ggplot2 package
24. The ggbio Package
• An R/Bioconductor package that extends the Wilkinson/Wickham
grammar for applications in genomics
• Integrated with Bioconductor
• Operates on standard, abstract genomic data structures
• Leverages efficient range-based algorithms
• Programming interface has two levels of abstraction:
• autoplot: maps Bioconductor data structures to plots
• grammar: mix and match to create custom plots
26. Basic Plots
Gene Structures, Read Alignments, Sequence, Multiple
32. Overview Plots
Grand Linear, Karyogram, Circular
36. Specialized Plots
Mismatch summary + VCF, Edge-linked Intervals
40. The Wilkinson/Wickham Grammar of Graphics
Geom: The shape used for drawing the data
Stat: Transforms the data before plotting
Scale: Maps data to geom aesthetics, guides like legends and axes
Coord: Maps from geom space to device space
Facet: Small multiples of data subsets (trellis)
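To make the division of labor concrete, here is a toy pipeline in plain Python that chains a facet, a stat, a scale and a geom. Every function name here is hypothetical and invented for this sketch; it mirrors the roles listed above but is not the ggplot2 or ggbio API, and it omits the coord step for brevity.

```python
from collections import Counter, defaultdict


def facet(rows, key):
    """Facet: split the data into subsets, one per panel."""
    panels = defaultdict(list)
    for row in rows:
        panels[key(row)].append(row)
    return dict(panels)


def stat_count(values):
    """Stat: transform the data before plotting (count each value)."""
    return sorted(Counter(values).items())


def scale_linear(domain_min, domain_max, out_min, out_max):
    """Scale: map a data value onto an aesthetic (bar height in pixels)."""
    span = domain_max - domain_min

    def scale(value):
        return out_min + (value - domain_min) / span * (out_max - out_min)

    return scale


def geom_bar(stats, y_scale, bar_width=20):
    """Geom: the shape drawn per row, as (label, x, width, height)."""
    return [(label, i * bar_width, bar_width, y_scale(count))
            for i, (label, count) in enumerate(stats)]


# Made-up (sample, base) observations, purely for illustration.
rows = [("tumor", "A"), ("tumor", "C"), ("tumor", "A"), ("normal", "G")]
y = scale_linear(0, 2, 0, 100)

plots = {}
for panel, panel_rows in facet(rows, key=lambda r: r[0]).items():
    stats = stat_count(base for _, base in panel_rows)
    plots[panel] = geom_bar(stats, y)

print(plots["tumor"])   # [('A', 0, 20, 100.0), ('C', 20, 20, 50.0)]
print(plots["normal"])  # [('G', 0, 20, 50.0)]
```

Each stage can be swapped independently (a different stat, a different geom), which is what makes a grammar composable rather than a fixed catalog of chart types.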
46. A Grammar of Graphics for Genomics
Extensions are marked in red
Annotated example plot (x axis from 48.245 Mb to 48.270 Mb), with each component labeled:
• geometric object: alignment
• geometric object: chevron
• geometric object: rect
• statistical transformation: stepping
• X scale: sequence
• Y scale: discrete, from stepping
• color scale: discrete, from strand (+, −)
47. Components of the Genomic Grammar
Geom: alignment chevron arch arrow arrowrect
Stat: gene reduce stepping coverage mismatch table
Scale: sequence genome fold-change giemsa
Coord: truncate-gaps
Layout: tracks range-facet
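Two of the stats named above, reduce and stepping, can be sketched in a few lines of plain Python. The function names are hypothetical and the algorithms are simple textbook versions (a sorted merge and a greedy row assignment), offered as an assumption about what these stats do rather than as ggbio's implementation. Here reduce is assumed to merge overlapping or directly adjacent ranges, and stepping is assumed to place each range on the lowest row where it does not collide with a range already drawn.

```python
def reduce_ranges(ranges):
    """Merge overlapping or adjacent (start, end) ranges (inclusive ends)."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Touches or overlaps the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


def stepping(ranges):
    """Assign each range a row so that ranges on the same row never overlap.

    Returns (start, end, row) triples in sorted order.
    """
    row_ends = []  # end coordinate of the last range placed on each row
    placed = []
    for start, end in sorted(ranges):
        for row, last_end in enumerate(row_ends):
            if start > last_end:
                row_ends[row] = end
                placed.append((start, end, row))
                break
        else:
            # No existing row has room: open a new one.
            row_ends.append(end)
            placed.append((start, end, len(row_ends) - 1))
    return placed


# Exon coordinates taken from the table earlier in the deck.
exons = [
    (120933963, 120934069),
    (120933963, 120934104),
    (120936533, 120936665),
    (120936552, 120936665),
    (120938267, 120938345),
]

reduced = reduce_ranges(exons)
rows = stepping(exons)
print(reduced)
print(rows)
```

With these inputs the five exons collapse to three reduced ranges, and stepping needs two rows because each pair of overlapping exons must be stacked.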
[Figures: example plots for these components, shown as successive builds. Surviving panel labels: transcripts NM_014098 and NM_006793 (GeneID:10935); "original" and "reduced"; "stepping"; "Coverage"; "Counts" (A, C, G, T); "original" and "truncated"; "normal" and "tumor"; panels "10" and "11"; "Features" by "Samples"; chr10.]
65. Summary
• The ggbio package is a toolkit for plotting genomic data and
annotations
• Available as part of the Bioconductor project
• Easy to use and flexible enough to handle the diverse use cases
encountered in genomics
• Useful plots are automatically generated from Bioconductor data
structures using reasonable defaults
• New types of plots can be constructed from grammar primitives
specially designed for genomics