Segway is a widely used method for performing automated genome annotation. Researchers usually use Segway to discover recurring patterns across multiple epigenomic datasets and then to annotate the genome with labels for these patterns, often called chromatin states. For example, one might learn labels for promoters, enhancers, or quiescent genomic regions from histone modification and open chromatin data.
Segway is implemented using the Graphical Models Toolkit (GMTK), a flexible system for hidden Markov model and dynamic Bayesian network inference. While the prevailing use of Segway remains unsupervised annotation from epigenome data, it can also perform training, posterior probability estimation, and Viterbi decoding for a wide range of probabilistic models with a recurrent structure on a genomic axis. It accepts a wide variety of models that relate the hidden and observed random variables at each genomic position to each other and to those at neighboring positions, and it performs the necessary inference without much additional programming.
We have developed a more configurable interface to Segway to allow its use for more diverse classes of problems, models, and algorithms. We will describe several of the extensions over a simple hidden Markov model, such as semi-Markov state durations, a transcriptome model with a reversed copy of the model for stranded data, a graph-based regularization method for incorporating long-range chromatin interaction data, a semi-supervised training approach, locus-specific prior knowledge, and modeling observations with arbitrary mixtures of Gaussians. We will use some of these examples to describe how you might develop your own models.
Segway and the Graphical Models Toolkit: a framework for probabilistic genomic inference
1. Michael M. Hoffman
Princess Margaret Cancer Centre
Vector Institute
Department of Medical Biophysics
Department of Computer Science
University of Toronto
https://hoffmanlab.org/
Segway and the Graphical Models Toolkit:
A framework for probabilistic genomic inference
@michaelhoffman
#probgen18
2. Geographical maps have…
Figures from 1) National Geographic Society (2011) GIS.
street data
+ buildings data
+ vegetation data
= integrated map
Rachel Chan
30. What does Segway do, really?
1. Creates GMTK structure and parameter files for semi-automated genome annotation
2. Converts genomic data to a GMTK binary observation format
3. Runs GMTK to perform EM training, Viterbi decoding, and posterior inference
4. Manages job execution via a cluster (SGE, LSF, PBS, Slurm) or local multiprocessing
5. Converts GMTK output to genomics formats (BED, wiggle)
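The last step in the list above can be pictured with a small Python sketch. It is not Segway's own code: the function name `labels_to_bed` and the toy label path are assumptions made for illustration. It collapses a per-position label path, such as a Viterbi decoding would produce, into BED-style rows with 0-based, half-open coordinates.

```python
from itertools import groupby


def labels_to_bed(chrom, start, labels, resolution=1):
    """Collapse a per-position label path into BED-style rows.

    Hypothetical helper for illustration only. Each row is
    (chrom, start, end, name) with 0-based, half-open coordinates.
    """
    rows = []
    pos = start
    for label, run in groupby(labels):
        # Each run of identical consecutive labels becomes one segment.
        length = sum(1 for _ in run) * resolution
        rows.append((chrom, pos, pos + length, str(label)))
        pos += length
    return rows


# Toy label path starting at position 100 on chr1 (made-up data).
path = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2]
bed_rows = labels_to_bed("chr1", 100, path)
for row in bed_rows:
    print("\t".join(map(str, row)))
```

Adjacent segments never share a label, because each run of identical labels is merged into a single row.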
47. Acknowledgments
The Hoffman Lab
Jeff Bilmes
William Noble
Max Libbrecht
Funding:
Princess Margaret Cancer Foundation
Canadian Institutes of Health Research
Canadian Cancer Society
Natural Sciences and Engineering Research Council
Ontario Institute for Cancer Research
Ontario Ministry of Research, Innovation and Science
Medicine by Design
McLaughlin Centre
Samantha Wilson
Coby Viner
Mickaël Mendez
Danielle Denisko
Chang Cao
Lee Zamparo
Eric Roberts
Mehran Karimzadeh
Francis Nguyen
Rachel Chan
Matthew McNeil
Natalia Mukhina
48. Postdoctoral, MSc, PhD positions
available in my research lab at the
Princess Margaret Cancer Centre
Dept of Medical Biophysics
Dept of Computer Science
University of Toronto
Please approach me for details.
Michael Hoffman
https://hoffmanlab.org/
michael.hoffman@utoronto.ca
@michaelhoffman
Editor's Notes
Happy to answer questions in middle of talk except…
For example, geographical maps combine street data, buildings data, and vegetation data, all resulting in an integrated map.
Semi-automated genomic annotation begins with pattern discovery from multiple genomic datasets and results in:
A simple annotation with a single label for each part of the genome, based on these patterns.
We can use this annotation to visualize a huge number of data tracks, e.g. by viewing the discovered patterns and the annotation instead of looking at each data track individually.
We can interpret the context and potential impact of the results, for example the meaning of the patterns and annotation we found.
Some questions that semi-automated annotation can answer include:
Pattern discovery: What signals from multiple experiments do we see over and over again?
Annotation: What does a particular piece of the genome do, in a nutshell?
Visualization: How can we make complex data comprehensible visually?
Interpretation: What is the context and potential impact of the results we are finding?
To perform genomic segmentation, Segway first ‘observes’ multiple genomic datasets.
Then, it splits the genome covered by those datasets into non-overlapping segments.
To each segment, Segway assigns a label from a finite set.
Segway then maximizes the similarity between segments of the same label by pushing around the boundaries of the segments. These are our ‘learned patterns’.
For example, here we have a 0-label with a low-high-low pattern
and a 1-label with a high-low-high pattern.
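A toy Python sketch of what a segmentation and its ‘learned patterns’ look like as data. The track names, signal values, and helper functions are all made up for illustration, and the sketch only summarizes a fixed segmentation; it does not perform the boundary optimization that Segway does.

```python
from statistics import mean

# Three made-up signal tracks over eight genomic positions.
tracks = {
    "track_a": [0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.1, 0.2],
    "track_b": [0.9, 0.8, 0.9, 0.1, 0.2, 0.1, 0.8, 0.9],
    "track_c": [0.2, 0.1, 0.1, 0.8, 0.9, 0.9, 0.2, 0.1],
}

# A segmentation: (start, end, label), half-open and non-overlapping.
segments = [(0, 3, 0), (3, 6, 1), (6, 8, 0)]


def is_partition(segments, length):
    """Check that segments tile [0, length) with no gaps or overlaps."""
    expected_start = 0
    for start, end, _label in segments:
        if start != expected_start or end <= start:
            return False
        expected_start = end
    return expected_start == length


def label_patterns(tracks, segments):
    """Mean signal per track for each label: one pattern per label."""
    patterns = {}
    for label in sorted({lab for _s, _e, lab in segments}):
        patterns[label] = {
            name: mean(v for s, e, lab in segments if lab == label
                       for v in values[s:e])
            for name, values in tracks.items()
        }
    return patterns


assert is_partition(segments, 8)
patterns = label_patterns(tracks, segments)
for label, pattern in patterns.items():
    print(label, {name: round(value, 2) for name, value in pattern.items()})
```

With this toy data, label 0 averages low, high, low across the three tracks, and label 1 averages high, low, high.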
DBNs can be thought of as a generalization of HMMs. This particular DBN is just another representation of an HMM. The DBN diagram itself does not show the state transition information that an HMM state diagram does; that information is contained in the transition conditional probability table.
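To make the role of the transition conditional probability table concrete, here is a minimal Viterbi decoder for a two-label HMM with Gaussian emissions, in plain Python. It is a sketch under stated assumptions, not GMTK's inference code: the function names, the start and transition probabilities, and the emission parameters are all invented for the example.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and s.d. sigma."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def viterbi(obs, log_start, log_trans, emit_params):
    """Most probable label path; log_trans[j][k] = log P(next=k | prev=j)."""
    n_states = len(log_start)
    score = [log_start[k] + gaussian_logpdf(obs[0], *emit_params[k])
             for k in range(n_states)]
    backpointers = []
    for x in obs[1:]:
        prev = score
        score, pointers = [], []
        for k in range(n_states):
            # Best previous label for arriving in label k at this position.
            best = max(range(n_states), key=lambda j: prev[j] + log_trans[j][k])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][k]
                         + gaussian_logpdf(x, *emit_params[k]))
        backpointers.append(pointers)
    # Trace back from the best final label.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Assumed toy parameters: "sticky" transitions and two Gaussian emissions.
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
emit_params = [(0.0, 0.3), (1.0, 0.3)]  # (mean, s.d.) for labels 0 and 1

obs = [0.1, 0.0, 0.2, 1.1, 0.9, 1.0, 0.1]
path = viterbi(obs, log_start, log_trans, emit_params)
print(path)
```

The graph structure only says that each label depends on the previous one; which transitions are likely is encoded entirely in `log_trans`, which plays the part of the transition conditional probability table.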
A hidden Markov multimodel: “no longer a HMM, not yet a DBN”
dashed line = switching relationship
The default Segway model for three observations.
light blue fill = semiobserved random variable (observed random variable for the data switched by a discrete observed random variable for the nonmissingness of the data)
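One way to read the switching relationship for a semiobserved variable is sketched below in Python: a presence indicator decides whether a track's value contributes a Gaussian term to the emission log-likelihood or is skipped, which amounts to marginalizing the missing value out. This is an illustration under that assumption; the function names and parameters are hypothetical, and the notes do not specify how GMTK implements the switch internally.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and s.d. sigma."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def emission_loglik(values, present, params):
    """Log-likelihood of one position's tracks under one label.

    values:  observed signal per track (ignored where missing)
    present: 1 if the track has data at this position, else 0
    params:  (mean, s.d.) per track for the label being scored
    """
    total = 0.0
    for x, is_present, (mu, sigma) in zip(values, present, params):
        if is_present:
            total += gaussian_logpdf(x, mu, sigma)
        # Missing data: the switch selects a term with probability 1,
        # so nothing is added to the log-likelihood.
    return total


# Assumed parameters for a single label over three tracks.
label_params = [(0.0, 1.0), (2.0, 0.5), (1.0, 1.0)]

full = emission_loglik([0.1, 1.9, 1.2], [1, 1, 1], label_params)
partial = emission_loglik([0.1, float("nan"), 1.2], [1, 0, 1], label_params)
print(full, partial)
```

Because the missing track is skipped rather than imputed, a placeholder such as NaN in `values` never reaches the Gaussian density.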
Discuss mixture model
And finally, I'd like to thank you for your very kind attention.