Michael M. Hoffman
Princess Margaret Cancer Centre
Vector Institute
Department of Medical Biophysics
Department of Computer Science
University of Toronto
https://hoffmanlab.org/
Segway and the Graphical Models Toolkit:
A framework for probabilistic genomic inference
@michaelhoffman
#probgen18
Geographical maps have…
Figures from 1) National Geographic Society (2011) GIS.
street data + buildings data + vegetation data = integrated map
Rachel Chan
Functional genomics
ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
Semi-automated genome annotation
genomic signal
pattern discovery
visualization
annotation
interpretation
Segway
A way to segment the genome
https://segway.hoffmanlab.org/
Hoffman MM et al. 2012. Nat Methods 9:473.
Genomic segmentation
• Nonoverlapping segments
• Finite number of labels
• Maximize similarity in labels
[Figure: a genomic region divided into segments, each labeled 0, 1, or 2]
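The first two properties can be checked mechanically. The sketch below is illustrative only: the function name and the (start, end, label) tuple layout are assumptions, not part of Segway.

```python
def is_valid_segmentation(segments, num_labels):
    """Check that (start, end, label) segments are sorted, nonoverlapping,
    and use only labels 0..num_labels-1 (hypothetical helper)."""
    previous_end = None
    for start, end, label in segments:
        if start >= end:
            return False            # empty or reversed segment
        if previous_end is not None and start < previous_end:
            return False            # overlaps the previous segment
        if not 0 <= label < num_labels:
            return False            # label outside the finite set
        previous_end = end
    return True


# Hypothetical segmentation with three labels (0, 1, 2).
example = [(0, 5, 0), (5, 9, 1), (9, 12, 0), (12, 20, 2)]
example_ok = is_valid_segmentation(example, num_labels=3)
```

The third property, similarity within labels, depends on the data and is what the probabilistic model on the following slides is for.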
TSS transcription start site
GS gene start
GM gene middle
GE gene end
E enhancer
I distal CTCF
R repression
D dead
Transcription start site (TSS)
Hoffman MM et al. 2013. Nucleic Acids Res 41:827.
Graphical Models Toolkit (GMTK)
• General-purpose toolkit for probabilistic inference in temporal signals (speech, language, activity recognition, genomics)
• Written in highly optimized, scalable C++
• Supports many machine learning algorithms, inference procedures, and probability models
• Can express arbitrary structured dynamic graphical models
• http://melodi.ee.washington.edu/gmtk
• conda install -c bioconda gmtk
Jeff Bilmes
Graphical
  Q: hidden random variable, discrete
Equation
  $P(q \mid \theta) = P(Q = q)$, where q is the realization and θ the parameters
GMTKL
  variable: state {
    type: discrete hidden cardinality 2;
    conditionalparents: nil using DenseCPT("start_state");
  }
Parameters
  DENSE_CPT_IN_FILE inline % conditional probability tables
  1 % total number of CPTs = 1
  0 start_state % CPT #0, "start_state"
  0 % 0 parents
  2 % output cardinality 2
  0.25 % P(Q = 0) = 0.25
  0.75 % P(Q = 1) = 0.75
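To make the layout of the parameter listing concrete, here is a minimal Python sketch that reads the same seven lines. It is not GMTK's parser: the helper name is hypothetical, and it assumes exactly one CPT with no parents, as on this slide.

```python
def parse_single_dense_cpt(text):
    """Read one parent-less dense CPT written in the layout shown above.

    Illustrative only: this is not GMTK's parser and it ignores the
    general case of CPTs with parents.
    """
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("%", 1)[0].split())  # drop % comments
    # Expected order: DENSE_CPT_IN_FILE inline <count> <index> <name>
    #                 <num parents> <cardinality> <probabilities...>
    if tokens[:2] != ["DENSE_CPT_IN_FILE", "inline"]:
        raise ValueError("unexpected header")
    count, name = int(tokens[2]), tokens[4]
    num_parents, cardinality = int(tokens[5]), int(tokens[6])
    if count != 1 or num_parents != 0:
        raise ValueError("sketch handles exactly one CPT with no parents")
    probs = [float(tok) for tok in tokens[7:7 + cardinality]]
    if len(probs) != cardinality or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must match cardinality and sum to 1")
    return name, probs


PARAMS = """\
DENSE_CPT_IN_FILE inline % conditional probability tables
1 % total number of CPTs = 1
0 start_state % CPT #0, "start_state"
0 % 0 parents
2 % output cardinality 2
0.25 % P(Q = 0) = 0.25
0.75 % P(Q = 1) = 0.75
"""

cpt_name, start_probs = parse_single_dense_cpt(PARAMS)
```

The name returned by the parser is the one that the structure file refers to in DenseCPT("start_state").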
Graphical
  Q: hidden random variable, discrete
  X: observed random variable, continuous
  Q → X: conditional relationship
Equation
  $P(q, x \mid \theta) = P(Q = q)\,P(X = x \mid Q = q)$
GMTKL
  variable: state {
    type: discrete hidden cardinality 2;
    conditionalparents: nil using DenseCPT("start_state");
  }
  variable: obs {
    type: continuous observed 0:0;
    conditionalparents: state(0) using mapping("state_obs");
  }
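A small numerical sketch of this two-variable model follows. The start probabilities are the 0.25 and 0.75 from the start_state table; the Gaussian emission with the means and standard deviations below is an assumption made for illustration, since the slide only names the "state_obs" mapping.

```python
import math

START = [0.25, 0.75]   # P(Q = q), from the start_state CPT
MEANS = [0.0, 2.0]     # assumed emission mean per state
STDDEVS = [1.0, 1.0]   # assumed emission standard deviation per state


def gaussian_pdf(x, mean, stddev):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / stddev
    return math.exp(-0.5 * z * z) / (stddev * math.sqrt(2.0 * math.pi))


def joint(q, x):
    """P(Q = q) * p(X = x | Q = q) for the two-variable model."""
    return START[q] * gaussian_pdf(x, MEANS[q], STDDEVS[q])


def posterior(x):
    """P(Q = q | X = x), obtained by normalizing the joint over q."""
    joints = [joint(q, x) for q in range(len(START))]
    total = sum(joints)
    return [j / total for j in joints]
```

With these assumed parameters, an observation halfway between the two means (x = 1.0) is equally likely under both states, so the posterior falls back to the start probabilities.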
Graphical
  Q0: hidden random variable, discrete
  X0: observed random variable, continuous
  Q0 → X0: conditional relationship
Equation
  $P(q_0, x_0 \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)$
GMTKL
  frame: 0 {
    variable: state {
      type: discrete hidden cardinality 2;
      conditionalparents: nil using DenseCPT("start_state");
    }
    variable: obs {
      type: continuous observed 0:0;
      conditionalparents: state(0) using mapping("state_obs");
    }
  }
Graphical
  Q0, Q1: hidden random variables, discrete
  X0: observed random variable, continuous
  Q0 → X0, Q0 → Q1: conditional relationships
Equation
  $P(q_{0:1}, x_0 \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)\,P(Q_1 = q_1 \mid Q_0 = q_0)$
GMTKL
  frame: 0 {
    variable: state {
      type: discrete hidden cardinality 2;
      conditionalparents: nil using DenseCPT("start_state");
    }
    variable: obs {
      type: continuous observed 0:0;
      conditionalparents: state(0) using mapping("state_obs");
    }
  }
  frame: 1 {
    variable: state {
      type: discrete hidden cardinality 2;
      conditionalparents: state(-1) using DenseCPT("state_state");
    }
  }
Graphical
  Q0, Q1: hidden random variables, discrete
  X0, X1: observed random variables, continuous
  Q0 → X0, Q0 → Q1, Q1 → X1: conditional relationships
Equation
  $P(q_{0:1}, x_{0:1} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)\,P(Q_1 = q_1 \mid Q_0 = q_0)\,P(X_1 = x_1 \mid Q_1 = q_1)$
GMTKL
  frame: 0 {
    variable: state {
      type: discrete hidden cardinality 2;
      conditionalparents: nil using DenseCPT("start_state");
    }
    variable: obs {
      type: continuous observed 0:0;
      conditionalparents: state(0) using mapping("state_obs");
    }
  }
  frame: 1 {
    variable: state {
      type: discrete hidden cardinality 2;
      conditionalparents: state(-1) using DenseCPT("state_state");
    }
    variable: obs {
      type: continuous observed 0:0;
      conditionalparents: state(0) using mapping("state_obs");
    }
  }
Graphical
  Qt, Qt+1: hidden random variables, discrete
  Xt, Xt+1: observed random variables, continuous
  Qt → Xt, Qt → Qt+1, Qt+1 → Xt+1: conditional relationships
Equation
  $P(q_{0:T}, x_{0:T} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0) \prod_{t=1}^{T} P(Q_t = q_t \mid Q_{t-1} = q_{t-1})\,P(X_t = x_t \mid Q_t = q_t)$
GMTKL
  frame: 0 {
    variable: state {
      type: discrete hidden cardinality 2;
      conditionalparents: nil using DenseCPT("start_state");
    }
    variable: obs {
      type: continuous observed 0:0;
      conditionalparents: state(0) using mapping("state_obs");
    }
  }
  frame: 1 {
    variable: state {
      type: discrete hidden cardinality 2;
      conditionalparents: state(-1) using DenseCPT("state_state");
    }
    variable: obs {
      type: continuous observed 0:0;
      conditionalparents: state(0) using mapping("state_obs");
    }
  }
  chunk 1:1
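The factorization over frames 0 to T maps directly onto a loop: frame 0 contributes the start and first emission terms, and the repeated frame (the one named by chunk 1:1) contributes one transition and one emission term per step. The Python sketch below assumes a unit-variance Gaussian emission and made-up transition values for "state_state"; neither is given on the slide.

```python
import math


def log_joint(states, observations, start, transition, log_emission):
    """log P(q_{0:T}, x_{0:T} | theta) for one state path and its observations."""
    if len(states) != len(observations) or not states:
        raise ValueError("need one observation per state and at least one frame")
    # Frame 0: start probability and first emission.
    total = math.log(start[states[0]]) + log_emission(states[0], observations[0])
    # Frames 1..T: one transition and one emission per frame.
    for t in range(1, len(states)):
        total += math.log(transition[states[t - 1]][states[t]])
        total += log_emission(states[t], observations[t])
    return total


def unit_gaussian_log_emission(q, x, means=(0.0, 2.0)):
    """Assumed emission model: unit-variance Gaussian with a per-state mean."""
    return -0.5 * (x - means[q]) ** 2 - 0.5 * math.log(2.0 * math.pi)


START = [0.25, 0.75]                    # from the start_state CPT
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]   # assumed values for "state_state"

example_log_joint = log_joint([0, 1], [0.0, 2.0], START, TRANSITION,
                              unit_gaussian_log_emission)
```

Working in log space avoids numerical underflow when T is as large as a chromosome.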
Dynamic Bayesian network (hidden Markov model)
[Figure: chain of hidden discrete variables Qt … Qt+5, each the conditional parent of an observed continuous variable Xt … Xt+5]
Equation
  $P(q_{0:T}, x_{0:T} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0) \prod_{t=1}^{T} P(Q_t = q_t \mid Q_{t-1} = q_{t-1})\,P(X_t = x_t \mid Q_t = q_t)$
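For a model of this shape, the most probable state path can be found with the standard Viterbi algorithm, which the deck later lists among the tasks Segway runs through GMTK. The Python sketch below is a textbook version for a plain HMM, not GMTK's inference code; the emission model and all parameter values are assumptions.

```python
import math


def viterbi(observations, start, transition, log_emission):
    """Return the most probable hidden state path for an HMM."""
    if not observations:
        return []
    n_states = len(start)
    # best[q] = best log probability of any path that ends in state q.
    best = [math.log(start[q]) + log_emission(q, observations[0])
            for q in range(n_states)]
    backpointers = []
    for x in observations[1:]:
        pointers, new_best = [], []
        for q in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: best[p] + math.log(transition[p][q]))
            pointers.append(prev)
            new_best.append(best[prev] + math.log(transition[prev][q])
                            + log_emission(q, x))
        backpointers.append(pointers)
        best = new_best
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda q: best[q])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def log_emission(q, x, means=(0.0, 2.0)):
    """Assumed unit-variance Gaussian; the constant term does not affect the argmax."""
    return -0.5 * (x - means[q]) ** 2


START = [0.5, 0.5]                      # assumed
TRANSITION = [[0.9, 0.1], [0.1, 0.9]]   # assumed
decoded = viterbi([0.1, -0.2, 0.3, 2.1, 1.9, 2.2], START, TRANSITION, log_emission)
```

Runs of equal values in the decoded path are the segments, and the values themselves are the labels.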
Dynamic Bayesian network for segmentation
[Figure: hidden discrete segment label as the conditional parent of the observed continuous tracks CTCF, H3K36me3, and DNaseI]
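Segway work flow
1. genomedata-load: .fasta.gz, .bed.gz, .bedGraph.gz, .wig.gz → .genomedata
2. segway train: .genomedata, GMTKL structure, starting params → discovered params
3. segway annotate: .genomedata, GMTKL structure, discovered params → segway.bed.gz
4. segtools: segway.bed.gz → .png, .pdf, .tab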
What does Segway do, really?
1. Creates GMTKL structure and parameter files for semi-automated genome annotation
2. Converts genomic data to a GMTK binary observation format
3. Runs GMTK to perform EM training, Viterbi decoding, posterior inference
4. Manages job execution via cluster (SGE, LSF, PBS, Slurm), or local multiprocessing
5. Converts GMTK output to genomics formats (BED, wiggle)
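Step 5 can be pictured as turning a per-position label sequence into BED intervals. The Python sketch below is an illustration, not Segway's converter: the function name, the input layout, and the resolution parameter are assumptions. It relies only on the BED convention of tab-separated, 0-based, half-open coordinates.

```python
from itertools import groupby


def labels_to_bed(chrom, start, labels, resolution=1):
    """Yield BED4 lines (chrom, start, end, name) for runs of equal labels.

    BED coordinates are 0-based and half-open. `resolution` is the number
    of base pairs covered by each label (hypothetical parameter).
    """
    position = start
    for label, run in groupby(labels):
        length = sum(1 for _ in run) * resolution
        yield f"{chrom}\t{position}\t{position + length}\t{label}"
        position += length


# Hypothetical decoded labels starting at chr1:1000, 10 bp per label.
bed_lines = list(labels_to_bed("chr1", 1000, [0, 0, 1, 1, 1, 2], resolution=10))
```

Because each interval ends where the next begins, the output satisfies the nonoverlapping property of a segmentation by construction.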
Segway modular API (pull request #91)
segway train
  segway train-init
  segway train-run
  segway train-run-round
  segway train-finish
segway annotate
  segway annotate-init
  segway annotate-run
  segway annotate-finish
segway posterior
  segway posterior-init
  segway posterior-run
  segway posterior-finish
Handling missing data
[Figure: the segment label (hidden, discrete) is the conditional parent of the observed continuous tracks CTCF, H3K36me3, and DNaseI. Each track also has an observed indicator, present(CTCF), present(H3K36me3), or present(DNaseI), taking the value 1 or 0, as its switching parent.]
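One way to read the switching relationship: when a track's present indicator is 0, that track contributes nothing to the likelihood at that position, so a missing value neither favors nor penalizes any label. The Python sketch below illustrates this idea under an assumed Gaussian emission model; it is not Segway's implementation, and the function names are hypothetical.

```python
import math


def log_emission(x, mean, stddev):
    """Log density of a univariate Gaussian at x (assumed emission model)."""
    z = (x - mean) / stddev
    return -0.5 * z * z - math.log(stddev * math.sqrt(2.0 * math.pi))


def frame_log_likelihood(values, present, means, stddevs):
    """Sum per-track log emissions for one position and one segment label.

    Tracks with present == 0 are skipped, which is the same as multiplying
    the likelihood by 1.
    """
    total = 0.0
    for x, is_present, mean, stddev in zip(values, present, means, stddevs):
        if is_present:
            total += log_emission(x, mean, stddev)
    return total


# Hypothetical position: the second track has no data there.
example_ll = frame_log_likelihood(
    values=[1.0, float("nan"), 0.5],
    present=[1, 0, 1],
    means=[1.0, 0.0, 0.5],
    stddevs=[1.0, 1.0, 1.0],
)
```

The missing value is never read, so placeholder values such as NaN do not propagate into the result.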
Length distribution
[Figure: the segmentation model (segment label; CTCF, H3K36me3, DNaseI; present(CTCF), present(H3K36me3), present(DNaseI)) extended with segment countdown, segment transition, ruler, and frame index variables]
• Minimum segment length
• Maximum segment length
• Trained geometric length distribution
• Dirichlet prior on segment length
• Weight of prior versus observed data
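To see how the first three bullets interact, consider a simplified length model: a segment cannot end before a minimum length, then ends at each further position with probability 1 - stay_prob, and is forced to end at a maximum length. This Python sketch is an assumption about the general shape of such a distribution, not Segway's exact parameterization.

```python
def length_pmf(length, stay_prob, min_length=1, max_length=None):
    """P(segment length = length): geometric tail after a hard minimum."""
    if length < min_length or (max_length is not None and length > max_length):
        return 0.0
    if max_length is not None and length == max_length:
        # All remaining probability mass: the segment is forced to end here.
        return stay_prob ** (max_length - min_length)
    return (1.0 - stay_prob) * stay_prob ** (length - min_length)


def expected_length(stay_prob, min_length=1):
    """Mean length with no maximum: min_length + stay_prob / (1 - stay_prob)."""
    return min_length + stay_prob / (1.0 - stay_prob)


# Hypothetical settings: stay probability 0.9, lengths limited to 3..10.
total_mass = sum(length_pmf(n, 0.9, min_length=3, max_length=10)
                 for n in range(0, 20))
```

With a minimum of 1 and no maximum this reduces to the ordinary geometric distribution implied by an HMM self-transition, whose mean is 1 / (1 - stay_prob).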
Hierarchical models
• Two-level model topology
• Mixture of large-scale and punctate labels
[Figure: labels “closed”, “open”, and “intermediate”; model with supersegment label, segment label, segment countdown, segment transition, ruler, and frame index variables, plus CTCF and present(CTCF)]
[Figure: segment label (hidden, discrete) as the conditional parent of observed continuous tracks on the + strand and the – strand; names shown: long RNA-seq, short RNA-seq, CAGE, cytosol, whole cell, nucleus]
Adding genome sequence
[Figure: segment with H3ac, RNA, H3K27me3, nucleotide, dinucleotide]
Adding data relationships
[Figure: segment with histone, HeLa, GM12878, nucleotide, dinucleotide]
Adding evolution
[Figure: segment with H3K27me3, Ptro nucleotide, Ptro dinucleotide, Hsap nucleotide, Hsap dinucleotide]
Adding variation
[Figure: segment with YRI nucleotide, YRI dinucleotide, YRI DAF, ancestral nucleotide, ancestral dinucleotide]
https://segway.hoffmanlab.org/
Acknowledgments
The Hoffman Lab
Jeff Bilmes
William Noble
Max Libbrecht
Funding:
Princess Margaret Cancer Foundation
Canadian Institutes of Health Research
Canadian Cancer Society
Natural Sciences and Engineering Research Council
Ontario Institute for Cancer Research
Ontario Ministry of Research, Innovation and Science
Medicine by Design
McLaughlin Centre
Samantha Wilson
Coby Viner
Mickaël Mendez
Danielle Denisko
Chang Cao
Lee Zamparo
Eric Roberts
Mehran Karimzadeh
Francis Nguyen
Rachel Chan
Matthew McNeil
Natalia Mukhina
Postdoctoral, MSc, PhD positions available in my research lab at the Princess Margaret Cancer Centre
Dept of Medical Biophysics
Dept of Computer Science
University of Toronto
Please approach me for details.
Michael Hoffman
https://hoffmanlab.org/
michael.hoffman@utoronto.ca
@michaelhoffman

Segway and the Graphical Models Toolkit: a framework for probabilistic genomic inference

  • 1. Michael M. Hoffman Princess Margaret Cancer Centre Vector Institute Department of Medical Biophysics Department of Computer Science University of Toronto https://hoffmanlab.org/ Segway and the Graphical Models Toolkit: A framework for probabilistic genomic inference @michaelhoffman #probgen18
  • 2. Geographical maps have… Figures from 1) National Geographic Society (2011) GIS. street data + buildings data + vegetation data = integrated map Rachel Chan
  • 3. Functional genomics ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
  • 4. Functional genomics ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
  • 5. Functional genomics ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
  • 6. Functional genomics ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
  • 7. Semi-automated genome annotation genomic signal pattern discovery visualization annotation interpretation
  • 8. Segway A way to segment the genome https://segway.hoffmanlab.org/ Hoffman MM et al. 2012. Nat Methods 9:473.
  • 11. 0 1 0 21 1 Finite number of labels
  • 12. 0 1 0 1 1 Maximize similarity in labels 2
  • 13. 0 1 0 1 1 Maximize similarity in labels 2
  • 14. 0 1 0 1 1 Maximize similarity in labels 2
  • 15. TSS transcription start site GS gene start GM gene middle GE gene end E enhancer I distal CTCF R repression D dead
  • 16. Transcription start site (TSS) Hoffman MM et al. 2013. Nucleic Acids Res 41:827.
  • 17. Graphical Models Toolkit (GMTK) • General purpose toolkit for probabilistic inference in temporal signals (speech, language, activity recognition, genomics) • Written in highly optimized scalable C++ • Supports many machine learning algorithms, inference procedures, and probability models. • Can express arbitrary structured dynamic graphical models • http://melodi.ee.washington.edu/gmtk • conda install -c bioconda gmtk Jeff Bilmes
  • 19. Q hidden random variable discrete P(q|θ) = P(Q = q) Equation Graphical realization parameters
  • 20. Q hidden random variable discrete P(q|θ) = P(Q = q) variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } Equation Graphical GMTKL
  • 21. Q hidden random variable discrete P(q|θ) = P(Q = q) variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } DENSE_CPT_IN_FILE inline % conditional probability tables 1 % total number of CPTs = 1 0 start_state % CPT #0, "start_state" 0 % 0 parents 2 % output cardinality 2 0.25 % P(Q = 0) = 0.25 0.75 % P(Q = 1) = 0.75 Equation Graphical GMTKL Parameters
  • 22. Graphical: Q (hidden, discrete) with a conditional relationship to X (observed, continuous). Equation: $P(q, x \mid \theta) = P(Q = q)\,P(X = x \mid Q = q)$. GMTKL:
        variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); }
        variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); }
  • 23. Graphical: Q0 (hidden, discrete) with a conditional relationship to X0 (observed, continuous). Equation: $P(q_0, x_0 \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)$. GMTKL:
        frame: 0 {
          variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); }
          variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); }
        }
  • 24. Graphical: Q0 and Q1 (hidden, discrete), X0 (observed, continuous), linked by conditional relationships. Equation: $P(q_{0:1}, x_0 \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)\,P(Q_1 = q_1 \mid Q_0 = q_0)$. GMTKL:
        frame: 0 {
          variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); }
          variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); }
        }
        frame: 1 {
          variable: state { type: discrete hidden cardinality 2; conditionalparents: state(-1) using DenseCPT("state_state"); }
        }
  • 25. Graphical: Q0 and Q1 (hidden, discrete), X0 and X1 (observed, continuous), linked by conditional relationships. Equation: $P(q_{0:1}, x_{0:1} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)\,P(Q_1 = q_1 \mid Q_0 = q_0)\,P(X_1 = x_1 \mid Q_1 = q_1)$. GMTKL:
        frame: 0 {
          variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); }
          variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); }
        }
        frame: 1 {
          variable: state { type: discrete hidden cardinality 2; conditionalparents: state(-1) using DenseCPT("state_state"); }
          variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); }
        }
  • 26. Graphical: Qt and Qt+1 (hidden, discrete), Xt and Xt+1 (observed, continuous), linked by conditional relationships. Equation: $P(q_{0:T}, x_{0:T} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0) \prod_{t=1}^{T} P(Q_t = q_t \mid Q_{t-1} = q_{t-1})\,P(X_t = x_t \mid Q_t = q_t)$. GMTKL:
        frame: 0 {
          variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); }
          variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); }
        }
        frame: 1 {
          variable: state { type: discrete hidden cardinality 2; conditionalparents: state(-1) using DenseCPT("state_state"); }
          variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); }
        }
        chunk 1:1
  • 27. Dynamic Bayesian network (hidden Markov model). Graphical: chain of hidden discrete variables Qt through Qt+5, each with a conditional relationship to an observed continuous variable Xt through Xt+5. Equation: $P(q_{0:T}, x_{0:T} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0) \prod_{t=1}^{T} P(Q_t = q_t \mid Q_{t-1} = q_{t-1})\,P(X_t = x_t \mid Q_t = q_t)$
  • 28. Dynamic Bayesian Network for segmentation. Figure nodes: segment label; CTCF, H3K36me3, DNaseI. Legend: hidden random variable, observed random variable, conditional relationship, discrete, continuous.
  • 29. Segway work flow: .fasta.gz, .bed.gz, .bedGraph.gz, .wig.gz → genomedata-load → .genomedata → segway train (GMTKL structure, starting params → discovered params) → segway annotate → segway.bed.gz → segtools → .png, .pdf, .tab
  • 30. What does Segway do, really? (1) Creates GMTKL structure and parameter files for semi-automated genome annotation. (2) Converts genomic data to a GMTK binary observation format. (3) Runs GMTK to perform EM training, Viterbi decoding, and posterior inference. (4) Manages job execution via cluster (SGE, LSF, PBS, Slurm) or local multiprocessing. (5) Converts GMTK output to genomics formats (BED, wiggle).
  • 31. Segway modular API (pull request #91). segway train: segway train-init, segway train-run, segway train-run-round, segway train-finish. segway annotate: segway annotate-init, segway annotate-run, segway annotate-finish. segway posterior: segway posterior-init, segway posterior-run, segway posterior-finish.
  • 32. Handling missing data. Figure nodes: segment; DNaseI; switching values 1 and 0. Legend: hidden random variable, observed random variable, conditional, switching, discrete, continuous.
  • 33. Handling missing data. Figure nodes: segment label; CTCF, H3K36me3, DNaseI; present(CTCF), present(H3K36me3), present(DNaseI). Legend: hidden random variable, observed random variable, conditional, switching, discrete, continuous.
  • 35. Figure nodes: segment label; CTCF, H3K36me3, DNaseI; present(CTCF), present(H3K36me3), present(DNaseI); segment countdown; segment transition; ruler; frame index. Length distribution: minimum segment length; maximum segment length; trained geometric length distribution; Dirichlet prior on segment length; weight of prior versus observed data.
  • 36. Hierarchical models: two-level model topology; mixture of large-scale and punctate labels. Figure labels: “closed”, “open”, “intermediate”.
  • 38. Figure nodes: segment label; long RNA-seq cytosol, short RNA-seq whole cell, CAGE nucleus; + strand. Legend: hidden random variable, observed random variable, conditional relationship, discrete, continuous.
  • 39. Figure nodes: segment label; long RNA-seq cytosol, short RNA-seq whole cell, CAGE nucleus; + strand, – strand. Legend: hidden random variable, observed random variable, conditional relationship, discrete, continuous.
  • 42. Adding evolution. Figure nodes: segment; Ptro nucleotide; Ptro dinucleotide; Hsap nucleotide; Hsap dinucleotide; H3K27me3.
  • 43. Adding variation. Figure nodes: segment; YRI nucleotide; YRI dinucleotide; ancestral nucleotide; ancestral dinucleotide; YRI DAF.
  • 47. Acknowledgments: The Hoffman Lab; Jeff Bilmes; William Noble; Max Libbrecht. Funding: Princess Margaret Cancer Foundation; Canadian Institutes of Health Research; Canadian Cancer Society; Natural Sciences and Engineering Research Council; Ontario Institute for Cancer Research; Ontario Ministry of Research, Innovation and Science; Medicine by Design; McLaughlin Centre. Samantha Wilson, Coby Viner, Mickaël Mendez, Danielle Denisko, Chang Cao, Lee Zamparo, Eric Roberts, Mehran Karimzadeh, Francis Nguyen, Rachel Chan, Matthew McNeil, Natalia Mukhina.
  • 48. Postdoctoral, MSc, and PhD positions available in my research lab at the Princess Margaret Cancer Centre; Dept of Medical Biophysics; Dept of Computer Science; University of Toronto. Please approach me for details. Michael Hoffman, https://hoffmanlab.org/, michael.hoffman@utoronto.ca, @michaelhoffman
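
The hidden Markov model factorization on slides 26 and 27, and the Viterbi decoding step listed on slide 30, can be sketched in plain Python. This is only an illustration under stated assumptions, not how Segway or GMTK implement inference: the start probabilities 0.25 and 0.75 come from slide 21, while the transition table, the Gaussian emission parameters, and every function name below are hypothetical.

```python
import math

# From slide 21: P(Q0 = 0) = 0.25, P(Q0 = 1) = 0.75.
START = [0.25, 0.75]
# Hypothetical values, not from the slides: TRANSITION[p][q] = P(Qt = q | Qt-1 = p)
# and per-state Gaussian emission parameters (mean, standard deviation).
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]
EMISSION = [(0.0, 1.0), (3.0, 1.0)]


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def joint_log_prob(states, observations):
    """log P(q_{0:T}, x_{0:T} | theta) using the factorization on slides 26-27."""
    if not states or len(states) != len(observations):
        raise ValueError("states and observations must be non-empty and equal in length")
    # Frame 0: start probability times emission.
    total = math.log(START[states[0]])
    total += log_gaussian(observations[0], *EMISSION[states[0]])
    # Frames 1..T: transition times emission.
    for t in range(1, len(states)):
        total += math.log(TRANSITION[states[t - 1]][states[t]])
        total += log_gaussian(observations[t], *EMISSION[states[t]])
    return total


def viterbi(observations):
    """Return the most probable state path and its joint log probability."""
    n = len(START)
    scores = [
        math.log(START[q]) + log_gaussian(observations[0], *EMISSION[q])
        for q in range(n)
    ]
    backpointers = []
    for x in observations[1:]:
        prev = scores
        scores = []
        pointers = []
        for q in range(n):
            # Best predecessor state for landing in state q at this frame.
            best = max(range(n), key=lambda p: prev[p] + math.log(TRANSITION[p][q]))
            pointers.append(best)
            scores.append(
                prev[best] + math.log(TRANSITION[best][q]) + log_gaussian(x, *EMISSION[q])
            )
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(range(n), key=lambda q: scores[q])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]
```

With these made-up parameters, calling `viterbi([0.1, -0.2, 3.1, 2.9])` returns a path that switches label where the signal jumps, and the returned score equals `joint_log_prob` evaluated on that path.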

Editor's Notes

  1. Need laser pointer. Turn Workrave off, turn phone off, turn iPad off. Happy to answer questions in middle of talk except…
  2. For example, geographical maps have street data…buildings data…vegetation data…all resulting in an integrated map
  3. Semi-automated genomic annotation begins with pattern discovery from multiple genomic data sets and results in a simple annotation with a single label for each part of the genome. We can use this annotation to visualize a huge number of data tracks, for example by viewing the discovered patterns and the annotation instead of looking at each data track individually. We can then interpret the context and potential impact of the results, such as the meaning of the patterns and annotation we found. Some questions that semi-automated annotation can answer: Pattern discovery: What signals from multiple experiments do we see over and over again? Annotation: What does a particular piece of the genome do, in a nutshell? Visualization: How can we make complex data comprehensible visually? Interpretation: What is the context and potential impact of the results we are finding?
  4. To perform genomic segmentation, Segway first ‘observes’ multiple datasets of genomic data.
  5. Then, it splits the datasets up into non-overlapping segments.
  6. To each segment, Segway assigns a label from a finite set.
  7. Segway then maximizes the similarity between segments of the same label by pushing around the boundaries of the segments. These are our ‘learned patterns’.
  8. For example, here we have a 0-label with a low-high-low pattern.
  9. Here we have a 1-label with a high-low-high pattern.
  10. DBNs can be thought of as a generalization of HMMs. This particular DBN is just another representation of an HMM. The DBN diagram does not show the state transition information that an HMM state diagram shows; that information is held in the transition conditional probability table.
  20. A hidden Markov multimodel: “no longer a HMM, not yet a DBN”. Dashed line = switching relationship. The default Segway model for three observations.
  28. light blue fill = semiobserved random variable (observed random variable for the data switched by a discrete observed random variable for the nonmissingness of the data)
  29. light blue fill = semiobserved random variable (observed random variable for the data switched by a discrete observed random variable for the nonmissingness of the data) Discuss mixture model
  32. And finally, I'd like to thank you for your very kind attention.
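
The semiobserved variables described in the notes above (data switched by an observed indicator of nonmissingness, as on the "Handling missing data" slides) can be sketched as follows. This is a rough illustration under assumptions, not Segway's code: the track names come from the slides, but the Gaussian parameters, the use of `None` or NaN to mark missing values, and the function names are hypothetical.

```python
import math

TRACKS = ("CTCF", "H3K36me3", "DNaseI")

# Hypothetical per-label Gaussian parameters (mean, standard deviation) per track.
PARAMS = {
    0: {"CTCF": (0.0, 1.0), "H3K36me3": (0.5, 1.0), "DNaseI": (0.2, 1.0)},
    1: {"CTCF": (2.0, 1.0), "H3K36me3": (0.1, 1.0), "DNaseI": (3.0, 1.0)},
}


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def is_present(x):
    """Assumed encoding: None or NaN marks a position with no data."""
    return x is not None and not math.isnan(x)


def emission_log_prob(label, values):
    """Log emission probability of one position given a segment label.

    The indicator present(track) acts as a switching parent: when it is 1 the
    track contributes its Gaussian log density, and when it is 0 the track
    contributes a factor of 1 (log 0.0), so missing data are skipped rather
    than imputed.
    """
    total = 0.0
    for track in TRACKS:
        x = values.get(track)
        if is_present(x):
            total += log_gaussian(x, *PARAMS[label][track])
    return total
```

A position where every track is missing therefore has emission log probability 0.0 for every label, so only the transition model informs its label.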