Michael M. Hoffman
Princess Margaret Cancer Centre
Vector Institute
Department of Medical Biophysics
Department of Computer Science
University of Toronto
https://hoffmanlab.org/
Segway and the Graphical Models Toolkit:
A framework for probabilistic genomic inference
@michaelhoffman
#probgen18
Geographical maps have…
Figures from 1) National Geographic Society (2011) GIS.
street data
+ buildings data
+ vegetation data
= integrated map
Rachel Chan
Functional genomics
ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
Semi-automated genome annotation
genomic signal
pattern discovery
visualization
annotation
interpretation
Segway
A way to segment the genome
https://segway.hoffmanlab.org/
Hoffman MM et al. 2012. Nat Methods 9:473.
Genomic segmentation
• Nonoverlapping segments
• Finite number of labels (example labels: 0, 1, 2)
• Maximize similarity in labels
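The segmentation properties listed here (nonoverlapping segments, a finite label set) can be illustrated with a minimal Python sketch. It assumes a per-position label vector as input; the function name `labels_to_segments` and the example labels are hypothetical and are not part of Segway.

```python
from itertools import groupby


def labels_to_segments(labels):
    """Collapse per-position labels into (start, end, label) runs.

    Coordinates are 0-based with an exclusive end, so consecutive
    segments never overlap and together cover every position.
    """
    segments = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((position, position + length, label))
        position += length
    return segments


# Hypothetical label vector drawn from the finite label set {0, 1, 2}.
example_labels = [0, 0, 1, 1, 1, 0, 2, 2, 1]
example_segments = labels_to_segments(example_labels)
```

Each tuple has the same shape as a BED interval (start, end, name), which is why a run-length encoding is a natural way to store a segmentation.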
TSS: transcription start site
GS: gene start
GM: gene middle
GE: gene end
E: enhancer
I: distal CTCF
R: repression
D: dead
Transcription start site (TSS)
Hoffman MM et al. 2013. Nucleic Acids Res 41:827.
Graphical Models Toolkit (GMTK)
• General-purpose toolkit for probabilistic inference in temporal signals (speech, language, activity recognition, genomics)
• Written in highly optimized, scalable C++
• Supports many machine learning algorithms, inference procedures, and probability models
• Can express arbitrarily structured dynamic graphical models
• http://melodi.ee.washington.edu/gmtk
• conda install -c bioconda gmtk
Jeff Bilmes
Graphical
Q: hidden random variable, discrete

Equation (q: realization, θ: parameters)
$$P(q \mid \theta) = P(Q = q)$$

GMTKL
variable: state {
    type: discrete hidden cardinality 2;
    conditionalparents: nil using DenseCPT("start_state");
}

Parameters
DENSE_CPT_IN_FILE inline % conditional probability tables
1 % total number of CPTs = 1
0 start_state % CPT #0, "start_state"
0 % 0 parents
2 % output cardinality 2
0.25 % P(Q = 0) = 0.25
0.75 % P(Q = 1) = 0.75
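To make the "start_state" table concrete, here is a small Python sketch that treats the two probabilities from the slide as a categorical distribution and draws samples from it. This is only an illustration of what the table encodes; it does not use GMTK, and the names `START_STATE` and `sample_state` are hypothetical.

```python
import random

# P(Q = 0) and P(Q = 1), copied from the "start_state" table on the slide.
START_STATE = [0.25, 0.75]


def sample_state(cpt, rng):
    """Draw one value of Q from a parentless conditional probability table."""
    return rng.choices(range(len(cpt)), weights=cpt, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [sample_state(START_STATE, rng) for _ in range(10_000)]
fraction_one = sum(draws) / len(draws)  # empirical estimate of P(Q = 1)
```

With 10,000 draws the empirical fraction of Q = 1 should land close to 0.75.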
Q: hidden random variable, discrete
X: observed random variable, continuous
Q → X: conditional relationship

$$P(q, x \mid \theta) = P(Q = q)\,P(X = x \mid Q = q)$$

variable: state {
    type: discrete hidden cardinality 2;
    conditionalparents: nil using DenseCPT("start_state");
}
variable: obs {
    type: continuous observed 0:0;
    conditionalparents: state(0) using mapping("state_obs");
}
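The factorization P(q, x | θ) = P(Q = q) P(X = x | Q = q) can be sketched numerically in Python. The slide does not say which distribution "state_obs" maps each state to, so this sketch assumes one Gaussian per state with made-up means and standard deviations; `EMISSIONS`, `joint`, and `posterior` are hypothetical names.

```python
import math

START_STATE = [0.25, 0.75]            # P(Q = 0), P(Q = 1) from the slide
EMISSIONS = [(0.0, 1.0), (3.0, 1.0)]  # assumed (mean, sd) per state, illustrative only


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def joint(q, x):
    """P(Q = q) * P(X = x | Q = q) for one hidden state and one observation."""
    mean, sd = EMISSIONS[q]
    return START_STATE[q] * normal_pdf(x, mean, sd)


def posterior(x):
    """P(Q = q | X = x) for every q, obtained by normalizing the joint."""
    weights = [joint(q, x) for q in range(len(START_STATE))]
    total = sum(weights)
    return [w / total for w in weights]
```

Normalizing the joint over q is the single-frame version of the posterior inference that becomes forward-backward once frames are chained together.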
Q0: hidden random variable, discrete
X0: observed random variable, continuous

$$P(q_0, x_0 \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)$$

frame: 0 {
    variable: state {
        type: discrete hidden cardinality 2;
        conditionalparents: nil using DenseCPT("start_state");
    }
    variable: obs {
        type: continuous observed 0:0;
        conditionalparents: state(0) using mapping("state_obs");
    }
}
Q0, X0, Q1

$$P(q_{0:1}, x_0 \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)\,P(Q_1 = q_1 \mid Q_0 = q_0)$$

frame: 0 {
    variable: state {
        type: discrete hidden cardinality 2;
        conditionalparents: nil using DenseCPT("start_state");
    }
    variable: obs {
        type: continuous observed 0:0;
        conditionalparents: state(0) using mapping("state_obs");
    }
}
frame: 1 {
    variable: state {
        type: discrete hidden cardinality 2;
        conditionalparents: state(-1) using DenseCPT("state_state");
    }
}
Q0, X0, Q1, X1

$$P(q_{0:1}, x_{0:1} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0)\,P(Q_1 = q_1 \mid Q_0 = q_0)\,P(X_1 = x_1 \mid Q_1 = q_1)$$

frame: 0 {
    variable: state {
        type: discrete hidden cardinality 2;
        conditionalparents: nil using DenseCPT("start_state");
    }
    variable: obs {
        type: continuous observed 0:0;
        conditionalparents: state(0) using mapping("state_obs");
    }
}
frame: 1 {
    variable: state {
        type: discrete hidden cardinality 2;
        conditionalparents: state(-1) using DenseCPT("state_state");
    }
    variable: obs {
        type: continuous observed 0:0;
        conditionalparents: state(0) using mapping("state_obs");
    }
}
Qt, Xt, Qt+1, Xt+1

$$P(q_{0:T}, x_{0:T} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0) \prod_{t=1}^{T} P(Q_t = q_t \mid Q_{t-1} = q_{t-1})\,P(X_t = x_t \mid Q_t = q_t)$$

frame: 0 {
    variable: state {
        type: discrete hidden cardinality 2;
        conditionalparents: nil using DenseCPT("start_state");
    }
    variable: obs {
        type: continuous observed 0:0;
        conditionalparents: state(0) using mapping("state_obs");
    }
}
frame: 1 {
    variable: state {
        type: discrete hidden cardinality 2;
        conditionalparents: state(-1) using DenseCPT("state_state");
    }
    variable: obs {
        type: continuous observed 0:0;
        conditionalparents: state(0) using mapping("state_obs");
    }
}
chunk 1:1
Dynamic Bayesian network (hidden Markov model)
Hidden, discrete: Qt, Qt+1, Qt+2, Qt+3, Qt+4, Qt+5
Observed, continuous: Xt, Xt+1, Xt+2, Xt+3, Xt+4, Xt+5

$$P(q_{0:T}, x_{0:T} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0) \prod_{t=1}^{T} P(Q_t = q_t \mid Q_{t-1} = q_{t-1})\,P(X_t = x_t \mid Q_t = q_t)$$
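The hidden Markov model factorization can be evaluated term by term in log space. The Python sketch below does that under stated assumptions: Gaussian emissions per state and invented values for the "state_state" transition table, neither of which appears on the slides. The names `log_joint`, `TRANSITION`, and `EMISSIONS` are hypothetical.

```python
import math

START = [0.25, 0.75]                    # P(Q_0 = q), from the "start_state" table
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]   # assumed P(Q_t = j | Q_{t-1} = i)
EMISSIONS = [(0.0, 1.0), (3.0, 1.0)]    # assumed (mean, sd) of X_t for each state


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def log_joint(states, observations, start, transition, emissions):
    """log P(q_{0:T}, x_{0:T}) following the factorization term by term."""
    q0 = states[0]
    total = math.log(start[q0]) + log_normal_pdf(observations[0], *emissions[q0])
    for t in range(1, len(states)):
        prev, cur = states[t - 1], states[t]
        total += math.log(transition[prev][cur])                  # P(Q_t | Q_{t-1})
        total += log_normal_pdf(observations[t], *emissions[cur])  # P(X_t | Q_t)
    return total
```

Working in log space avoids the numerical underflow that a product over millions of genomic positions would otherwise cause.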
Dynamic Bayesian Network for segmentation
segment
label
CTCF
H3K36me3
DNaseI
hidden random variable
conditional relationship
observed random variable
discrete continuous
Segway workflow
.fasta.gz, .bed.gz, .bedGraph.gz, .wig.gz → genomedata-load → .genomedata
.genomedata → segway train → GMTKL structure, starting params, discovered params
.genomedata, GMTKL structure, discovered params → segway annotate → segway.bed.gz
segway.bed.gz → segtools → .png, .pdf, .tab
What does Segway do, really?
1. Creates GMTKL structure and parameter files for semi-automated genome annotation
2. Converts genomic data to a GMTK binary observation format
3. Runs GMTK to perform EM training, Viterbi decoding, and posterior inference
4. Manages job execution via a cluster (SGE, LSF, PBS, Slurm) or local multiprocessing
5. Converts GMTK output to genomics formats (BED, wiggle)
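Viterbi decoding, named in step 3, finds the single most probable sequence of hidden labels. Below is a minimal Python sketch of the textbook algorithm for a two-state model with Gaussian emissions. It is not GMTK's implementation, and the transition and emission values are assumptions chosen for illustration.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def viterbi(observations, start, transition, emissions):
    """Return the most probable hidden state path for the observations."""
    n_states = len(start)
    # Best log score of any path ending in each state at frame 0.
    scores = [math.log(start[q]) + log_normal_pdf(observations[0], *emissions[q])
              for q in range(n_states)]
    backpointers = []
    for x in observations[1:]:
        new_scores, pointers = [], []
        for cur in range(n_states):
            # Best predecessor for state `cur` at this frame.
            best_prev = max(range(n_states),
                            key=lambda prev: scores[prev] + math.log(transition[prev][cur]))
            pointers.append(best_prev)
            new_scores.append(scores[best_prev]
                              + math.log(transition[best_prev][cur])
                              + log_normal_pdf(x, *emissions[cur]))
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda q: scores[q])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Assumed parameters and a short made-up signal.
START = [0.25, 0.75]
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]
EMISSIONS = [(0.0, 1.0), (3.0, 1.0)]
decoded = viterbi([0.1, -0.2, 0.0, 3.1, 2.9, 3.2, 0.1], START, TRANSITION, EMISSIONS)
```

The decoded path is a per-position label vector, which is the kind of output that step 5 turns into BED segments.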
Segway modular API (pull request #91)
segway train
segway train-init
segway train-run
segway train-run-round
segway train-finish
segway annotate
segway annotate-init
segway annotate-run
segway annotate-finish
segway posterior
segway posterior-init
segway posterior-run
segway posterior-finish
Handling missing data
Legend: hidden random variable; observed random variable; discrete; continuous; conditional; switching
[Figure: segment and DNaseI, with a switching parent taking values 1 and 0]
Hidden: segment label
Observed: CTCF, H3K36me3, DNaseI
Switching parents: present(CTCF), present(H3K36me3), present(DNaseI)
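One way to read the present(·) indicators is that each one switches its track's observation on or off, so a missing value contributes nothing to the likelihood. The Python sketch below implements that reading for a single frame and a single segment label. The Gaussian parameters and the names `TRACK_PARAMS` and `frame_log_likelihood` are hypothetical, and Segway's own GMTK model may differ in detail.

```python
import math

# Assumed (mean, sd) of each track for one segment label; illustrative only.
TRACK_PARAMS = {"CTCF": (1.0, 0.5), "H3K36me3": (0.2, 1.0), "DNaseI": (2.0, 0.8)}


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def frame_log_likelihood(values, present, params):
    """Sum log densities over the tracks whose presence indicator is on.

    A track with present[track] false is skipped, which is equivalent to
    multiplying the likelihood by 1 for that track.
    """
    total = 0.0
    for track, value in values.items():
        if present[track]:
            mean, sd = params[track]
            total += log_normal_pdf(value, mean, sd)
    return total


# DNaseI is missing at this position, so only CTCF and H3K36me3 contribute.
values = {"CTCF": 1.1, "H3K36me3": 0.0, "DNaseI": None}
present = {"CTCF": True, "H3K36me3": True, "DNaseI": False}
partial = frame_log_likelihood(values, present, TRACK_PARAMS)
```

Skipping a track avoids imputing a value such as zero, which the model would otherwise treat as a real measurement.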
Length distribution
Model variables: segment label; segment countdown; segment transition; ruler; frame index; CTCF, H3K36me3, DNaseI; present(CTCF), present(H3K36me3), present(DNaseI)
• Minimum segment length
• Maximum segment length
• Trained geometric length distribution
• Dirichlet prior on segment length
• Weight of prior versus observed data
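To see how a geometric length distribution and a hard minimum length combine, consider a simplified Python sketch. It assumes a segment must last at least `minimum` frames and then continues at each frame with probability `stay`. The function names are hypothetical, and this models only the resulting distribution, not Segway's countdown variables or its maximum-length and prior settings.

```python
def length_pmf(length, stay, minimum=1):
    """P(segment length = length) for a geometric distribution shifted by a hard minimum."""
    if length < minimum:
        return 0.0
    return stay ** (length - minimum) * (1.0 - stay)


def expected_length(stay, minimum=1):
    """Mean segment length: the forced frames plus the geometric mean 1 / (1 - stay)."""
    return (minimum - 1) + 1.0 / (1.0 - stay)


# Numerical check of the closed form against a long truncated sum.
stay, minimum = 0.9, 5
total_mass = sum(length_pmf(n, stay, minimum) for n in range(1, 2001))
numeric_mean = sum(n * length_pmf(n, stay, minimum) for n in range(1, 2001))
```

With `stay = 0.9` and no minimum, the mean length is about 10 frames. Raising the minimum shifts the whole distribution rightward without changing its shape.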
Hierarchical models
• Two-level model topology
• Mixture of large-scale and punctate labels
“closed”, “open”, “intermediate”
Model variables: supersegment label; segment label; segment countdown; segment transition; ruler; frame index; CTCF; present(CTCF)
[Figure: segment label with observed tracks long RNA-seq cytosol, short RNA-seq whole cell, and CAGE nucleus, on the + strand and the – strand]
Legend: hidden random variable; observed random variable; conditional relationship; discrete; continuous
Adding genome sequence
segment
H3ac
RNA
nucleotide
H3K27me3
dinucleotide
Adding data relationships
segment
histone
HeLa
nucleotide
GM12878
dinucleotide
Adding evolution
segment
Ptro nucleotide
Ptro dinucleotide
Hsap nucleotide
H3K27me3
Hsap dinucleotide
Adding variation
segment
YRI nucleotide
YRI dinucleotide
ancestral nucleotide
YRI DAF
ancestral dinucleotide
https://segway.hoffmanlab.org/
Acknowledgments
The Hoffman Lab
Jeff Bilmes
William Noble
Max Libbrecht
Funding:
Princess Margaret Cancer Foundation
Canadian Institutes of Health Research
Canadian Cancer Society
Natural Sciences and Engineering Research Council
Ontario Institute for Cancer Research
Ontario Ministry of Research, Innovation and Science
Medicine by Design
McLaughlin Centre
Samantha Wilson
Coby Viner
Mickaël Mendez
Danielle Denisko
Chang Cao
Lee Zamparo
Eric Roberts
Mehran Karimzadeh
Francis Nguyen
Rachel Chan
Matthew McNeil
Natalia Mukhina
Postdoctoral, MSc, PhD positions
available in my research lab at the
Princess Margaret Cancer Centre
Dept of Medical Biophysics
Dept of Computer Science
University of Toronto
Please approach me for details.
Michael Hoffman
https://hoffmanlab.org/
michael.hoffman@utoronto.ca
@michaelhoffman

More Related Content

Similar to Segway and the Graphical Models Toolkit: a framework for probabilistic genomic inference

Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer ChemistryPreferred Networks
 
A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...JuanPabloCarbajal3
 
Julia: The language for future
Julia: The language for futureJulia: The language for future
Julia: The language for future岳華 杜
 
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdfreservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdfRTEFGDFGJU
 
The Language for future-julia
The Language for future-juliaThe Language for future-julia
The Language for future-julia岳華 杜
 
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...thanhdowork
 
Protein functional site prediction using the shotest path graphnew1 2
Protein functional site prediction using the shotest path graphnew1 2Protein functional site prediction using the shotest path graphnew1 2
Protein functional site prediction using the shotest path graphnew1 2M Beneragama
 
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...Victor Asanza
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningRyo Iwaki
 
Test Generation for Analog and Mixed-Signal Circuits Using Hybrid System Mode...
Test Generation for Analog and Mixed-Signal Circuits Using Hybrid System Mode...Test Generation for Analog and Mixed-Signal Circuits Using Hybrid System Mode...
Test Generation for Analog and Mixed-Signal Circuits Using Hybrid System Mode...VLSICS Design
 
TEST GENERATION FOR ANALOG AND MIXED-SIGNAL CIRCUITS USING HYBRID SYSTEM MODELS
TEST GENERATION FOR ANALOG AND MIXED-SIGNAL CIRCUITS USING HYBRID SYSTEM MODELSTEST GENERATION FOR ANALOG AND MIXED-SIGNAL CIRCUITS USING HYBRID SYSTEM MODELS
TEST GENERATION FOR ANALOG AND MIXED-SIGNAL CIRCUITS USING HYBRID SYSTEM MODELSVLSICS Design
 
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...Amro Elfeki
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemIosif Itkin
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelstuxette
 
Artificial software diversity: automatic synthesis of program sosies
Artificial software diversity: automatic synthesis of program sosiesArtificial software diversity: automatic synthesis of program sosies
Artificial software diversity: automatic synthesis of program sosiesFoCAS Initiative
 
Vu_HPSC2012_02.pptx
Vu_HPSC2012_02.pptxVu_HPSC2012_02.pptx
Vu_HPSC2012_02.pptxQucngV
 
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTIONOPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTIONsipij
 
Universal approximators for Direct Policy Search in multi-purpose water reser...
Universal approximators for Direct Policy Search in multi-purpose water reser...Universal approximators for Direct Policy Search in multi-purpose water reser...
Universal approximators for Direct Policy Search in multi-purpose water reser...Andrea Castelletti
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationWork-Bench
 

Similar to Segway and the Graphical Models Toolkit: a framework for probabilistic genomic inference (20)

Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer Chemistry
 
A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...
 
Julia: The language for future
Julia: The language for futureJulia: The language for future
Julia: The language for future
 
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdfreservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
 
The Language for future-julia
The Language for future-juliaThe Language for future-julia
The Language for future-julia
 
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
 
Protein functional site prediction using the shotest path graphnew1 2
Protein functional site prediction using the shotest path graphnew1 2Protein functional site prediction using the shotest path graphnew1 2
Protein functional site prediction using the shotest path graphnew1 2
 
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
Test Generation for Analog and Mixed-Signal Circuits Using Hybrid System Mode...
Test Generation for Analog and Mixed-Signal Circuits Using Hybrid System Mode...Test Generation for Analog and Mixed-Signal Circuits Using Hybrid System Mode...
Test Generation for Analog and Mixed-Signal Circuits Using Hybrid System Mode...
 
TEST GENERATION FOR ANALOG AND MIXED-SIGNAL CIRCUITS USING HYBRID SYSTEM MODELS
TEST GENERATION FOR ANALOG AND MIXED-SIGNAL CIRCUITS USING HYBRID SYSTEM MODELSTEST GENERATION FOR ANALOG AND MIXED-SIGNAL CIRCUITS USING HYBRID SYSTEM MODELS
TEST GENERATION FOR ANALOG AND MIXED-SIGNAL CIRCUITS USING HYBRID SYSTEM MODELS
 
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
 
Artificial software diversity: automatic synthesis of program sosies
Artificial software diversity: automatic synthesis of program sosiesArtificial software diversity: automatic synthesis of program sosies
Artificial software diversity: automatic synthesis of program sosies
 
Vu_HPSC2012_02.pptx
Vu_HPSC2012_02.pptxVu_HPSC2012_02.pptx
Vu_HPSC2012_02.pptx
 
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTIONOPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
OPTIMAL GLOBAL THRESHOLD ESTIMATION USING STATISTICAL CHANGE-POINT DETECTION
 
Universal approximators for Direct Policy Search in multi-purpose water reser...
Universal approximators for Direct Policy Search in multi-purpose water reser...Universal approximators for Direct Policy Search in multi-purpose water reser...
Universal approximators for Direct Policy Search in multi-purpose water reser...
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
 

Recently uploaded

Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxdharshini369nike
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
Cytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxCytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxVarshiniMK
 

Recently uploaded (20)

Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 
Cytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxCytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptx
 

Segway and the Graphical Models Toolkit: a framework for probabilistic genomic inference

  • 1. Michael M. Hoffman Princess Margaret Cancer Centre Vector Institute Department of Medical Biophysics Department of Computer Science University of Toronto https://hoffmanlab.org/ Segway and the Graphical Models Toolkit: A framework for probabilistic genomic inference @michaelhoffman #probgen18
  • 2. Geographical maps have… Figures from 1) National Geographic Society (2011) GIS. street data + buildings data + vegetation data = integrated map Rachel Chan
  • 3. Functional genomics ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
  • 4. Functional genomics ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
  • 5. Functional genomics ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
  • 6. Functional genomics ENCODE Project Consortium 2011. PLoS Biol 9:e1001046.
  • 7. Semi-automated genome annotation genomic signal pattern discovery visualization annotation interpretation
  • 8. Segway A way to segment the genome https://segway.hoffmanlab.org/ Hoffman MM et al. 2012. Nat Methods 9:473.
  • 11. 0 1 0 21 1 Finite number of labels
  • 12. 0 1 0 1 1 Maximize similarity in labels 2
  • 13. 0 1 0 1 1 Maximize similarity in labels 2
  • 14. 0 1 0 1 1 Maximize similarity in labels 2
  • 15. TSS transcription start site GS gene start GM gene middle GE gene end E enhancer I distal CTCF R repression D dead
  • 16. Transcription start site (TSS) Hoffman MM et al. 2013. Nucleic Acids Res 41:827.
  • 17. Graphical Models Toolkit (GMTK) • General purpose toolkit for probabilistic inference in temporal signals (speech, language, activity recognition, genomics) • Written in highly optimized scalable C++ • Supports many machine learning algorithms, inference procedures, and probability models. • Can express arbitrary structured dynamic graphical models • http://melodi.ee.washington.edu/gmtk • conda install -c bioconda gmtk Jeff Bilmes
  • 19. Q hidden random variable discrete P(q|θ) = P(Q = q) Equation Graphical realization parameters
  • 20. Q hidden random variable discrete P(q|θ) = P(Q = q) variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } Equation Graphical GMTKL
  • 21. Q hidden random variable discrete P(q|θ) = P(Q = q) variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } DENSE_CPT_IN_FILE inline % conditional probability tables 1 % total number of CPTs = 1 0 start_state % CPT #0, "start_state" 0 % 0 parents 2 % output cardinality 2 0.25 % P(Q = 0) = 0.25 0.75 % P(Q = 1) = 0.75 Equation Graphical GMTKL Parameters
  • 22. Q X hidden random variable conditional relationship observed random variable discrete continuous P(q,x|θ) = P(Q = q)P(X = x|Q = q) variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); }
  • 23. Q0 X0 hidden random variable conditional relationship observed random variable discrete continuous P(q0,x0|θ) = P(Q₀ = q₀)P(X₀ = x₀|Q₀ = q₀) frame: 0 { variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); } } X0
  • 24. Q0 X0 hidden random variable conditional relationship observed random variable discrete continuous Q1 P(q0:1,x0|θ) = P(Q₀ = q₀)P(X₀ = x₀|Q₀ = q₀)P(Q₁ = q₁|Q₀ = q₀) frame: 0 { variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); } } frame: 1 { variable: state { type: discrete hidden cardinality 2; conditionalparents: state(-1) using DenseCPT("state_state"); } }
  • 25. Q0 X0 hidden random variable conditional relationship observed random variable discrete continuous Q1 X1 P(q0:1,x0:1|θ) = P(Q₀ = q₀)P(X₀ = x₀|Q₀ = q₀)P(Q₁ = q₁|Q₀ = q₀)P(X₁ = x₁|Q₁ = q₁) frame: 0 { variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); } } frame: 1 { variable: state { type: discrete hidden cardinality 2; conditionalparents: state(-1) using DenseCPT("state_state"); } variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); } }
  • 26. Qt Xt Qt+1 Xt+1 hidden random variable conditional relationship observed random variable discrete continuous frame: 0 { variable: state { type: discrete hidden cardinality 2; conditionalparents: nil using DenseCPT("start_state"); } variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); } } frame: 1 { variable: state { type: discrete hidden cardinality 2; conditionalparents: state(-1) using DenseCPT("state_state"); } variable: obs { type: continuous observed 0:0; conditionalparents: state(0) using mapping("state_obs"); } } chunk 1:1 $P(q_{0:T}, x_{0:T} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0) \prod_{t=1}^{T} P(Q_t = q_t \mid Q_{t-1} = q_{t-1})\,P(X_t = x_t \mid Q_t = q_t)$
  • 27. hidden random variable conditional relationship observed random variable discrete continuous Qt Xt Xt+1 Qt+1 Xt+2 Qt+2 Xt+3 Qt+3 Xt+4 Qt+4 Xt+5 Qt+5 Dynamic Bayesian network (hidden Markov model) $P(q_{0:T}, x_{0:T} \mid \theta) = P(Q_0 = q_0)\,P(X_0 = x_0 \mid Q_0 = q_0) \prod_{t=1}^{T} P(Q_t = q_t \mid Q_{t-1} = q_{t-1})\,P(X_t = x_t \mid Q_t = q_t)$
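The factorization on slides 26 and 27 can be evaluated term by term. The sketch below computes the log of the joint probability for a given state path and observation sequence. Only the start probabilities come from slide 21. The transition table, the choice of one Gaussian per state for the continuous observation, and all numeric emission parameters are assumptions made for illustration.

```python
import math

START = [0.25, 0.75]                   # P(Q_0), from slide 21
TRANS = [[0.9, 0.1], [0.2, 0.8]]       # hypothetical P(Q_t | Q_{t-1}), rows sum to 1
MEANS = [0.0, 3.0]                     # hypothetical Gaussian mean per state
STDS = [1.0, 1.0]                      # hypothetical Gaussian std. dev. per state


def log_gauss(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_joint(states, obs):
    """log P(q_{0:T}, x_{0:T}): start term, then one transition and one emission per frame."""
    assert len(states) == len(obs) and len(states) > 0
    q0 = states[0]
    total = math.log(START[q0]) + log_gauss(obs[0], MEANS[q0], STDS[q0])
    for t in range(1, len(states)):
        total += math.log(TRANS[states[t - 1]][states[t]])          # P(Q_t | Q_{t-1})
        total += log_gauss(obs[t], MEANS[states[t]], STDS[states[t]])  # P(X_t | Q_t)
    return total
```

For observations near 3, a path that stays in state 1 scores higher than one that stays in state 0: `log_joint([1, 1], [3.1, 2.9]) > log_joint([0, 0], [3.1, 2.9])`.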
  • 28. Dynamic Bayesian Network for segmentation segment label CTCF H3K36me3 DNaseI hidden random variable conditional relationship observed random variable discrete continuous
  • 29. Segway work flow .fasta.gz genomedata-load segway train .genomedata .bed.gz .bedGraph.gz .wig.gz segway annotate GMTKL structure starting params discovered params segway.bed.gz segtools .png, .pdf, .tab
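The last step of the workflow on slide 29 turns a per-position label sequence into a segmentation file. A minimal sketch of that idea: collapse runs of identical labels into nonoverlapping intervals and print them as four-column BED lines (0-based, half-open coordinates). The function name `labels_to_bed` is hypothetical, and Segway's real BED output carries more columns than shown here.

```python
from itertools import groupby


def labels_to_bed(chrom, start, labels):
    """Collapse consecutive identical labels into BED lines: chrom, start, end, name."""
    lines = []
    pos = start
    for label, run in groupby(labels):
        length = sum(1 for _ in run)          # number of positions in this run
        lines.append(f"{chrom}\t{pos}\t{pos + length}\t{label}")
        pos += length                          # next segment starts where this one ends
    return lines


bed = labels_to_bed("chr1", 100, [0, 0, 1, 1, 1, 0, 2])
```

Because each segment starts where the previous one ended, the intervals never overlap and together cover the whole labeled region.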
  • 30. What does Segway do, really? 1. Creates GMTKL structure and parameter files for semi-automated genome annotation 2. Converts genomic data to a GMTK binary observation format 3. Runs GMTK to perform EM training, Viterbi decoding, posterior inference 4. Manages job execution via cluster (SGE, LSF, PBS, Slurm), or local multiprocessing 5. Converts GMTK output to genomics formats (BED, wiggle)
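Slide 30 names Viterbi decoding as one of the inference tasks that GMTK runs. The sketch below is a textbook Viterbi pass over a two-state HMM in plain Python, meant only to show what "most probable label path" means. It is not GMTK's implementation, and the transition table and Gaussian emission parameters are hypothetical.

```python
import math


def viterbi(obs, start, trans, log_emit):
    """Return the most probable state path for obs under a discrete-state HMM."""
    n_states = len(start)
    # best[t][q]: log-probability of the best path that ends in state q at frame t
    best = [[math.log(start[q]) + log_emit(q, obs[0]) for q in range(n_states)]]
    back = []  # back[t-1][q]: best predecessor of state q at frame t
    for x in obs[1:]:
        prev = best[-1]
        row, ptr = [], []
        for q in range(n_states):
            p = max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][q]))
            ptr.append(p)
            row.append(prev[p] + math.log(trans[p][q]) + log_emit(q, x))
        best.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    q = max(range(n_states), key=lambda r: best[-1][r])
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    path.reverse()
    return path


MEANS = [0.0, 3.0]  # hypothetical unit-variance Gaussian mean per state


def log_emit(q, x):
    """Log density of x under the Gaussian for state q."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - MEANS[q]) ** 2


path = viterbi(
    [0.1, -0.2, 2.9, 3.2, 3.0, 0.0],
    start=[0.25, 0.75],                 # from slide 21
    trans=[[0.9, 0.1], [0.2, 0.8]],     # hypothetical transition CPT
    log_emit=log_emit,
)
```

On this toy signal the decoded path follows the low-high-low pattern of the observations, which is the kind of label sequence that is then written out as segments.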
  • 31. Segway modular API (pull request #91) segway train segway train-init segway train-run segway train-run-round segway train-finish segway annotate segway annotate-init segway annotate-run segway annotate-finish segway posterior segway posterior-init segway posterior-run segway posterior-finish
  • 32. Handling missing data hidden random variable conditional observed random variable segment DNaseI discrete continuous switching 1 0
  • 33. segment label CTCF H3K36me3 DNaseI hidden random variable conditional observed random variable discrete continuous switching present(CTCF) present(H3K36me3) present(DNaseI) Handling missing data
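One way to read the missing-data slides: each track's observation is switched on or off by a presence indicator, and a track that is absent at a position contributes nothing to the likelihood there. The sketch below expresses that reading in Python. The per-track Gaussian and its parameters are assumptions for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical per-track Gaussian parameters for one segment label,
# in the order CTCF, H3K36me3, DNaseI.
MEANS = [1.0, 2.0, 3.0]
STDS = [1.0, 1.0, 1.0]


def log_gauss(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_emission(values, present, means, stds):
    """Sum log densities over tracks, skipping tracks whose presence indicator is 0."""
    total = 0.0
    for x, is_present, mean, std in zip(values, present, means, stds):
        if is_present:
            total += log_gauss(x, mean, std)
        # A missing track adds 0.0 in log space (a factor of 1), so its value is never read.
    return total


# The second track is missing at this position; its NaN placeholder is ignored.
partial = log_emission([1.0, float("nan"), 3.0], [1, 0, 1], MEANS, STDS)
```

Skipping the term, rather than imputing a value, means a position with missing data is labeled using only the tracks that were actually measured there.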
  • 35. segment label CTCF H3K36me3 DNaseI present(CTCF) present(H3K36me3) present(DNaseI) segment countdown segment transition ruler frame index Length distribution • Minimum segment length • Maximum segment length • Trained geometric length distribution • Dirichlet prior on segment length • Weight of prior versus observed data
  • 36. Hierarchical models • Two-level model topology • Mixture of large-scale and punctate labels “closed” “open” “intermediate”
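Two of the length controls on slide 35 can be illustrated with a small calculation. If a segment continues with a fixed probability at each position, its length follows a geometric distribution, and a minimum length shifts that distribution to the right. The sketch below is a simplified reading, not Segway's actual parameterization: `stay_prob` and `min_length` are hypothetical names, and the maximum length and Dirichlet prior are not modeled.

```python
def segment_length_pmf(length, stay_prob, min_length=1):
    """P(L = length) when the segment must last min_length positions,
    then continues with probability stay_prob at each further position."""
    if length < min_length:
        return 0.0
    return stay_prob ** (length - min_length) * (1.0 - stay_prob)


def expected_length(stay_prob, min_length=1):
    """Mean of the shifted geometric distribution above."""
    return (min_length - 1) + 1.0 / (1.0 - stay_prob)


mean_free = expected_length(0.9)                   # no minimum beyond one position
mean_min5 = expected_length(0.9, min_length=5)     # at least five positions
```

A higher `stay_prob` gives longer segments on average, which is why a prior on that parameter acts as a prior on segment length.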
  • 38. segment label long RNA-seq cytosol short RNA-seq whole cell CAGE nucleus + strand hidden random variable conditional relationship observed random variable discrete continuous
  • 39. segment label long RNA-seq cytosol short RNA-seq whole cell CAGE nucleus + strand – strand hidden random variable conditional relationship observed random variable discrete continuous
  • 42. Adding evolution segment Ptro nucleotide Ptro dinucleotide Hsap nucleotide H3K27me3 Hsap dinucleotide
  • 43. Adding variation segment YRI nucleotide YRI dinucleotide ancestral nucleotide YRI DAF ancestral dinucleotide
  • 47. Acknowledgments The Hoffman Lab Jeff Bilmes William Noble Max Libbrecht Funding: Princess Margaret Cancer Foundation Canadian Institutes of Health Research Canadian Cancer Society Natural Sciences and Engineering Research Council Ontario Institute for Cancer Research Ontario Ministry of Research, Innovation and Science Medicine by Design McLaughlin Centre Samantha Wilson Coby Viner Mickaël Mendez Danielle Denisko Chang Cao Lee Zamparo Eric Roberts Mehran Karimzadeh Francis Nguyen Rachel Chan Matthew McNeil Natalia Mukhina
  • 48. Postdoctoral, MSc, PhD positions available in my research lab at the Princess Margaret Cancer Centre Dept of Medical Biophysics Dept of Computer Science University of Toronto Please approach me for details. Michael Hoffman https://hoffmanlab.org/ michael.hoffman@utoronto.ca @michaelhoffman

Editor's Notes

  1. - need laser pointer - turn Workrave off turn phone off turn iPad off Happy to answer questions in middle of talk except…
  2. For example, geographical maps have street data…buildings data…vegetation data…all resulting in an integrated map
  3. Semi-automated genomic annotation begins with pattern discovery from multiple genomic data sets and results in a simple annotation that uses these patterns to assign a single label to each part of the genome. We can use this annotation to visualize a huge number of data tracks, e.g. by viewing the discovered patterns and the annotation instead of looking at each data track individually. We can then interpret the context and potential impact of the results, for example the meaning of the patterns and annotation we found. Some questions that semi-automated annotation can answer include: Pattern discovery: What signals from multiple experiments do we see over and over again? Annotation: What does a particular piece of the genome do, in a nutshell? Visualization: How can we make complex data comprehensible visually? Interpretation: What is the context and potential impact of the results we are finding?
  4. To perform genomic segmentation, Segway first ‘observes’ multiple datasets of genomic data.
  5. Then, it splits the datasets up into non-overlapping segments.
  6. To each segment, Segway assigns a label from a finite set.
  7. Segway then maximizes the similarity between segments of the same label by pushing around the boundaries of the segments. These are our ‘learned patterns’.
  8. For example, here we have a 0-label with a low-high-low pattern.
  9. Here we have a 1-label with a high-low-high pattern.
  10-19. DBNs can be thought of as a generalization of HMMs. This particular DBN is just another representation of an HMM. The DBN diagram itself does not contain the state transition information in an HMM diagram, but it is contained in the transition conditional probability table.
  20. A hidden Markov multimodel: “no longer an HMM, not yet a DBN”. Dashed line = switching relationship. The default Segway model for three observations.
  21. DBNs can be thought of as a generalization of HMMs. This particular DBN is just another representation of an HMM. The DBN diagram itself does not contain the state transition information in an HMM diagram, but it is contained in the transition conditional probability table.
  22-27. A hidden Markov multimodel: “no longer an HMM, not yet a DBN”. Dashed line = switching relationship. The default Segway model for three observations.
  28. light blue fill = semiobserved random variable (observed random variable for the data switched by a discrete observed random variable for the nonmissingness of the data)
  29. light blue fill = semiobserved random variable (observed random variable for the data switched by a discrete observed random variable for the nonmissingness of the data) Discuss mixture model
  30-31. light blue fill = semiobserved random variable (observed random variable for the data switched by a discrete observed random variable for the nonmissingness of the data)
  32. And finally, I'd like to thank you for your very kind attention.