Advanced Computing Laboratory
Electrical and Computer Engineering
Seoul National University
Taehoon Lee, Sungroh Yoon
Boosted Categorical Restricted Boltzmann Machine
for Computational Prediction of Splice Junctions
Outline
• Motivation
• Preliminary
• Boosted contrastive divergence
• Categorical restricted Boltzmann machine
• Experiment results
• Conclusion
Motivation
• Deep neural networks (DNNs) show human-level performance on many
recognition tasks.
• We focus on class-imbalanced prediction.
• Insufficient samples to represent the true distribution of a class.
• Q. How can we learn minor but important features using neural networks?
• We propose a new RBM training method called boosted CD.
• We also devise a regularization term for sparsity of DNA sequences.
[Figure: negative and positive samples, with query images that are easy to misclassify]
(Splice) Junction Prediction: Extremely Class-Imbalanced Problem
• Genetic information flows through the gene expression process.
• DNA: a sequence of four types of nucleotides (A, G, T, C).
• Gene: a segment of DNA (the basic unit of heredity).
[Figure: gene expression from DNA to RNA to protein, with exons and introns marked; in the example sequence below, a GT can be a true boundary or a false boundary]
ACGTCGACTGCTACGTAGCAGCGA
TACGTACCGATCATCACTATCATC
GAGGTACGATCGATCGATCGATCA
GTCGATCGTCGTTCAGTCAGTCGA
TATCAGTCATATGCACATCTCAGT
[Figure labels: GT (or AG), 16K, 76M, true sites, 160K (= 0.21% over 76M)]
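The imbalance described on this slide can be illustrated with a small sketch. It scans a sequence for candidate GT dinucleotides and recomputes the quoted percentage. The function name `find_candidate_sites` is hypothetical, and reading 160K as the number of true sites and 76M as the number of GT (or AG) occurrences is an interpretation of the figure labels, not something the slide states in a sentence.

```python
def find_candidate_sites(sequence, motif="GT"):
    """Return the 0-based start index of every occurrence of motif."""
    sequence = sequence.upper()
    return [i for i in range(len(sequence) - len(motif) + 1)
            if sequence[i:i + len(motif)] == motif]

# One line of the example sequence shown on the slide.
example = "GAGGTACGATCGATCGATCGATCA"
candidates = find_candidate_sites(example)

# ASSUMPTION: 160K true sites among 76M candidate dinucleotides.
true_sites = 160_000
candidate_sites = 76_000_000
true_site_percentage = 100.0 * true_sites / candidate_sites
```

Every candidate returned by the scan still has to be classified as a true or false boundary, which is where the imbalance comes from.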
Previous Work on Junction Prediction
• Two approaches:
  • Machine learning-based:
    • ANN (Stormo et al., 1982; Noordewier et al., 1990; Brunak et al., 1991),
    • SVM (Degroeve et al., 2005; Huang et al., 2006; Sonnenburg et al., 2007),
    • HMM (Reese et al., 1997; Pertea et al., 2001; Baten et al., 2006).
  • Sequence alignment-based:
    • TopHat (Trapnell et al., 2010), MapSplice (Wang et al., 2010),
    RUM (Grant et al., 2011).
• We want to construct a learning model that can boost prediction
performance in a complementary way to alignment-based methods.
• We propose a learning model based on (multilayer) RBMs
and its training scheme.
Related Methodologies
• Training methods of RBM (compared by description, training cost, noise handling, and class-imbalance handling):
  • CD (Hinton, Neural Comp. 2002): standard and widely used.
  • Persistent CD (Tieleman, ICML 2008): use of a single Markov chain.
  • Parallel tempering (Cho et al., IJCNN 2010): simultaneous generation of Markov chains.
• RBM for categorical values
  • Softmax input units (Salakhutdinov et al., ICML 2007).
• Class-imbalance problems
  • Refer to a review by Galar et al. (IEEE T SMC 2012).
Main Contributions
• New RBM training method called boosted CD
• New penalty term to handle sparsity of DNA sequences
• Significant boosts in splicing prediction performance
• Robustness to high-dimensional class-imbalanced data
• The ability to detect subtle non-canonical splicing signals
Restricted Boltzmann Machines
• RBM is a type of logistic belief network whose structure is a bipartite graph.
• Nodes: an input layer 𝒗 and a hidden layer 𝒉.
• The probability of a configuration (𝒗, 𝒉) is determined by its energy 𝐸(𝒗, 𝒉).
• Each node is a stochastic binary unit.
• The hidden-unit activations can be used as a feature.
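For reference, the textbook definitions of a binary RBM are sketched below. The symbols $\boldsymbol{W}$ (weights), $\boldsymbol{b}$ (input biases), $\boldsymbol{c}$ (hidden biases), and $\sigma$ (logistic sigmoid) are assumptions made here for illustration; the slide's own notation may differ.

```latex
\begin{align}
E(\boldsymbol{v}, \boldsymbol{h}) &= -\boldsymbol{b}^{\top}\boldsymbol{v} - \boldsymbol{c}^{\top}\boldsymbol{h} - \boldsymbol{v}^{\top}\boldsymbol{W}\boldsymbol{h} \\
p(\boldsymbol{v}, \boldsymbol{h}) &= \frac{\exp\bigl(-E(\boldsymbol{v}, \boldsymbol{h})\bigr)}{Z},
\qquad Z = \sum_{\boldsymbol{v}', \boldsymbol{h}'} \exp\bigl(-E(\boldsymbol{v}', \boldsymbol{h}')\bigr) \\
p(h_j = 1 \mid \boldsymbol{v}) &= \sigma\Bigl(c_j + \sum_i W_{ij} v_i\Bigr) \\
p(v_i = 1 \mid \boldsymbol{h}) &= \sigma\Bigl(b_i + \sum_j W_{ij} h_j\Bigr)
\end{align}
```

Under these definitions, the vector of probabilities $p(h_j = 1 \mid \boldsymbol{v})$ is the quantity usually taken as the learned feature of an input $\boldsymbol{v}$.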
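Contrastive Divergence (CD) for Training RBMs
• Training weights to minimize negative log-likelihood of data.
• Run the MCMC chain $\boldsymbol{v}^{(0)}, \boldsymbol{v}^{(1)}, \ldots, \boldsymbol{v}^{(k)}$ for $k$ steps.
• The CD-$k$ updates after seeing example $\boldsymbol{v}$ (approximated by a $k$-step Markov chain):
[Figure: Gibbs chain $\boldsymbol{v}^{(0)} = \boldsymbol{v} \to \boldsymbol{h}^{(0)} \to \boldsymbol{v}^{(1)} \to \boldsymbol{h}^{(1)} \to \cdots \to \boldsymbol{v}^{(k)} \to \boldsymbol{h}^{(k)}$]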
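The chain on this slide can be sketched as a minimal CD-k step for a binary RBM, using only the Python standard library. This is the generic textbook update, not the authors' implementation; the function names, the learning rate, and the tiny layer sizes are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, c, rng):
    """Return (probabilities, binary sample) for h given v."""
    probs = [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
             for j in range(len(c))]
    return probs, [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, b, rng):
    """Return (probabilities, binary sample) for v given h."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    return probs, [1 if rng.random() < p else 0 for p in probs]

def cd_k_update(v0, W, b, c, k=1, lr=0.1, rng=random):
    """One CD-k step on example v0; updates W, b, c in place."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ph0, h = sample_hidden(v0, W, c, rng)      # positive phase
    vk = v0
    for _ in range(k):                         # k steps of Gibbs sampling
        _, vk = sample_visible(h, W, b, rng)
        phk, h = sample_hidden(vk, W, c, rng)
    for i in range(len(b)):
        for j in range(len(c)):
            # data statistics minus k-step reconstruction statistics
            W[i][j] += lr * (v0[i] * ph0[j] - vk[i] * phk[j])
        b[i] += lr * (v0[i] - vk[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - phk[j])
    return vk

rng = random.Random(0)
n_visible, n_hidden = 4, 3                     # hypothetical sizes
W = [[rng.gauss(0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
reconstruction = cd_k_update([1, 0, 0, 0], W, b, c, k=1, rng=rng)
```

The returned `reconstruction` is the end of the chain, $\boldsymbol{v}^{(k)}$, which stands in for a sample from the model distribution.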
Overview of Proposed Methodology
What Boosting Is
• Boosting is a meta-algorithm that converts weak learners into strong ones.
• Most boosting algorithms consist of iteratively learning weak classifiers with
respect to a distribution and adding them to a final strong classifier.
• The main variation between many boosting algorithms:
  • The method of weighting training data points and hypotheses.
  • AdaBoost, LPBoost, TotalBoost, …
(from lecture notes @ UC Irvine CS 271, Fall 2007)
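To make the idea of weighting training data points concrete, here is a sketch of one round of the standard AdaBoost weight update for labels in {-1, +1}. It illustrates boosting in general, not the boosted CD method; the function name `adaboost_round` and the toy data are hypothetical, and the input weights are assumed to sum to 1.

```python
import math

def adaboost_round(weights, labels, predictions):
    """One AdaBoost round; returns (alpha, renormalized weights)."""
    # Weighted error of the weak classifier.
    err = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)
    err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified points (y * p == -1) are scaled up, others down.
    unnorm = [w * math.exp(-alpha * y * p)
              for w, y, p in zip(weights, labels, predictions)]
    total = sum(unnorm)
    return alpha, [w / total for w in unnorm]

labels = [+1, +1, -1, -1]
predictions = [+1, -1, -1, -1]   # the second sample is misclassified
weights = [0.25] * 4
alpha, new_weights = adaboost_round(weights, labels, predictions)
```

After the round, the misclassified sample carries more weight than each correctly classified one, so the next weak learner concentrates on it.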
Boosted Contrastive Divergence (1/2)
• Contrastive divergence training is looped over all mini-batches and is known to
be stable.
• However, for a class-imbalanced distribution, we need to assign higher weights
to rare samples so that the Gibbs chains can jump to unseen examples.
[Figure: sample distribution with hardly observed regions; assign higher weights to rare samples and lower weights to ordinary samples]
Boosted Contrastive Divergence (2/2)
• If we assign the same weight to all the data, the performance of Gibbs
sampling would degrade in the regions that are hardly observed.
• Whenever sampling, we therefore re-weight each observation by the energy
of its reconstruction $E(\boldsymbol{v}_n^{(k)}, \boldsymbol{h}_n^{(k)})$.
[Figure: relative locations of samples and corresponding Markov chains by CD, by PT, and by the proposed method, with hardly observed regions marked]
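A rough sketch of energy-based re-weighting follows. The slides do not give the exact mapping from reconstruction energy to sample weight, so the softmax used here, in which a higher energy yields a higher weight, is purely an assumption for illustration. The energy function is the standard binary RBM energy, and all names and values are hypothetical.

```python
import math

def energy(v, h, W, b, c):
    """Standard binary RBM energy E(v, h) = -b.v - c.h - v.W.h."""
    bias_v = sum(bi * vi for bi, vi in zip(b, v))
    bias_h = sum(cj * hj for cj, hj in zip(c, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -bias_v - bias_h - interaction

def energy_weights(energies):
    """Map reconstruction energies to weights that sum to 1.

    ASSUMPTION: higher energy -> higher weight (softmax over energies).
    """
    m = max(energies)                      # subtract max for stability
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    return [x / total for x in exps]

W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
c = [0.0, 0.0]
reconstructions = [([1, 0], [1, 0]), ([1, 0], [0, 1])]
energies = [energy(v, h, W, b, c) for v, h in reconstructions]
weights = energy_weights(energies)
```

In this toy case the second reconstruction has the higher energy and therefore receives the larger weight; such weights could then scale each sample's contribution to the CD update.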
Categorical Gradient
• For biological sequences, 1-hot encoding is widely used (Baldi & Brunak, 2001).
• A, C, G, and T are encoded by 1000, 0100, 0010, and 0001, respectively.
• In encoded binary vectors, 75% of the elements are zero.
• To resolve sparsity of 1-hot encoding vectors, we devise a new regularization
technique that incorporates prior knowledge on the sparsity.
[Figure: the sparsity term; reconstruction with and w/o the sparsity term]
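The encoding and its sparsity can be checked with a few lines of Python. The mapping follows the slide (A, C, G, T to 1000, 0100, 0010, 0001); the function name `one_hot_encode` and the sample sequence are hypothetical.

```python
# 1-hot codes as given on the slide.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

def one_hot_encode(sequence):
    """Concatenate the 4-bit code of every nucleotide into one flat list."""
    encoded = []
    for base in sequence.upper():
        encoded.extend(ONE_HOT[base])
    return encoded

encoded = one_hot_encode("GATTACA")
zero_fraction = encoded.count(0) / len(encoded)
```

Each nucleotide contributes exactly one 1 and three 0s, so the zero fraction is 0.75 for any sequence, and a sequence of n nucleotides becomes a 4n-dimensional binary vector.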
Proposed Training Algorithm
[Figure: training algorithm annotated with its two components, the categorical gradient (derived from the sparsity term) and boosted CD]
Results
• Data preparation:
  • Real human DNA sequences with known boundary information.
  • GWH dataset: 2-class (boundary or not).
  • UCSC dataset: 3-class (acceptor, donor, or non-boundary).
• Effects of categorical gradient
• Effects of boosting
• Effects on the splicing prediction
[Figure: example sequence annotated with true acceptor 1, true donor 1, true acceptor 2, a non-canonical true donor, false acceptor 1, and false donor 1]
CGTAGCAGCGATACGTACCGATCGTCACTATCATCGAGGTACGAGAGATCGATCGGCAACG
Results: Effects of Categorical Gradient
• The proposed method shows the best performance in terms of reconstruction
error for both training and testing.
• Compared to the softmax approach, the proposed regularized RBM succeeds in
achieving lower error by slightly sacrificing the probability sum constraint.
• Setup:
  • Data: chromosome 19 in GWH-donor
  • Sequence length: 200 nt (800 dimensions)
  • # of iterations: 500
  • Learning rate: 0.1
  • L2-decay: 0.001
[Figure: reconstruction error plot, annotated "over-fitted" and "best"]
Results: Effects of Boosting
• To simulate a class-imbalance situation, we randomly dropped samples
with different drop rates for different classes.
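A minimal sketch of simulating class imbalance by dropping samples with class-specific rates is shown below. The drop rates, class names, and the function name `drop_samples` are hypothetical; the slides do not state the rates that were actually used.

```python
import random

def drop_samples(samples, labels, drop_rates, seed=0):
    """Drop each sample with the probability assigned to its class.

    Classes missing from drop_rates are never dropped.
    """
    rng = random.Random(seed)
    kept = [(x, y) for x, y in zip(samples, labels)
            if rng.random() >= drop_rates.get(y, 0.0)]
    return [x for x, _ in kept], [y for _, y in kept]

samples = list(range(1000))
labels = ["pos" if i % 2 == 0 else "neg" for i in samples]

# ASSUMPTION: drop 90% of one class and none of the other.
kept_x, kept_y = drop_samples(samples, labels, {"pos": 0.9, "neg": 0.0})
```

Starting from a balanced set, this keeps every "neg" sample and only about one in ten "pos" samples, which gives a controlled imbalance ratio for comparing training methods.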
• RBM training methods compared (by description, training cost, noise handling, and class-imbalance handling):
  • CD (Hinton, Neural Comp. 2002): standard and widely used.
  • Persistent CD (Tieleman, ICML 2008): use of a single Markov chain.
  • Parallel tempering (Cho et al., IJCNN 2010): simultaneous generation of Markov chains.
  • Proposed boosted CD: reweighting samples.
Results: Improved Performance and Robustness
[Figure panels: 2-class classification performance; 3-class classification; runtime; insensitivity to sequence lengths; robustness to negative samples]
Results: Identification of Non-Canonical Splice Sites
• (Important biological finding) non-canonical splicing can arise if:
  • Introns contain GCA or NAA sequences at their boundaries.
  • Exons include contiguous A's around the boundaries.
• We used 162,951 examples excluding canonical splice sites.
[Figure: signals around the exon and intron boundary]
Conclusion
• We proposed a new RBM training method called boosted CD with categorical
gradients that improves conventional CD for class-imbalanced data.
• Significant boosts in splicing prediction in terms of accuracy and runtime.
• Increased robustness to high-dimensional class-imbalanced data.
• The proposed scheme shows the ability to detect subtle non-canonical
splicing signals that often could not be identified by traditional methods.
• Future work: additional validation using various class-imbalance datasets.
Acknowledgements
• Our lab members
• Financial support
• ICML 2015 travel scholarship
June 2, 2015
Backup: Comparison with Recurrent Neural Networks (RNNs)
• The proposed DBN showed xx% higher performance in terms of the F1-score.
• RNN is appropriate for sequence modeling. However, splicing signals are often
too far from the boundaries, which makes it hard to maintain splicing information.
To be placed

More Related Content

Similar to ICML2015 Slides

powerpoint feb
powerpoint febpowerpoint feb
powerpoint feb
imu409
 
Prediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source toolsPrediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source tools
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Similar to ICML2015 Slides (20)

Borderline Smote
Borderline SmoteBorderline Smote
Borderline Smote
 
powerpoint feb
powerpoint febpowerpoint feb
powerpoint feb
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
 
Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learning
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
 
Prediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source toolsPrediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source tools
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent space
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...
 
adjoint10_nilsvanvelzen
adjoint10_nilsvanvelzenadjoint10_nilsvanvelzen
adjoint10_nilsvanvelzen
 
Dmytro Panchenko "Cracking Kaggle: Human Protein Atlas"
Dmytro Panchenko "Cracking Kaggle: Human Protein Atlas"Dmytro Panchenko "Cracking Kaggle: Human Protein Atlas"
Dmytro Panchenko "Cracking Kaggle: Human Protein Atlas"
 
Nimrita deep learning
Nimrita deep learningNimrita deep learning
Nimrita deep learning
 
End-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersEnd-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
SPICE-MATEX @ DAC15
SPICE-MATEX @ DAC15SPICE-MATEX @ DAC15
SPICE-MATEX @ DAC15
 
Density based spatial clustering of applications with noises for dna methylat...
Density based spatial clustering of applications with noises for dna methylat...Density based spatial clustering of applications with noises for dna methylat...
Density based spatial clustering of applications with noises for dna methylat...
 
Flavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approachFlavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approach
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 

Recently uploaded

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 

Recently uploaded (20)

Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 

ICML2015 Slides

  • 1. Advanced Computing Laboratory Electrical and Computer Engineering Seoul National University Taehoon Lee SungrohYoon BoostedCategorical Restricted Boltzmann Machine forComputational Prediction of Splice Junctions
  • 2. • Motivation • Preliminary • Boosted contrastive divergence • Categorical restricted Boltzmann machine • Experiment results • Conclusion Outline 2/25
  • 8. Motivation
      • Deep neural networks (DNNs) show human-level performance on many recognition tasks.
      • We focus on class-imbalanced prediction.
      • Insufficient samples to represent the true distribution of a class.
      • Q. How can we learn minor but important features using neural networks?
      • We propose a new RBM training method called boosted CD.
      • We also devise a regularization term for the sparsity of DNA sequences.
      • Figure: negative and positive examples, with easy-to-misclassify query images.
  • 15. (Splice) Junction Prediction: Extremely Class-Imbalanced Problem
      • Genetic information flows through the gene expression process (DNA → RNA → protein).
      • DNA: a sequence of four types of nucleotides (A, G, T, C).
      • Gene: a segment of DNA (the basic unit of heredity).
      • Figure: a gene divided into exons and introns. In the example sequence, some GT dinucleotides are true boundaries and others are false boundaries:
          ACGTCGACTGCTACGTAGCAGCGA
          TACGTACCGATCATCACTATCATC
          GAGGTACGATCGATCGATCGATCA
          GTCGATCGTCGTTCAGTCAGTCGA
          TATCAGTCATATGCACATCTCAGT
      • Figure labels: GT (or AG), 16K, 76M, true sites, 160K (= 0.21% of 76M).
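To make the imbalance on this slide concrete, the following minimal Python sketch scans a sequence for candidate GT dinucleotides and recomputes the 0.21% figure. The function name `candidate_sites` is hypothetical and not from the slides, and the reading of 160K as true sites among 76M candidates is taken from the slide's own "160K (= 0.21% over 76M)" label.

```python
def candidate_sites(seq, motif="GT"):
    """Return the 0-based start positions of every occurrence of `motif` in `seq`.

    Hypothetical helper for illustration; every hit is only a *candidate*
    boundary, since most GT (or AG) dinucleotides are not true splice sites.
    """
    seq = seq.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


# Third row of the example sequence shown on the slide.
row = "GAGGTACGATCGATCGATCGATCA"
gt_positions = candidate_sites(row, "GT")

# Fraction of true sites among all candidates, using the counts on the slide.
true_fraction_percent = 160_000 / 76_000_000 * 100
print(gt_positions, round(true_fraction_percent, 2))
```

A classifier that labels every candidate as negative would already be right on roughly 99.8% of these sites, which is why accuracy alone is a poor measure for this task.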
  • 18. Previous Work on Junction Prediction
      • Two approaches:
      • (1) Machine learning-based:
          • ANN (Stormo et al., 1982; Noordewier et al., 1990; Brunak et al., 1991),
          • SVM (Degroeve et al., 2005; Huang et al., 2006; Sonnenburg et al., 2007),
          • HMM (Reese et al., 1997; Pertea et al., 2001; Baten et al., 2006).
      • (2) Sequence alignment-based:
          • TopHat (Trapnell et al., 2010), MapSplice (Wang et al., 2010), RUM (Grant et al., 2011).
      • We want to construct a learning model (1) that can boost prediction performance in a complementary way to alignment-based methods (2).
      • We propose a learning model based on (multilayer) RBMs and its training scheme.
  • 19. Related Methodologies
      • Training methods of RBM (slide table columns: description, training cost, noise handling, class-imbalance handling; only the descriptions are legible in this transcript):
          • CD (Hinton, Neural Comp. 2002): standard and widely used.
          • Persistent CD (Tieleman, ICML 2008): use of a single Markov chain.
          • Parallel tempering (Cho et al., IJCNN 2010): simultaneous generation of Markov chains.
      • RBM for categorical values:
          • Softmax input units (Salakhutdinov et al., ICML 2007).
      • Class-imbalance problems:
          • Refer to a review by Galar et al. (IEEE T SMC 2012).
  • 25. Main Contributions
      • New RBM training method called boosted CD.
      • New penalty term to handle the sparsity of DNA sequences.
      • Significant boosts in splicing prediction performance.
      • Robustness to high-dimensional class-imbalanced data.
      • The ability to detect subtle non-canonical splicing signals.
  • 27. Restricted Boltzmann Machines
      • An RBM is a type of logistic belief network whose structure is a bipartite graph.
      • Nodes: an input layer and a hidden layer.
      • Probability of a configuration (formula lost in extraction).
      • Each node is a stochastic binary unit.
      • The hidden-unit activations can be used as features.
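For reference, the standard definitions of a binary RBM are sketched below. The slide's own formulas are not legible in this transcript, so the symbols here (weights $W$, visible biases $\mathbf{b}$, hidden biases $\mathbf{c}$, logistic function $\sigma$) are assumptions chosen to follow common usage, not necessarily the authors' notation.

```latex
\begin{align}
  E(\mathbf{v}, \mathbf{h}) &= -\mathbf{b}^{\top}\mathbf{v} - \mathbf{c}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h}, \\
  P(\mathbf{v}, \mathbf{h}) &= \frac{\exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr)}{Z},
  \qquad Z = \sum_{\mathbf{v}', \mathbf{h}'} \exp\bigl(-E(\mathbf{v}', \mathbf{h}')\bigr), \\
  P(h_j = 1 \mid \mathbf{v}) &= \sigma\Bigl(c_j + \sum_i v_i W_{ij}\Bigr), \\
  P(v_i = 1 \mid \mathbf{h}) &= \sigma\Bigl(b_i + \sum_j W_{ij} h_j\Bigr),
  \qquad \sigma(x) = \frac{1}{1 + e^{-x}}.
\end{align}
```

Because the graph is bipartite, the hidden units are conditionally independent given the visible units and vice versa, which is what makes block Gibbs sampling between the two layers cheap.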
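  • 29. Contrastive Divergence (CD) for Training RBMs
      • Train the weights to minimize the negative log-likelihood of the data.
      • Run the MCMC chain $\mathbf{v}^{(0)}, \mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(k)}$ for $k$ steps, starting from $\mathbf{v}^{(0)} = \mathbf{v}$ and alternating with the hidden samples $\mathbf{h}^{(0)}, \mathbf{h}^{(1)}, \ldots, \mathbf{h}^{(k)}$.
      • The CD-$k$ update after seeing example $\mathbf{v}$: the model expectation in the gradient is approximated by the $k$-step Markov chain.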
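The CD-k procedure described on this slide can be sketched in NumPy as follows. This is a generic textbook version for a binary RBM, not the authors' code; the function names, array shapes, and the choice to use hidden probabilities (rather than samples) in the gradient statistics are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd_k_gradients(v0, W, b, c, k=1, rng=None):
    """Estimate CD-k gradients for a mini-batch of binary visible vectors.

    v0: (n, d) data batch; W: (d, m) weights; b: (d,) visible biases;
    c: (m,) hidden biases. Returns (dW, db, dc, vk), where vk is the
    k-step reconstruction of the batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(v0 @ W + c)              # P(h = 1 | v0): positive phase
    vk, phk = v0, ph0
    for _ in range(k):                     # k steps of block Gibbs sampling
        h = (rng.random(phk.shape) < phk).astype(v0.dtype)
        pv = sigmoid(h @ W.T + b)          # P(v = 1 | h)
        vk = (rng.random(pv.shape) < pv).astype(v0.dtype)
        phk = sigmoid(vk @ W + c)          # P(h = 1 | vk): negative phase
    n = v0.shape[0]
    dW = (v0.T @ ph0 - vk.T @ phk) / n     # data statistics minus chain statistics
    db = (v0 - vk).mean(axis=0)
    dc = (ph0 - phk).mean(axis=0)
    return dW, db, dc, vk
```

A training step would then apply `W += lr * dW`, `b += lr * db`, and `c += lr * dc` for some learning rate `lr`, looping over all mini-batches.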
  • 36. Overview of Proposed Methodology
  • 37. What Boosting Is
      • Boosting is a meta-algorithm that converts weak learners into strong ones.
      • Most boosting algorithms iteratively learn weak classifiers with respect to a distribution and add them to a final strong classifier.
      • The main variation among boosting algorithms is the method of weighting training data points and hypotheses.
      • Examples: AdaBoost, LPBoost, TotalBoost, …
      • Source: lecture notes, UC Irvine CS 271, Fall 2007.
  • 43. Boosted Contrastive Divergence (1/2)
      • Contrastive divergence training loops over all mini-batches and is known to be stable.
      • However, for a class-imbalanced distribution, we need to assign higher weights to rare samples so that the Gibbs chains can jump to unseen examples.
      • Figure: rare samples in hardly observed regions are assigned higher weights; ordinary samples are assigned lower weights.
  • 47. Boosted Contrastive Divergence (2/2)
      • If we assign the same weight to all the data, the performance of Gibbs sampling would degrade in the regions that are hardly observed.
      • Whenever sampling, we therefore re-weight each observation by the energy of its reconstruction, $E(\mathbf{v}_n^{(k)}, \mathbf{h}_n^{(k)})$.
      • Figures: relative locations of samples and the corresponding Markov chains under CD, under PT, and under the proposed method, with the hardly observed regions marked.
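One way to picture the energy-based re-weighting is the NumPy sketch below. The slides state only that each observation is re-weighted by the energy of its reconstruction; the exact weighting function is not given, so the softmax normalization, the `temperature` parameter, and all function names here are assumptions for illustration rather than the authors' method.

```python
import numpy as np


def rbm_energy(v, h, W, b, c):
    """Per-sample energy E(v, h) = -b.v - c.h - v^T W h for batches v (n, d), h (n, m)."""
    return -(v @ b) - (h @ c) - np.einsum("nd,dm,nm->n", v, W, h)


def energy_weights(energies, temperature=1.0):
    """Hypothetical weighting: a softmax over energies within the mini-batch.

    Reconstructions with higher energy (less probable under the current model,
    e.g. rare samples) receive larger weights. The weights sum to 1.
    """
    z = (energies - energies.max()) / temperature  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()


def weighted_cd_gradient(v0, ph0, vk, phk, weights):
    """Weight gradient in which each sample's CD statistics are scaled by its weight."""
    w = weights[:, None]
    return (v0 * w).T @ ph0 - (vk * w).T @ phk
```

With uniform weights `1 / n` this reduces to the ordinary mini-batch CD gradient, which makes the effect of the re-weighting easy to compare against the baseline.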
  • 51. Categorical Gradient
      • For biological sequences, 1-hot encoding is widely used (Baldi & Brunak, 2001).
      • A, C, G, and T are encoded by 1000, 0100, 0010, and 0001, respectively.
      • In the encoded binary vectors, 75% of the elements are zero.
      • To resolve the sparsity of 1-hot encoded vectors, we devise a new regularization technique that incorporates prior knowledge of the sparsity.
      • Figure: the sparsity term, the gradient derived from the sparsity term, and reconstructions with and without the sparsity term.
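The encoding on this slide can be reproduced with a short NumPy sketch. The function name `one_hot` is hypothetical; the A, C, G, T ordering follows the codes listed on the slide.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # A=1000, C=0100, G=0010, T=0001, as on the slide


def one_hot(seq):
    """Encode a DNA string as a flat binary vector of length 4 * len(seq)."""
    idx = [NUCLEOTIDES.index(ch) for ch in seq.upper()]
    out = np.zeros((len(seq), 4), dtype=np.uint8)
    out[np.arange(len(seq)), idx] = 1
    return out.reshape(-1)


x = one_hot("ACGT")
zero_fraction = 1.0 - x.mean()  # exactly one of every four entries is 1
```

Every position contributes exactly one 1 and three 0s, so the zero fraction is 75% regardless of the sequence, and a 200 nt window becomes an 800-dimensional input.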
  • 58. Results
      • Data preparation:
          • Real human DNA sequences with known boundary information.
          • GWH dataset: 2-class (boundary or not).
          • UCSC dataset: 3-class (acceptor, donor, or non-boundary).
      • Experiments: effects of categorical gradient, effects of boosting, effects on the splicing prediction.
      • Figure: the example sequence CGTAGCAGCGATACGTACCGATCGTCACTATCATCGAGGTACGAGAGATCGATCGGCAACG, annotated with true acceptor 1, true donor 1, true acceptor 2, a non-canonical true donor, false acceptor 1, and false donor 1.
  • 59. Results: Effects of Categorical Gradient
      • The proposed method shows the best performance in terms of reconstruction error for both training and testing.
      • Compared to the softmax approach, the proposed regularized RBM achieves lower error by slightly sacrificing the probability sum constraint.
      • Setup: data: chromosome 19 in GWH-donor; sequence length: 200 nt (800 dimensions); number of iterations: 500; learning rate: 0.1; L2 decay: 0.001.
      • Figure annotations: over-fitted, best.
  • 61. Results: Effects of Boosting
      • To simulate a class-imbalanced situation, we randomly dropped samples with different drop rates for different classes.
      • Training methods compared (slide table columns: description, training cost, noise handling, class-imbalance handling; only the descriptions are legible in this transcript):
          • CD (Hinton, Neural Comp. 2002): standard and widely used.
          • Persistent CD (Tieleman, ICML 2008): use of a single Markov chain.
          • Parallel tempering (Cho et al., IJCNN 2010): simultaneous generation of Markov chains.
          • Proposed boosted CD: reweighting samples.
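The class-dependent dropping used to simulate imbalance can be sketched as follows. The function name `drop_by_class` and the example drop rates are hypothetical; the slides do not state the actual rates used in the experiments.

```python
import numpy as np


def drop_by_class(X, y, drop_rates, rng=None):
    """Randomly drop samples, using a separate drop probability for each class.

    X: (n, d) features; y: (n,) class labels; drop_rates: dict mapping a class
    label to its drop probability. Classes missing from the dict are kept whole.
    """
    rng = np.random.default_rng() if rng is None else rng
    rates = np.zeros(len(y))
    for label, rate in drop_rates.items():
        rates[y == label] = rate
    keep = rng.random(len(y)) >= rates     # keep each sample with probability 1 - rate
    return X[keep], y[keep]


# Example: keep every negative (class 0) but drop about 90% of the positives (class 1).
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
X_imb, y_imb = drop_by_class(X, y, {1: 0.9}, rng=np.random.default_rng(0))
```

Varying the per-class rates yields a family of training sets with controlled imbalance on which the different training methods can be compared.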
  • 64. Results: Improved Performance and Robustness
      • Figure panels: 2-class classification performance, 3-class classification, runtime.
      • Insensitivity to sequence lengths.
      • Robustness to negative samples.
  • 65. Results: Identification of Non-Canonical Splice Sites
      • (Important biological finding) Non-canonical splicing can arise if:
          • Introns contain GCA or NAA sequences at their boundaries.
          • Exons include contiguous A’s around the boundaries.
      • We used 162,951 examples excluding canonical splice sites.
      • Figure labels: exon, intron.
  • 66. Conclusion
      • We proposed a new RBM training method called boosted CD with categorical gradients, which improves on conventional CD for class-imbalanced data.
      • Significant boosts in splicing prediction in terms of accuracy and runtime.
      • Increased robustness to high-dimensional class-imbalanced data.
      • The proposed scheme can detect subtle non-canonical splicing signals that traditional methods often could not identify.
      • Future work: additional validation using various class-imbalanced datasets.
  • 67. Acknowledgements (June 2, 2015)
      • Our lab members
      • Financial support
      • ICML 2015 travel scholarship
  • 69. Backup: Comparison with Recurrent Neural Networks (RNNs)
      • The proposed DBN showed xx% higher performance in terms of the F1-score.
      • RNNs are appropriate for sequence modeling. However, splicing signals are often too far from the boundaries, which makes it hard to maintain splicing information.
      • To be placed.