Seoul National University
Advanced Computing Laboratory
Taehoon Lee
Robust Feature Learning
with Deep Neural Networks
• Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
2/81
Research Areas
Deep neural networks are able to learn hierarchical representations.
(Figure: research areas spanning theory, image, time series, and bioinformatics, centered on machine learning and deep learning.)
• Main theories: machine learning, deep learning, statistical learning
• Main applications: computer vision, bioinformatics
• Main skills: parallel computing
3/81
• Byunghan Lee, Taehoon Lee, and Sungroh Yoon, "DNA-Level Splice Junction Prediction using Deep Recurrent Neural Networks," in Proceedings of the NIPS Workshop on Machine Learning in Computational Biology, Montreal, Canada, December 2015.
• Seungmyung Lee, Hanjoo Kim, Siqi Tan, Taehoon Lee, Sungroh Yoon, and Rhiju Das, "Automated band annotation for RNA structure probing experiments with numerous capillary electrophoresis profiles," Bioinformatics, vol. 31, no. 17, pp. 2808-2815, September 2015.
• Taehoon Lee and Sungroh Yoon, "Boosted Categorical Restricted Boltzmann Machine for Computational Prediction of Splice Junctions," in Proceedings of the International Conference on Machine Learning (ICML), Lille, France, July 2015.
• Donghyeon Yu, Joong-Ho Won, Taehoon Lee, Johan Lim, and Sungroh Yoon, "High-dimensional Fused Lasso Regression using Majorization-Minimization and Parallel Processing," Journal of Computational and Graphical Statistics, vol. 24, no. 1, pp. 121-153, March 2015.
• Taehoon Lee, Sungmin Lee, Woo Young Sim, Yu Mi Jung, Sunmi Han, Chanil Chung, Jay Junkeun Chang, Hyeyoung Min, and Sungroh Yoon, "Robust Classification of DNA Damage Patterns in Single Cell Gel Electrophoresis," in Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, July 2013.
• Taehoon Lee, Hyeyoung Min, Seung Jean Kim, and Sungroh Yoon, "Application of maximin correlation analysis to classifying protein environments for function prediction," Biochemical and Biophysical Research Communications, vol. 400, no. 2, pp. 219-224, September 2010.
• Hyeyoung Min, Seunghak Yu, Taehoon Lee, and Sungroh Yoon, "Support vector machine based classification of 3-dimensional protein physicochemical environments for automated function annotation," Archives of Pharmacal Research, vol. 33, no. 9, pp. 1451-1459, September 2010.
• Taehoon Lee, Seung Jean Kim, Eui-Young Chung, and Sungroh Yoon, "K-maximin Clustering: A Maximin Correlation Approach to Partition-Based Clustering," IEICE Electronics Express, vol. 6, no. 17, pp. 1205-1211, September 2009.
• Taehoon Lee, Taesup Moon, Seung Jean Kim, and Sungroh Yoon, "Regularization and Kernelization of the Maximin Correlation Approach" (under review)
• Taehoon Lee, Minsuk Choi, and Sungroh Yoon, "Manifold Regularized Deep Networks using Adversarial Examples" (under review)
• Taehoon Lee, Joong-Ho Won, Johan Lim, and Sungroh Yoon, "Large-scale Fused Lasso on multi-GPU using FFT-Based Split Bregman Method" (under review)
• Taehoon Lee et al., "HiComet: High-Throughput Comet Analysis Tool for Large-Scale DNA Damage Assessment Studies" (in preparation)
Publications
• Published: 5 SCI-indexed journal papers and 3 conference papers (4 as first author in total)
• Under review: 3 SCI-indexed journal papers and 1 conference paper (all as first author)
• Domestic journals and conferences: 12 papers (6 as first author)
4/81
Research Experience
5/81
• Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
6/81
• A deep neural network (DNN) learns an effective hierarchical representation.
• A DNN automatically learns representations and features from data.
What Do Deep Neural Networks Learn?
(Figure: levels of abstraction learned per domain. Image: edge → motif → part → object. Language: word → clause → sentence → story. Speech: sound → phone → phoneme → word. Pipeline comparison from input to output: rule-based systems use a hand-crafted program; traditional machine learning uses hand-crafted features with a trainable classifier; deep learning uses trainable features with a trainable classifier, e.g., recognizing a tiger, reaching a higher level of abstraction.)
7/81
3 × 2 + 3 × 5 + 3 × 7 → 3 × (2 + 5 + 7)
• As the number of layers grows, the effect of factorization increases.
• Factorization is the decomposition of an object into a product of factors.
Why Do Deep Neural Networks Work So Well?
(Figure: a shallow network with weights W(1), W(2) versus a deep network with weights W(1)-W(4) mapping x to y; the deeper network provides more paths with the same number of weight values.)
Abundant data, complex models, various priors, and high-end hardware together enable deep learning to prosper.
8/81
History of Artificial Neural Networks
• Rosenblatt, 1958: Perceptron [R58]
• Minsky and Papert, 1969: "Perceptrons" (limits of perceptrons) [M69]
• Fukushima, 1975: Cognitron (autoencoder) [F75]
• Fukushima, 1980: Neocognitron (convolutional NN) [F80]
• Hinton, 1983: Boltzmann machine [H83]
• (mid 1980s): Back-propagation
• Hinton, 1986: RBM, Restricted Boltzmann machine [H86]
• LeCun, 1998: Revisit of CNN [L98]
• Hinton, 2006: Deep Belief Networks [H06]
• Lee, 2009: Convolutional RBM [L09]
• Le, 2012: Training of 1 billion parameters [L12]
(The figure groups these into early models, basic models, and the breakthrough period.)
http://www.technologyreview.com/featuredstory/513696/deep-learning/
9/81
Deep Learning Techniques
Regularization helps the network avoid over-fitting.
(Figure: a spectrum of regularizers from traditional to trendy: weight decay, early stopping, sparse connectivity, exploiting sparsity, parameter sharing (CNN, RNN), dropout.)
• Deconv nets (Zeiler et al., CVPR 2010)
• Normalized initialization (Glorot et al., AISTATS 2010)
• DropConnect (Wan et al., ICML 2013)
• Batch normalization (Ioffe et al., ICML 2015)
• Inception (Szegedy et al., CVPR 2015)
• Adversarial training (Goodfellow et al., ICLR 2015)
LeCun et al., Proc. IEEE 1998; Srivastava et al., JMLR 2014; Baidu
10/81
Applications of Deep Learning
Natural Language Understanding / Natural Image Understanding
(from Karpathy et al., NIPS 2014; from Google I/O 2013 Highlights)
(Figure: speech recognition and image recognition are the current main applications; natural language processing is a rising application; an example system outputs a sentence describing an image.)
11/81
• An RBM is a type of logistic belief network whose structure is a bipartite graph.
• Nodes:
• Input (visible) layer: v ∈ {0, 1}^D
• Hidden layer: h ∈ {0, 1}^K
• Probability of a configuration (v, h):
• P(v, h) = exp(−E(v, h)) / Z
• E(v, h) = −b^T v − c^T h − v^T W h
• Each node is a stochastic binary unit:
• P(h_j = 1 | v) = σ(c_j + Σ_i W_ij v_i)
• P(h | v) can be used as a feature (a sketch follows below).
Restricted Boltzmann Machines
12/81
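As an illustration of the formulas above, here is a minimal NumPy sketch of the RBM energy and the hidden-unit activation probability. The toy layer sizes, the random initialization, and the variable names W, b, c are assumptions for illustration, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 6, 4                                 # visible and hidden layer sizes (toy values)
W = 0.01 * rng.standard_normal((D, K))      # visible-hidden weights
b = np.zeros(D)                             # visible biases
c = np.zeros(K)                             # hidden biases

def energy(v, h):
    """E(v, h) = -b^T v - c^T h - v^T W h"""
    return -b @ v - c @ h - v @ W @ h

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_given_visible(v):
    """P(h_j = 1 | v) = sigmoid(c_j + sum_i W_ij v_i); usable as a feature vector."""
    return sigmoid(c + v @ W)

v = rng.integers(0, 2, size=D).astype(float)   # a random binary visible vector
h = rng.integers(0, 2, size=K).astype(float)
print(energy(v, h), hidden_given_visible(v))
```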
• CNN is a type of feed-forward artificial neural network where the individual
neurons respond to overlapping regions in the visual field.
• Key components are convolutional and subsampling layers.
Convolutional Neural Networks
LeCun et al., Proc. IEEE 1998.
C-layer: convolution between a kernel and an image to extract features.
S-layer: aggregation of the statistics of local features at various locations (a sketch of both layers follows below).
13/81
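A minimal NumPy sketch of the two key operations named on the slide, a convolutional (C) layer and a subsampling (S) layer. The kernel values, image size, and the choice of average pooling are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """C-layer: valid 2-D convolution (implemented as cross-correlation) of an image with a kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample2x2(fmap):
    """S-layer: aggregate local statistics, here 2x2 average pooling."""
    H, W = fmap.shape
    fmap = fmap[: H - H % 2, : W - W % 2]
    return fmap.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

image = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])  # a vertical-edge detector
print(subsample2x2(conv2d_valid(image, kernel)).shape)            # (3, 3)
```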
• Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
14/81
Dissertation Overview
• Deep neural networks learn a large number of parameters, so there have been many attempts to obtain reasonable solutions over a wide search space. In this dissertation, the following three issues in deep learning are discussed.
• First, deep neural networks expose intrinsic blind spots known as adversarial perturbations.
• Second, standard training of restricted Boltzmann machines shows limited performance when sampling minority-class examples in class-imbalanced datasets.
• Lastly, although convolutional neural networks are known as an effective technique for handling spatial dependency, some applications require more elaborate handling of spatial dependency.
20/81
• Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
21/81
• Desired behaviors and practical issues of deep learning and manifold learning:
• Deep learning discriminates different classes; however, it may result in
wiggly boundaries vulnerable to adversarial perturbations.
• Manifold learning preserves geodesic distances; however, it may result in
poor embedding.
Motivation
22/81
Szegedy et al, Intriguing Properties of Neural Networks, ICLR 2014.
Goodfellow et al., Explaining and Harnessing Adversarial Examples, ICLR 2015.
• We can generate an adversarial input x_adv = x + Δx.
• We expect the classifier to assign the same class to x and x_adv so long as ‖Δx‖_∞ < ε.
• However, a very small perturbation can cause correctly classified images to be misclassified.
Adversarial Example
(Figure: original example + small perturbation = adversarial example; "fooling networks", Goodfellow, ICLR 2015.)
23/81
• Consider the dot product between a weight vector w and an adversarial example x_adv:
w^T x_adv = w^T x + w^T Δx
• The adversarial perturbation causes the activation to grow by w^T Δx.
• We can maximize this increase subject to a max-norm constraint on Δx by assigning Δx = ε sign(w).
How Can We Fool Neural Networks?
x_adv = x − εw if x is a positive example
x_adv = x + εw if x is a negative example
(Figure: a logistic-regression example with w = [8.28, 10.03].)
24/81
Nguyen et al., Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, CVPR 2015.
• For a deep network, we can maximize the same increase subject to a max-norm constraint on Δx by assigning Δx = ε sign(∇_x J(θ, x, y)), the fast gradient sign method (a sketch follows below).
• Neural networks can also be fooled using an evolutionary algorithm (Nguyen et al., CVPR 2015).
Deep Neural Networks Can Also Be Fooled
25/81
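A minimal sketch of the fast gradient sign perturbation Δx = ε sign(∇_x J(θ, x, y)) on a toy logistic-regression classifier, where the gradient of the loss with respect to the input is available in closed form. The weights, ε, and data here are illustrative assumptions, not the models evaluated in the dissertation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the logistic loss J = -log p(y | x) with respect to the input x, for y in {0, 1}."""
    p = sigmoid(w @ x + b)
    return (p - y) * w          # dJ/dx

rng = np.random.default_rng(1)
w = rng.standard_normal(10)     # toy "trained" weights (assumed)
b = 0.0
x = rng.standard_normal(10)     # a clean example
y = 1                           # its label
eps = 0.1                       # max-norm budget

x_adv = x + eps * np.sign(loss_grad_wrt_input(w, b, x, y))   # fast gradient sign perturbation
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))            # confidence before vs. after
```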
• Adversarial examples can be explained as a property of high-dimensional
dot products.
• The direction of perturbation, rather than the specific point in space, matters
most. Space is not full of pockets of adversarial examples that finely tile the
reals like the rational numbers.
• Because it is the direction that matters most, adversarial perturbations
generalize across different clean examples.
• Linear models lack the capacity to resist adversarial perturbation; only
structures with a hidden layer (where the universal approximator theorem
applies) should be trained to resist adversarial perturbation.
Important Observations (Goodfellow et al., ICLR 2015)
26/81
• How can we handle adversarial examples?
• Simply train on all the noisy examples (Loosli et al., Large Scale Kernel Machines 2007: INFINITE MNIST dataset).
• Exponential cost.
• Include an adversarial term in the objective function (Goodfellow et al., ICLR 2015); a sketch of this mixed objective follows below.
• J̃(θ, x, y) = α J(θ, x, y) + (1 − α) J(θ, x_adv, y)
• Error rate reduced from 1.14% to 0.77% on the 10,000 test examples.
• It is commonly expected that elastic distortion can provide resistance to adversarial examples.
RelatedWork
27/81
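A sketch of the mixed objective J̃(θ, x, y) = α J(θ, x, y) + (1 − α) J(θ, x_adv, y) from Goodfellow et al. (ICLR 2015), reusing the toy logistic model and the fast gradient sign perturbation from the earlier sketch. The values α = 0.5 and ε = 0.1 are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, b, x, y):
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, b, x, y, eps):
    p = sigmoid(w @ x + b)
    return x + eps * np.sign((p - y) * w)

def adversarial_objective(w, b, x, y, alpha=0.5, eps=0.1):
    """J~ = alpha * J(x, y) + (1 - alpha) * J(x_adv, y)."""
    x_adv = fgsm(w, b, x, y, eps)
    return alpha * logistic_loss(w, b, x, y) + (1 - alpha) * logistic_loss(w, b, x_adv, y)

rng = np.random.default_rng(2)
w, b = rng.standard_normal(10), 0.0
x, y = rng.standard_normal(10), 1
print(adversarial_objective(w, b, x, y))
```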
What Is a Manifold?
A closed manifold may need to be represented in a dimension higher than its intrinsic one.
http://www.lib.utexas.edu/maps/world_maps/world_rel_803005AI_2003.jpg
In the real world, many observations lie on a manifold; that is why we learn manifolds. The pictures show a 2-D manifold and a 3-D manifold.
28/81
Manifold Regularization Term
• The manifold term minimizes the difference between the activations at designated nodes for samples of the same class.
• This helps disentangle the factors of variation.
(Figure notation: a^(1) is the input representation, a^(5) is the manifold representation, and a^(6) is the softmax layer. For a training sample x_n, an adversarial neighbor x'_n = x_n + β ∇_{x_n} L(θ; x_n, y_n) is generated, and the distance between a^(5)_n and a'^(5)_n is penalized.)
29-32/81
• The proposed methodology learns both a classifier and a manifold embedding that are robust to adversarial perturbations.
• Forward and backward operations of MRnet (a sketch of the loss follows below):
• The first forward pass is the same as in a standard neural network.
• The following backward_adv pass is the same as standard back-propagation, except that it additionally produces an adversarial perturbation of the input.
Proposed Regularized Networks
33/81
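To make the combined objective concrete, here is a minimal sketch of a classification loss plus a manifold term of the form λ‖a(x) − a(x')‖² with x' a gradient-based neighbor of x. The tiny two-layer network, the finite-difference gradient, and the values of β and λ are illustrative assumptions rather than the exact MRnet architecture and training procedure.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(params, x):
    """Tiny 2-layer net: returns the penultimate activation (the 'manifold representation') and class probs."""
    W1, b1, W2, b2 = params
    a = np.tanh(W1 @ x + b1)
    return a, softmax(W2 @ a + b2)

def mrnet_style_loss(params, x, y, beta=0.05, lam=0.1):
    """Cross-entropy + lam * || a(x) - a(x') ||^2, with x' = x + beta * grad_x(cross-entropy)."""
    a, p = forward(params, x)
    ce = -np.log(p[y])
    g = np.zeros_like(x)                      # numerical gradient of the loss w.r.t. x (for illustration)
    eps = 1e-5
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        g[i] = (-np.log(forward(params, xp)[1][y]) - ce) / eps
    a_adv, _ = forward(params, x + beta * g)  # adversarial neighbor x'
    return ce + lam * np.sum((a - a_adv) ** 2)

rng = np.random.default_rng(3)
params = (rng.standard_normal((5, 8)) * 0.1, np.zeros(5),
          rng.standard_normal((3, 5)) * 0.1, np.zeros(3))
print(mrnet_style_loss(params, rng.standard_normal(8), y=0))
```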
• Three datasets we tested:
• (a) MNIST (LeCun et al., 1998)
• (b, c) The raw data and its normalized version (LCN) of CIFAR-10 (Krizhevsky et al., 2009)
• (d, e) The raw data and its normalized version (ZCA) of SVHN (Netzer et al., 2011)
Experimental Results
34/81
• We chose β in a range that did not violate class information.
• (a-c) Distributions of Euclidean distances between training samples on the individual datasets.
• (d-f) Different perturbation levels on the individual datasets.
Generation of Adversarial Examples
35/81
MNIST Results
Bar: statistics of 10 runs.
Circle: single run reported in the literature.
• Fully connected models have two hidden layers.
• Convolutional models have more than two
convolutional layers.
• All the results are without data augmentation.
• The proposed model shows the best
performance among the alternatives.
36/81
CIFAR-10 and SVHN Results
37/81
• Data: CIFAR-10 test set.
• (a) Pairwise distance matrix of a(L) without Φ.
• (b) 2-D visualization of the manifold embedding through t-SNE without Φ.
• (c) Query images and top 10 nearest images without Φ.
• (d-f) Pairwise distance matrix, t-SNE plot, and query images with Φ.
Embedding Results
38/81
• We have proposed a novel methodology, unifying deep learning and manifold
learning, called manifold regularized networks (MRnet).
• We tested MRnet and confirmed its improved generalization performance
underpinned by the proposed manifold loss term on deep architectures.
• By exploiting the characteristics of blind spots, the proposed MRnet can be
extended to the discovery of true representations on manifolds in various
learning tasks.
Summary of Topic 1
39/81
• Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
40/81
• Deep Neural Networks (DNN) show human level performance on many
recognition tasks.
• We focus on class-imbalanced prediction.
• Insufficient samples to represent the true distribution of a class.
• Q. How can we learn minor but important features using neural networks?
• We propose a new RBM training method called boosted CD.
• We also devise a regularization term for sparsity of DNA sequences.
Motivation
(Figure: query images with negative and positive examples that are easy to misclassify.)
41/81
• Genetic information flows through the gene expression process.
• DNA: a sequence of four types of nucleotides (A,G,T,C).
• Gene: a segment of DNA (the basic unit of heredity).
(Splice) Junction Prediction: An Extremely Class-Imbalanced Problem
(Figure: genetic information flows from DNA to RNA to protein through gene expression; a gene consists of exons and introns, and candidate splice boundaries are marked by GT (or AG) dinucleotides. Of roughly 76M candidate sites, only about 160K (0.21%) are true boundaries; the rest are false boundaries.)
42/81
• Two approaches:
• Machine learning-based:
• ANN (Stormo et al., 1982; Noordewier et al., 1990; Brunak et al., 1991),
• SVM (Degroeve et al., 2005; Huang et al., 2006; Sonnenburg et al., 2007),
• HMM (Reese et al., 1997; Pertea et al., 2001; Baten et al., 2006).
• Sequence alignment-based:
• TopHat (Trapnell et al., 2010), MapSplice (Wang et al., 2010),
RUM (Grant et al., 2011).
Previous Work on Junction Prediction
We want to construct a learning model that can boost prediction performance in a way complementary to alignment-based methods.
We propose a learning model based on (multilayer) RBMs
and its training scheme.
43/81
• Train the weights to minimize the negative log-likelihood of the data; the model term of the gradient is approximated by a k-step Markov chain.
• Run the MCMC chain v^(0), v^(1), …, v^(k) for k steps, starting from v^(0) = v.
• The CD-k update after seeing example v (a sketch follows below):
ΔW ∝ ⟨v h^T⟩_data − ⟨v^(k) h^(k)T⟩
Contrastive Divergence (CD) for Training RBMs
(Figure: the alternating Gibbs chain v^(0) → h^(0) → v^(1) → h^(1) → … → v^(k) → h^(k).)
44/81
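A minimal NumPy sketch of one CD-k update: run a k-step block Gibbs chain from a data vector v and update the parameters with the difference between the data statistics and the reconstruction statistics. The layer sizes, learning rate, and k are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd_k_update(W, b, c, v_data, k=1, lr=0.1):
    """One CD-k step: dW ~ <v h>_data - <v h>_k, with the model term from a k-step Gibbs chain."""
    v = v_data.copy()
    ph_data = sigmoid(c + v_data @ W)                 # hidden probabilities at the data
    for _ in range(k):                                # k steps of block Gibbs sampling
        h = (rng.random(ph_data.shape) < sigmoid(c + v @ W)).astype(float)
        v = (rng.random(v.shape) < sigmoid(b + h @ W.T)).astype(float)
    ph_model = sigmoid(c + v @ W)                     # hidden probabilities at the reconstruction
    W += lr * (np.outer(v_data, ph_data) - np.outer(v, ph_model))
    b += lr * (v_data - v)
    c += lr * (ph_data - ph_model)
    return W, b, c

D, K = 6, 4
W, b, c = 0.01 * rng.standard_normal((D, K)), np.zeros(D), np.zeros(K)
v = rng.integers(0, 2, D).astype(float)
W, b, c = cd_k_update(W, b, c, v, k=1)
```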
• Boosting is a meta-algorithm which converts weak learners to strong ones.
• Most boosting algorithms consist of iteratively learning weak classifiers with
respect to a distribution and adding them to a final strong classifier.
• The main variation between many boosting algorithms:
• The method of weighting training data points and hypotheses.
• AdaBoost, LPBoost,TotalBoost, …
What Boosting Is
from lecture notes @ UC Irvine CS 271, Fall 2007
45/81
• Contrastive divergence training loops over all mini-batches and is known to be stable.
• However, for a class-imbalanced distribution, we need to assign higher weights to rare samples so that the Gibbs chains can jump to unseen examples.
Boosted Contrastive Divergence (1/2)
(Figure: ordinary samples receive lower weights; rare samples in hardly observed regions receive higher weights.)
46/81
• If we assign the same weight to all the data, the performance of Gibbs sampling degrades in regions that are hardly observed.
• Whenever we sample, we therefore re-weight each observation by the energy of its reconstruction, E(v_n^(k), h_n^(k)); a sketch follows below.
Boosted Contrastive Divergence (2/2)
(Figure: relative locations of samples and the corresponding Markov chains under CD, under parallel tempering (PT), and under the proposed method, including hardly observed regions.)
47/81
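A sketch of the reweighting idea: each observation in a mini-batch is weighted by the energy of its reconstruction E(v^(k), h^(k)), so that poorly modeled (rare) samples receive larger weight in the gradient. The softmax normalization of the weights and the toy sizes are assumed choices for illustration; the thesis may normalize differently.

```python
import numpy as np

def rbm_energy(W, b, c, V, H):
    """Row-wise energy E(v, h) = -b^T v - c^T h - v^T W h for a batch of (v, h) pairs."""
    return -(V @ b) - (H @ c) - np.einsum('nd,dk,nk->n', V, W, H)

def boosted_weights(W, b, c, V_recon, H_recon):
    """Higher reconstruction energy -> higher weight (softmax over the mini-batch, an assumed choice)."""
    e = rbm_energy(W, b, c, V_recon, H_recon)
    w = np.exp(e - e.max())
    return w / w.sum()

rng = np.random.default_rng(5)
D, K, N = 6, 4, 8
W = 0.01 * rng.standard_normal((D, K))
b, c = np.zeros(D), np.zeros(K)
V_k = rng.integers(0, 2, (N, D)).astype(float)   # reconstructions v^(k) of a mini-batch
H_k = rng.integers(0, 2, (N, K)).astype(float)   # corresponding h^(k)
alpha = boosted_weights(W, b, c, V_k, H_k)       # per-sample weights for the CD gradient
print(alpha)
```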
Relationship between Boosting and Importance Sampling
Importance Sampling Boosted CD
target distribution f
proposal distribution g
(a)
(b)
(c)
(a) Samples cannot be drawn conveniently from f.
(b) The importance sampler draws samples from g.
(c) A sample of f is obtained by multiplying by f/g.
Correspondingly, in boosted CD:
1. Samples are drawn from g.
2. A sample of f is obtained by multiplying by α.
48/81
• Balance equations:
• a set of equations that can always be solved to give the equilibrium
distribution of a Markov chain (when such a distribution exists).
• For a restricted Boltzmann machine (Im et al., ICLR 2015):
• For a restricted Boltzmann machine with boosted CD:
• On the convergence properties of contrastive divergence (Sutskever et al., AISTATS 2010):
• “TheCD update is not the gradient of any objective function.”; “The CD update
is shown to have at least one fixed point when used with L2 regularization.”
Balance Equations for Restricted Boltzmann Machine
global balance
(or full balance)
local balance
(or detailed balance)
Boosted contrastive divergence inherits the properties of contrastive divergence.
49/81
• For biological sequences, 1-hot encoding is widely used (Baldi & Brunak, 2001).
• A, C, G, and T are encoded by 1000, 0100, 0010, and 0001, respectively (a sketch follows below).
• In the encoded binary vectors, 75% of the elements are zero.
• To resolve the sparsity of 1-hot encoding vectors, we devise a new regularization technique that incorporates prior knowledge of the sparsity.
Categorical Gradient
(Figure: the sparsity term, its derived gradient, and reconstructions with and without the sparsity term.)
50/81
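A small sketch of the 1-hot encoding described above (A, C, G, T → 1000, 0100, 0010, 0001); by construction 75% of the resulting entries are zero, which motivates the sparsity regularizer. The flat vector layout is an illustrative assumption.

```python
import numpy as np

NUC = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot_encode(seq):
    """Encode a DNA string as a flat binary vector (4 bits per nucleotide)."""
    x = np.zeros((len(seq), 4))
    for i, ch in enumerate(seq):
        x[i, NUC[ch]] = 1.0
    return x.reshape(-1)

x = one_hot_encode("ACGTCGACTG")
print(x.shape, x.mean())   # (40,) 0.25 -> 75% of the elements are zero
```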
Proposed Training Algorithm
categorical gradient
boosted CD
51/81
Results: Effects of Boosting
• To simulate a class-imbalance situation, we randomly dropped samples with different drop rates for different classes.
Comparison of training methods (table columns: description, training cost, noise handling, class-imbalance handling):
• CD (Hinton, Neural Comp. 2002): standard and widely used.
• Persistent CD (Tieleman, ICML 2008): uses a single persistent Markov chain.
• Parallel tempering (Cho et al., IJCNN 2010): generates simultaneous Markov chains.
• Proposed boosted CD: re-weights samples.
52/81
• Data preparation:
• Real human DNA sequences with known boundary information.
• GWH dataset: 2-class (boundary or not).
• UCSC dataset: 3-class (acceptor, donor, or non-boundary).
Experimental Setup for Junction Prediction
Effects of
categorical gradient
Effects of boosting
Effects on the
splicing prediction
CGTAGCAGCGATACGTACCGATCGTCACTATCATCGAGGTACGAGAGATCGATCGGCAACG
(Sequence annotations: true acceptor 1, true donor 1, true acceptor 2, a non-canonical true donor, false acceptor 1, false donor 1.)
53/81
• The proposed method shows the best performance in terms of reconstruction error for both training and testing.
• Compared to the softmax approach, the proposed regularized RBM achieves lower error by slightly sacrificing the probability-sum constraint.
Results: Effects of Categorical Gradient
Data: chromosome 19 in GWH-donor
Sequence length: 200 nt (800 dimensions)
# of iterations: 500
Learning rate: 0.1
L2-decay: 0.001
(Figure annotations: over-fitted vs. best.)
54/81
Results: Improved Performance and Robustness
2-class classification performance 3-class classification Runtime
Insensitivity to sequence lengths Robustness to negative samples
55/81
exon intron
• (Important biological finding) Non-canonical splicing can arise if:
• Introns contain GCA or NAA sequences at their boundaries.
• Exons include contiguous A's around the boundaries.
Results: Identification of Non-Canonical Splice Sites
We used 162,951
examples excluding
canonical splice sites.
56/81
Summary of Topic 2
Significant boosts in splicing
prediction performance
Robustness to high-dimensional
class-imbalanced data
New RBM training methods
called boosted CD
New penalty term to handle
sparsity of DNA sequences
The ability to detect subtle non-canonical splicing signals
57/81
• Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
58/81
• In this work, we consider the fused Lasso regression (FLR), an important special case of ℓ1-penalized regression for structured sparsity:
minimize_β (1/2)‖y − Xβ‖² + λ1‖β‖1 + λ2‖Dβ‖1
• The matrix D is the difference matrix on the undirected and unweighted graph of adjacent variables.
• Adjacency of the variables is determined by the application.
• For graphs with a 2-D grid, the second penalty becomes the sum of absolute differences between horizontally and vertically adjacent coefficients (a sketch of the objective follows below).
• The second penalty function is non-smooth and non-separable.
Fused Lasso Regression
59/81
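A sketch of evaluating the FLR objective on a 2-D grid of coefficients, with the difference operator D built from the 4-neighbor grid adjacency. The grid size, data, and regularization parameters are toy assumptions.

```python
import numpy as np

def grid_difference_matrix(nr, nc):
    """Difference matrix D for the unweighted graph of 4-neighbor adjacency on an nr x nc grid."""
    idx = np.arange(nr * nc).reshape(nr, nc)
    edges = []
    for i in range(nr):
        for j in range(nc):
            if j + 1 < nc: edges.append((idx[i, j], idx[i, j + 1]))
            if i + 1 < nr: edges.append((idx[i, j], idx[i + 1, j]))
    D = np.zeros((len(edges), nr * nc))
    for r, (a, b) in enumerate(edges):
        D[r, a], D[r, b] = 1.0, -1.0
    return D

def flr_objective(beta, X, y, D, lam1, lam2):
    """0.5 * ||y - X beta||^2 + lam1 * ||beta||_1 + lam2 * ||D beta||_1"""
    return 0.5 * np.sum((y - X @ beta) ** 2) + lam1 * np.abs(beta).sum() + lam2 * np.abs(D @ beta).sum()

rng = np.random.default_rng(6)
nr, nc, n = 4, 4, 20
p = nr * nc
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:4] = 1.0                      # a simple piecewise-constant pattern
y = X @ beta_true + rng.standard_normal(n)
D = grid_difference_matrix(nr, nc)
print(flr_objective(beta_true, X, y, D, lam1=1.0, lam2=1.0))
```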
Overview of Proposed Method
• We want to solve the 2-dimensional fused Lasso regression on multi-GPU.
• The solver is built up in stages:
• fused Lasso
• + split Bregman algorithm (approximating, due to the ℓ1-norm)
• + PCGLS (accelerating the solution of a linear system)
• + FFT (replacing the linear-system solver with the FFT)
60-63/81
• Split Bregman algorithm for the ℓ1-norm: introduce an auxiliary variable and approximate.
• Because of the ℓ1-norm, the objective function is non-differentiable (a sketch of the shrinkage step follows below).
Split Bregman Algorithm for Fused Lasso
64/81
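In split Bregman, the non-smooth ℓ1 terms are handled by introducing auxiliary variables whose updates reduce to element-wise soft-thresholding (shrinkage). A minimal sketch of that shrinkage step; the threshold value is illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise shrinkage: argmin_a t*||a||_1 + 0.5*||a - z||^2 = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])
print(soft_threshold(z, 0.5))   # large entries are shrunk by 0.5, small ones are zeroed out
```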
• The conjugate gradient (CG) method aims to solve the linear system of
equations for the form 𝐴𝑥 = 𝑏 iteratively when 𝐴 is symmetric and positive
definite.
PCGLS Algorithm
• For least-squares problems, it is well known that (9) is equivalent to solving the normal equations x = (A^T A)^{-1} A^T b.
• The CG algorithm for least squares is often referred to as CGLS, and its preconditioned counterpart as PCGLS (in this case the scaling amounts to A^T A -> M^{-T} A^T A M^{-1}); a sketch follows below.
65/81
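A minimal (unpreconditioned) CGLS sketch: conjugate gradients applied to the normal equations A^T A x = A^T b using only products with A and A^T, without forming A^T A explicitly. PCGLS would additionally apply a preconditioner M, omitted here; the matrix, iteration count, and tolerance are toy assumptions.

```python
import numpy as np

def cgls(A, b, iters=50, tol=1e-10):
    """Solve min_x ||Ax - b||_2 via CG on the normal equations."""
    x = np.zeros(A.shape[1])
    r = b - A @ x              # residual in data space
    s = A.T @ r                # residual of the normal equations
    p = s.copy()
    gamma = s @ s
    for _ in range(iters):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if gamma_new < tol:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

rng = np.random.default_rng(7)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x = cgls(A, b)
print(np.allclose(A.T @ A @ x, A.T @ b, atol=1e-6))   # satisfies the normal equations
```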
• In mathematics, Poisson's equation is a partial differential equation of elliptic type with broad utility in electrostatics, mechanical engineering, and theoretical physics.
• Poisson's equation is frequently written as ∇²φ = f.
Poisson's Equation
http://en.wikipedia.org/wiki/Poisson's_equation
http://people.rit.edu/~pnveme/ExplictSolutions2/2Dim/Linear/PoissonDisk/PoissonDisk.html
66/81
• In two-dimensional Cartesian coordinates, it takes the form ∂²φ/∂x² + ∂²φ/∂y² = f(x, y), which discretizes to a block tri-diagonal system.
Poisson's Equation in 2-Dimensions
67/81
• Mathematical background (a sketch follows below):
• Apply the 2-D forward FFT to f to obtain f̂(k), where k is the wave number.
• Apply the inverse of the Laplace operator to f̂(k) to obtain v̂(k): a simple element-wise division in Fourier space.
• Apply the 2-D inverse FFT to v̂(k) to obtain v.
Poisson's Equation using the FFT
∇²v = f ↔ −(k_x² + k_y²) v̂ = f̂, so v̂ = −f̂ / (k_x² + k_y²)
http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/3-CUDA_libraries_+_Matlab.pdf
68/81
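A minimal sketch of the three FFT steps above for a periodic domain. The grid size and the right-hand side are toy assumptions, and the zero wavenumber is skipped because the periodic solution is only defined up to a constant.

```python
import numpy as np

def poisson_fft_periodic(f, Lx=2 * np.pi, Ly=2 * np.pi):
    """Solve laplacian(v) = f on a periodic grid: v_hat = -f_hat / (kx^2 + ky^2)."""
    ny, nx = f.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=Lx / nx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=Ly / ny)
    KX, KY = np.meshgrid(kx, ky)
    denom = KX**2 + KY**2
    f_hat = np.fft.fft2(f)                                        # step 1: forward FFT
    v_hat = np.zeros_like(f_hat)
    v_hat[denom > 0] = -f_hat[denom > 0] / denom[denom > 0]       # step 2: divide in Fourier space
    return np.real(np.fft.ifft2(v_hat))                           # step 3: inverse FFT

# verify on f = -2*sin(x)*sin(y), whose periodic solution is v = sin(x)*sin(y)
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x)
v = poisson_fft_periodic(-2 * np.sin(X) * np.sin(Y))
print(np.allclose(v, np.sin(X) * np.sin(Y), atol=1e-10))
```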
• Pseudo codes for two iterative methods:
Split Bregman Algorithm for Fused Lasso (1/2)
FFT
69/81
• Multi-GPU operations for matrix-vector
computations
Split Bregman Algorithm for Fused Lasso (2/2)
70/81
• The computation times are measured in CPU time with
• CPU: Intel Xeon E5-4620 (2.2 GHz) and 16 GB RAM
• GPU: NVIDIA GTX Titan (2688 cores, 6 GB GDDR5)
• We set the regularization parameters (λ1, λ2) = (1, 1) with a fixed stopping criterion.
• We generate n samples from a p-dimensional N(0, I_p), and the response variable y is generated as y = Xβ + ε with ε ~ N(0, I_n), where β is the true coefficient image defined per scenario below.
Experiments
71/81
• We first considered scenarios with synthetic regression problems where the
coefficients were defined on a square grid:
• For the very large cases, the average speed-up ranges from 409.19x to 433.23x.
Runtime Comparison for Piecewise-Constant Blocks Cases
72/81
• For the other cases (n = 12000-24000), the average speed-up ranges from 26.67x to 47.47x.
• Circular Gaussian cases are formulated by:
Runtime Comparison for Circular Gaussian Cases
73/81
• Image-based regression of the behavioral fMRI data.
• Regression coefficients were overlaid and color-coded on the brain map as
described in the text.
Structured Sparsity Regression Example
74/81
• By applying the proposed method to various large-scale datasets extensively,
we have demonstrated successfully the following:
• Feasibility of highly-parallelizable computational algorithms for high-
dimensional structured sparse regression problems,
• Use case of direct-communicating multiple GPUs for speed-up and
scalability,
• Promise of FFT-based preconditioners for parallel solving of a family of
linear systems.
• That the highest speed-up (433x) occurred on the highest-dimensional problems clearly indicates where the merit of the multi-GPU scheme lies.
• Future work: connecting the dots to deep neural networks
• Fused autoencoder, multi-layer fused Lasso, …
Summary of Topic 3
76/81
• Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
77/81
1. MRnet can be applied in a complementary way to generalize neural networks together with traditional techniques such as L2 decay.
2. We propose a novel method for training RBMs for class-imbalanced prediction. Our proposal includes a deep belief network-based methodology for computational splice junction prediction.
3. The parallel fused Lasso can be applied to data with structured sparsity, such as images, to exploit more prior knowledge than convolutional or recurrent operations.
Conclusion
This dissertation proposed a set of robust feature learning schemes that can learn meaningful representations underlying large-scale genomic and image datasets using deep networks.
78/81
• Several directions of future work on the proposed methodologies are possible.
• First, we can extend MRnet to extract scaling- and translation-invariant features, for example by replacing the synthetic neighbors with the nearest training samples.
• Second, it would also be interesting to alter the objective function of MRnet in order to generalize the whole MRnet procedure.
• Lastly, the proposed three schemes (manifold loss, boosting, and the L1 fusion penalty) can be applied within the framework of recurrent neural networks.
Limitations and FutureWork
We need to make the proposed schemes
more universal and general.
79/81
Acknowledgements
80/81
Acknowledgements
81/81
More Related Content

What's hot

Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
Lukas Masuch
 
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Praxitelis Nikolaos Kouroupetroglou
 
Machine learning
Machine learningMachine learning
Machine learning
Dr Geetha Mohan
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
MachinePulse
 
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
Edge AI and Vision Alliance
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I...
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I...DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I...
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I...
Joonhyung Lee
 
Supervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine LearningSupervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine Learning
Spotle.ai
 
1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
KONGU ENGINEERING COLLEGE
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
Mark Chang
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
Poo Kuan Hoong
 
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow) Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Lalit Jain
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
Marina Santini
 
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
Taegyun Jeon
 
A survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applicationsA survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applications
Joseph Paul Cohen PhD
 
Deep Learning - Overview of my work II
Deep Learning - Overview of my work IIDeep Learning - Overview of my work II
Deep Learning - Overview of my work II
Mohamed Loey
 
Deep Learning Explained
Deep Learning ExplainedDeep Learning Explained
Deep Learning Explained
Melanie Swan
 
Liver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TFLiver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TF
WonjoongCheon
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
ananth
 
Machine learning
Machine learningMachine learning
Machine learning
Rajib Kumar De
 

What's hot (20)

Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I...
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I...DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I...
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I...
 
Supervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine LearningSupervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine Learning
 
1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow) Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
 
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
 
A survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applicationsA survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applications
 
Deep Learning - Overview of my work II
Deep Learning - Overview of my work IIDeep Learning - Overview of my work II
Deep Learning - Overview of my work II
 
Deep Learning Explained
Deep Learning ExplainedDeep Learning Explained
Deep Learning Explained
 
Liver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TFLiver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TF
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Machine learning
Machine learningMachine learning
Machine learning
 

Similar to PhD Defense

Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0
Joe Xing
 
Deep learning 1
Deep learning 1Deep learning 1
Deep learning 1
Karthick Thiyagu
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
ssuser4b1f48
 
Neural Networks and Deep Learning Basics
Neural Networks and Deep Learning BasicsNeural Networks and Deep Learning Basics
Neural Networks and Deep Learning Basics
Jon Lederman
 
Training machine learning deep learning 2017
Training machine learning deep learning 2017Training machine learning deep learning 2017
Training machine learning deep learning 2017
Iwan Sofana
 
Big Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningBig Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep Learning
Poo Kuan Hoong
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Abhishek Bhandwaldar
 
fundamentals-of-neural-networks-laurene-fausett
fundamentals-of-neural-networks-laurene-fausettfundamentals-of-neural-networks-laurene-fausett
fundamentals-of-neural-networks-laurene-fausett
Zarnigar Altaf
 
Basics of Deep learning
Basics of Deep learningBasics of Deep learning
Basics of Deep learning
Ramesh Kumar
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learning
Poo Kuan Hoong
 
Neural Networks-1
Neural Networks-1Neural Networks-1
Neural Networks-1
Sai Kumar Dwivedi
 
AINL 2016: Filchenkov
AINL 2016: FilchenkovAINL 2016: Filchenkov
AINL 2016: Filchenkov
Lidia Pivovarova
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
Poo Kuan Hoong
 
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...
cscpconf
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequences
Claudio Gallicchio
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectives
Namkug Kim
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
SungminYou
 
Artificial Neural Network and its Applications
Artificial Neural Network and its ApplicationsArtificial Neural Network and its Applications
Artificial Neural Network and its Applications
shritosh kumar
 
NEURAL NETWORKS
NEURAL NETWORKSNEURAL NETWORKS
NEURAL NETWORKSESCOM
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Amr Rashed
 

Similar to PhD Defense (20)

Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0
 
Deep learning 1
Deep learning 1Deep learning 1
Deep learning 1
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
 
Neural Networks and Deep Learning Basics
Neural Networks and Deep Learning BasicsNeural Networks and Deep Learning Basics
Neural Networks and Deep Learning Basics
 
Training machine learning deep learning 2017
Training machine learning deep learning 2017Training machine learning deep learning 2017
Training machine learning deep learning 2017
 
Big Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningBig Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep Learning
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
fundamentals-of-neural-networks-laurene-fausett
fundamentals-of-neural-networks-laurene-fausettfundamentals-of-neural-networks-laurene-fausett
fundamentals-of-neural-networks-laurene-fausett
 
Basics of Deep learning
Basics of Deep learningBasics of Deep learning
Basics of Deep learning
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learning
 
Neural Networks-1
Neural Networks-1Neural Networks-1
Neural Networks-1
 
AINL 2016: Filchenkov
AINL 2016: FilchenkovAINL 2016: Filchenkov
AINL 2016: Filchenkov
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequences
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectives
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 
Artificial Neural Network and its Applications
Artificial Neural Network and its ApplicationsArtificial Neural Network and its Applications
Artificial Neural Network and its Applications
 
NEURAL NETWORKS
NEURAL NETWORKSNEURAL NETWORKS
NEURAL NETWORKS
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 

Recently uploaded

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 

Recently uploaded (20)

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 

PhD Defense

  • 1. Seoul National University Advanced Computing Laboratory Taehoon Lee Robust Feature Learning with Deep Neural Networks
  • 2. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 2/81
  • 3. ResearchAreas Deep neural networks are able to learn hierarchical representations. Theory Image Time series Bioinformatics Machine Learning Deep Learning • Main theories: machine learning, deep learning, statistical learning • Main applications: computer vision, bioinformatics • Main skills: parallel computing 3/81
  • 4. • Byunghan Lee, Taehoon Lee, andSungroh Yoon,"DNA-Level Splice Junction Prediction using Deep Recurrent Neural Networks," in Proceedings of NIPS Workshop on Machine Learning in Computational Biology, Montreal, Canada, December 2015. • Seungmyung Lee, Hanjoo Kim, Siqi Tan, Taehoon Lee, Sungroh Yoon, and Rhiju Das, "Automated band annotation for RNA structure probing experiments with numerous capillary electrophoresis profiles," Bioinformatics, vol. 31, no. 17, pp. 2808-2815, September 2015. • Taehoon Lee and Sungroh Yoon, "Boosted Categorical Restricted Boltzmann Machine for Computational Prediction of Splice Junctions," in Proceedings of International Conference on Machine Learning (ICML), Lille, France, July 2015. • Donghyeon Yu, Joong-Ho Won, Taehoon Lee, Johan Lim, and Sungroh Yoon,"High-dimensional Fused Lasso Regression using Majorization- Minimization and Parallel Processing," Journal of Computational and Graphical Statistics, vol.24, no.1, pp. 121-153, March 2015. • Taehoon Lee, Sungmin Lee, Woo Young Sim, Yu MiJung, Sunmi Han, Chanil Chung, Jay Junkeun Chang, Hyeyoung Min,and Sungroh Yoon, "Robust Classification of DNA Damage Patterns in Single Cell Gel Electrophoresis," in Proceedings of 35th Annual International Conference of the IEEE Engineering in Medicine andBiology Society (EMBC),Osaka, Japan, July 2013. • Taehoon Lee, Hyeyoung Min,Seung Jean Kim, and Sungroh Yoon, "Application of maximin correlation analysis to classifying protein environments for function prediction," Biochemical and Biophysical Research Communications, vol. 400, no. 2, pp. 219-224, September 2010. • Hyeyoung Min,Seunghak Yu, Taehoon Lee, and Sungroh Yoon, "Support vector machine based classification of 3-dimensional protein physicochemical environments for automated function annotation," Archives of Pharmacal Research, vol. 33, no. 9, pp. 1451-1459,September 2010. • Taehoon Lee, Seung Jean Kim, Eui-Young Chung, andSungroh Yoon, "K-maximin Clustering: A Maximin Correlation Approach to Partition-Based Clustering, " IEICE Electronics Express, vol. 6, no. 17, pp. 1205-1211, September 2009. • Taehoon Lee, Taesup Moon,Seung Jean Kim, and Sungroh Yoon,"Regularization and Kernelization of the Maximin Correlation Approach" (under review) • Taehoon Lee, Minsuk Choi, and Sungroh Yoon, "Manifold Regularized Deep Networks using Adversarial Examples" (under review) • Taehoon Lee, Joong-Ho Won, Johan Lim, and Sungroh Yoon,"Large-scale Fused Lasso on multi-GPU using FFT-Based Split Bregman Method" (under review) • Taehoon Lee et al., "HiComet: High-Throughput Comet Analysis Tool for Large-Scale DNA Damage Assessment Studies" (in preparation) Publications • 게재 완료: SCI급 저널 5편, 학술대회 논문 3편 (제 1저자 총 4편) • 심사 중: SCI급 저널 3편, 학술대회 논문 1편 (모두 제 1저자) • 국내 저널 및 학회: 12편 (제 1저자 6편) 4/81
  • 6. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 6/81
  • 7. What Do Deep Neural Networks Learn • A deep neural network (DNN) learns an effective hierarchical representation. • DNNs automatically learn representations and features from data. • Levels of abstraction: image: edge ↑ motif ↑ part ↑ object; speech: sound ↑ phone ↑ phoneme ↑ word; language: word ↑ clause ↑ sentence ↑ story. • From input to output: rule-based systems use a hand-crafted program; traditional machine learning uses hand-crafted features with a trainable classifier; deep learning uses trainable features with a trainable classifier, reaching a higher level of abstraction. 7/81
  • 8. Why Do Deep Neural Networks Work So Well • 3 × 2 + 3 × 5 + 3 × 7 → 3 × (2 + 5 + 7) • Factorization is the decomposition of an object into a product of factors. • As the number of layers grows, the effect of factorization gets stronger: with the same number of weight values, a deep network (W(1)–W(4)) creates more paths from x to y than a shallow one (W(1)–W(2)). • Many data, complex models, various priors, and high-end hardware together are enabling deep learning to prosper. 8/81
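As one reading of the slide's analogy (not a formula from the dissertation), factoring out the shared term trades three multiplications for one while keeping the same result:

```latex
3\times 2 + 3\times 5 + 3\times 7 \;=\; 3\times(2+5+7) \;=\; 42,
\qquad
\underbrace{3\ \text{mult.} + 2\ \text{add.}}_{\text{unfactored: 5 ops}}
\;\to\;
\underbrace{1\ \text{mult.} + 2\ \text{add.}}_{\text{factored: 3 ops}}
```

In the same spirit, a deep network computes shared lower-layer features once and reuses them along many paths to the output, instead of recomputing them for every output path.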
  • 9. History of Artificial Neural Networks Minsky and Papert, 1969 “Perceptrons” (Limits of Perceptrons) [M69] Rosenblatt, 1958 Perceptron [R58] Fukushima, 1980 NeoCognitron (Convolutional NN) [F80] Hinton, 1983 Boltzmann machine [H83] Fukushima, 1975 Cognitron (Autoencoder) [F75] Hinton, 1986 RBM, Restricted Boltzmann machine [H86] Hinton, 2006 Deep Belief Networks [H06] (mid 1980s) Back-propagation Early Models Basic Models Breakthrough Le, 2012 Training of 1 billion parameters [L12] Lee, 2009 Convolutional RBM [L09] LeCun, 1998 Revisit of CNN [L98] http://www.technologyreview.com/featuredstory/513696/deep-learning/ 9/81
  • 10. Deep Learning Techniques • Regularization helps the network avoid over-fitting: dropout, parameter sharing (CNN, RNN), early stopping, weight decay, sparse connectivity, exploiting sparsity (ordered from traditional to trendy). • Deconv nets (Zeiler et al., CVPR 2010) • Normalized initialization (Glorot et al., AISTATS 2010) • DropConnect (Wan et al., ICML 2013) • Batch normalization (Ioffe et al., ICML 2015) • Inception (Szegedy et al., CVPR 2015) • Adversarial training (Goodfellow et al., ICLR 2015) • LeCun et al., Proc. IEEE 1998 • Srivastava et al., JMLR 2014 • Baidu 10/81
  • 11. Applications of Deep Learning • Current main applications: speech recognition, image recognition, natural language processing. • Rising applications: natural language understanding and natural image understanding (examples from Karpathy et al., NIPS 2014, and Google I/O 2013 Highlights). 11/81
  • 12. Restricted Boltzmann Machines • RBM is a type of logistic belief network whose structure is a bipartite graph. • Nodes: input (visible) layer $\mathbf{v} \in \{0,1\}^{n_v}$ and hidden layer $\mathbf{h} \in \{0,1\}^{n_h}$. • Probability of a configuration $(\mathbf{v}, \mathbf{h})$: $P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} e^{-E(\mathbf{v}, \mathbf{h})}$ with energy $E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^\top \mathbf{v} - \mathbf{b}^\top \mathbf{h} - \mathbf{v}^\top W \mathbf{h}$ and partition function $Z$. • Each node is a stochastic binary unit: $P(h_j = 1 \mid \mathbf{v}) = \sigma(b_j + \mathbf{v}^\top W_{:j})$ and $P(v_i = 1 \mid \mathbf{h}) = \sigma(a_i + W_{i:} \mathbf{h})$. • The hidden activation $P(\mathbf{h} \mid \mathbf{v})$ can be used as a feature. 12/81
  • 13. • CNN is a type of feed-forward artificial neural network where the individual neurons respond to overlapping regions in the visual field. • Key components are convolutional and subsampling layers. Convolutional Neural Networks LeCun et al., Proc. IEEE 1998. C-layer Convolution between a kernel and an image to extract features. S-layer Aggregation of the statistics of local features at various locations. 13/81
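To make the C-layer/S-layer behavior concrete, here is a minimal numpy sketch (illustrative sizes and a ReLU nonlinearity are assumptions, not the dissertation's code):

```python
# One valid-mode 2-D convolution (C-layer) followed by 2x2 average pooling (S-layer).
import numpy as np

def conv2d_valid(image, kernel):
    """Correlate a kernel with an image (no padding, stride 1)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def avg_pool2x2(fmap):
    """Aggregate local statistics by averaging non-overlapping 2x2 blocks."""
    H, W = fmap.shape
    return fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

image = np.random.rand(28, 28)           # toy MNIST-sized input
kernel = np.random.randn(5, 5)           # one 5x5 filter (would be trainable)
features = avg_pool2x2(np.maximum(conv2d_valid(image, kernel), 0))  # C-layer -> ReLU -> S-layer
print(features.shape)                    # (12, 12)
```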
  • 14. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 14/81
  • 15. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 1 2 3 15/81
  • 16. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 1 2 3 16/81
  • 17. Dissertation Overview • As deep neural networks learn a large number of parameters, there have been many attempts to obtain reasonable solutions over a wide search space. This dissertation discusses the following three issues in deep learning. 17/81
  • 18. Dissertation Overview • As deep neural networks learn a large number of parameters, there have been many attempts to obtain reasonable solutions over a wide search space. This dissertation discusses the following three issues in deep learning. • First, deep neural networks expose intrinsic blind spots, known as adversarial perturbations. 18/81
  • 19. Dissertation Overview • As deep neural networks learn a large number of parameters, there have been many attempts to obtain reasonable solutions over a wide search space. This dissertation discusses the following three issues in deep learning. • First, deep neural networks expose intrinsic blind spots, known as adversarial perturbations. • Second, training restricted Boltzmann machines shows limited performance when sampling minority examples in class-imbalanced datasets. 19/81
  • 20. Dissertation Overview • As deep neural networks learn a large number of parameters, there have been many attempts to obtain reasonable solutions over a wide search space. This dissertation discusses the following three issues in deep learning. • First, deep neural networks expose intrinsic blind spots, known as adversarial perturbations. • Second, training restricted Boltzmann machines shows limited performance when sampling minority examples in class-imbalanced datasets. • Lastly, spatial dependency requires more elaborate handling, even though convolutional neural networks are well known as an effective technique for learning spatial dependency. 20/81
  • 21. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 21/81
  • 22. • Desired behaviors and practical issues of deep learning and manifold learning: • Deep learning discriminates different classes; however, it may result in wiggly boundaries vulnerable to adversarial perturbations. • Manifold learning preserves geodesic distances; however, it may result in poor embedding. Motivation 22/81
  • 23. Adversarial Example • Szegedy et al., Intriguing Properties of Neural Networks, ICLR 2014. • Goodfellow et al., Explaining and Harnessing Adversarial Examples, ICLR 2015. • We can generate an adversarial input $x_{adv} = x + \Delta x$. • We expect the classifier to assign the same class to $x$ and $x_{adv}$ as long as $\|\Delta x\|_\infty < \epsilon$. • However, a very small perturbation can cause correctly classified images to be misclassified (adversarial example = original example + small perturbation; fooling networks, Goodfellow, ICLR 2015). 23/81
  • 24. How Can We Fool Neural Networks? • Consider the dot product between a weight vector $w$ and an adversarial example $x_{adv}$: $w^\top x_{adv} = w^\top x + w^\top \Delta x$. • The adversarial perturbation causes the activation to grow by $w^\top \Delta x$. • We can maximize this increase subject to a max-norm constraint on $\Delta x$ by assigning $\Delta x = \mathrm{sign}(w)$. • $x_{adv} = x - \epsilon w$ if $x$ is positive; $x_{adv} = x + \epsilon w$ if $x$ is negative (toy example in the figure: $w = [8.28, 10.03]$). 24/81
  • 25. Deep Neural Networks Can Also Be Fooled • Nguyen et al., Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, CVPR 2015. • For a deep network, we can maximize this increase subject to the max-norm constraint on $\Delta x$ by assigning $\Delta x = \epsilon\,\mathrm{sign}(\nabla_x J(\theta, x, y))$. • We can also fool neural networks by using the following evolutionary algorithm. 25/81
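To make the fast-gradient idea concrete, here is a minimal numpy sketch on a toy logistic-regression classifier (the weights, input, and ε are illustrative values, not the dissertation's setup):

```python
# Fast gradient sign perturbation: x_adv = x + eps * sign(dJ/dx).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_x(w, b, x, y):
    """Gradient of the cross-entropy loss J(theta, x, y) with respect to the input x."""
    p = sigmoid(w @ x + b)        # predicted probability of the positive class
    return (p - y) * w            # dJ/dx for logistic regression

rng = np.random.default_rng(0)
w, b = 0.01 * rng.normal(size=784), 0.0   # toy "trained" weights
x, y = rng.random(784), 1.0               # a clean example with label 1
eps = 0.1

x_adv = x + eps * np.sign(loss_grad_x(w, b, x, y))
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))   # confidence on the true class drops
```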
  • 26. • Adversarial examples can be explained as a property of high-dimensional dot products. • The direction of perturbation, rather than the specific point in space, matters most. Space is not full of pockets of adversarial examples that finely tile the reals like the rational numbers. • Because it is the direction that matters most, adversarial perturbations generalize across different clean examples. • Linear models lack the capacity to resist adversarial perturbation; only structures with a hidden layer (where the universal approximator theorem applies) should be trained to resist adversarial perturbation. Important Observations (Szegedy et al, ICLR 2014) 26/81
  • 27. Related Work • How can we cover adversarial examples? • Simply train on all the noisy examples (Loosli et al., Large Scale Kernel Machines 2007: Infinite MNIST dataset); this incurs exponential cost. • Include the adversarial term in the objective function (Goodfellow et al., ICLR 2015): $\tilde{J}(\theta, x, y) = \alpha J(\theta, x, y) + (1 - \alpha) J(\theta, x_{adv}, y)$, which reduced the error rate from 1.14% to 0.77% on the 10,000 test examples. • It is also commonly expected that elastic distortion can resist adversarial examples. 27/81
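A self-contained numpy sketch of one gradient step on this combined objective, using the same toy logistic model as before (all sizes, the model, and the hyperparameters are illustrative, not the dissertation's setup):

```python
# One SGD step on  J~ = alpha * J(theta, x, y) + (1 - alpha) * J(theta, x_adv, y).
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def grads(w, b, x, y):
    """Return dJ/dw, dJ/db, dJ/dx for the cross-entropy loss of a logistic model."""
    p = sigmoid(w @ x + b)
    return (p - y) * x, (p - y), (p - y) * w

rng = np.random.default_rng(1)
w, b = 0.01 * rng.normal(size=784), 0.0
x, y, eps, alpha, lr = rng.random(784), 1.0, 0.1, 0.5, 0.05

_, _, gx = grads(w, b, x, y)
x_adv = x + eps * np.sign(gx)                      # adversarial counterpart of x
gw_c, gb_c, _ = grads(w, b, x, y)                  # gradient on the clean example
gw_a, gb_a, _ = grads(w, b, x_adv, y)              # gradient on the adversarial example
w -= lr * (alpha * gw_c + (1 - alpha) * gw_a)      # step on the combined objective
b -= lr * (alpha * gb_c + (1 - alpha) * gb_a)
```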
  • 28. What is a Manifold • In the case of a closed manifold, we may need to represent it in a dimension higher than the original one. • In the real world, many observations lie on manifolds; that is the reason why we learn manifolds. The pictures show a 2-D manifold and a 3-D manifold. http://www.lib.utexas.edu/maps/world_maps/world_rel_803005AI_2003.jpg 28/81
  • 29. Manifold Regularization Term • The manifold term minimizes the difference between the activations of several nodes for samples of the same class. • This helps us disentangle the factors of variation. • $a^{(1)}$: input representation, $a^{(5)}$: manifold representation, $a^{(6)}$: softmax layer. 29/81
  • 30. Manifold Regularization Term • The manifold term minimizes the difference between the activations of several nodes for samples of the same class. • This helps us disentangle the factors of variation. • $a^{(1)}$: input representation ($a_x^{(1)}$, $a_y^{(1)}$), $a^{(5)}$: manifold representation ($a_x^{(5)}$, $a_y^{(5)}$). 30/81
  • 31. Manifold Regularization Term • The manifold term minimizes the difference between the activations of several nodes for samples of the same class. • This helps us disentangle the factors of variation. • $a^{(1)}$: input representation, $a^{(5)}$: manifold representation; inputs $x_n$ and $x'_n$ map to $a_n^{(5)}$ and $a'^{(5)}_n$. 31/81
  • 32. Manifold Regularization Term • The manifold term minimizes the difference between the activations of several nodes for samples of the same class. • This helps us disentangle the factors of variation; a formulation consistent with these slides is sketched below. • $a^{(1)}$: input representation, $a^{(5)}$: manifold representation; $x'_n = x_n + \beta\,(\nabla_{x_n} L(\theta; x_n, y_n))$ maps to $a'^{(5)}_n$, while $x_n$ maps to $a_n^{(5)}$. 32/81
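One way to write the manifold regularization term consistent with these slides (the squared ℓ2 distance and the weight λ are assumptions for illustration, not necessarily the dissertation's exact notation):

```latex
\Omega(\theta) = \sum_n \big\| a^{(5)}(x_n) - a^{(5)}(x'_n) \big\|_2^2,
\qquad x'_n = x_n + \beta\, \nabla_{x_n} L(\theta; x_n, y_n),
```

added to the supervised loss as $J(\theta) = \sum_n L(\theta; x_n, y_n) + \lambda\, \Omega(\theta)$, so the manifold representation of an example and of its adversarially perturbed counterpart are pulled together.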
  • 33. Proposed Regularized Networks • The proposed methodology learns both a classifier and a manifold embedding that are robust to adversarial perturbations. • Forward and backward operations of MRnet: • The first forward operation is the same as in a standard neural network. • The following backward (adv) pass is the same as standard back-propagation, except that an adversarial perturbation is generated from the input gradient. 33/81
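A schematic PyTorch sketch of one MRnet-style update consistent with the description above; the architecture, β, λ, and the mean-squared manifold penalty are illustrative assumptions, not the dissertation's exact implementation:

```python
# Generate a perturbed input from the input gradient, then penalize the distance
# between the two "manifold" (penultimate-layer) representations a^(5).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                                  nn.Linear(256, 64), nn.ReLU())   # manifold representation
        self.head = nn.Linear(64, 10)                              # softmax layer

    def forward(self, x):
        a = self.body(x)
        return a, self.head(a)

net = Net()
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x, y = torch.rand(32, 784), torch.randint(0, 10, (32,))            # toy mini-batch
beta, lam = 0.1, 1.0

x.requires_grad_(True)
_, logits = net(x)
grad_x, = torch.autograd.grad(F.cross_entropy(logits, y), x)       # input gradient
x_adv = (x + beta * grad_x).detach()                               # perturbed counterpart

a_clean, logits = net(x.detach())
a_adv, _ = net(x_adv)
loss = F.cross_entropy(logits, y) + lam * F.mse_loss(a_clean, a_adv)  # manifold term
opt.zero_grad(); loss.backward(); opt.step()
```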
  • 34. Experimental Results • Three datasets we tested: • (a) MNIST (LeCun et al., 1998) • (b, c) The raw data and its normalized version (LCN) of CIFAR-10 (Krizhevsky et al., 2009) • (d, e) The raw data and its normalized version (ZCA) of SVHN (Netzer et al., 2011) 34/81
  • 35. Generation of Adversarial Examples • We chose $\beta$ in the range that did not violate class information. • (a-c) Distributions of Euclidean distances between training samples on the individual datasets. • (d-f) Different perturbation levels on the individual datasets. 35/81
  • 36. MNIST Results • Bar: statistics of 10 runs. Circle: single runs reported in the literature. • Fully connected models have two hidden layers. • Convolutional models have more than two convolutional layers. • All the results are without data augmentation. • The proposed model shows the best performance among the alternatives. 36/81
  • 37. CIFAR-10 and SVHN Results 37/81
  • 38. Embedding Results • Data: CIFAR-10 test set. • (a) Pairwise distance matrix of a(L) without Φ. • (b) 2-D visualization of the manifold embedding through t-SNE without Φ. • (c) Query images and top 10 nearest images without Φ. • (d-f) Pairwise distance matrix, t-SNE plot, and query images with Φ. 38/81
  • 39. Summary of Topic 1 • We have proposed a novel methodology, unifying deep learning and manifold learning, called manifold regularized networks (MRnet). • We tested MRnet and confirmed its improved generalization performance, underpinned by the proposed manifold loss term, on deep architectures. • By exploiting the characteristics of blind spots, the proposed MRnet can be extended to the discovery of true representations on manifolds in various learning tasks. 39/81
  • 40. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 40/81
  • 41. Motivation • Deep neural networks (DNNs) show human-level performance on many recognition tasks. • We focus on class-imbalanced prediction: insufficient samples to represent the true distribution of a class. • Q. How can we learn minor but important features using neural networks? • We propose a new RBM training method called boosted CD. • We also devise a regularization term for the sparsity of DNA sequences. • (Figure: negative vs. positive query images that are easy to misclassify.) 41/81
  • 42. (Splice) Junction Prediction: Extremely Class-Imbalanced Problem • Genetic information flows through the gene expression process (DNA → RNA → protein). • DNA: a sequence of four types of nucleotides (A, G, T, C). • Gene: a segment of DNA (the basic unit of heredity). • A junction is an exon-intron boundary marked by GT (or AG), and true boundaries are extremely rare: about 160K true sites (0.21%) among roughly 76M candidate sites. 42/81
  • 43. Previous Work on Junction Prediction • Two approaches: • Machine learning-based: ANN (Stormo et al., 1982; Noordewier et al., 1990; Brunak et al., 1991), SVM (Degroeve et al., 2005; Huang et al., 2006; Sonnenburg et al., 2007), HMM (Reese et al., 1997; Pertea et al., 2001; Baten et al., 2006). • Sequence alignment-based: TopHat (Trapnell et al., 2010), MapSplice (Wang et al., 2010), RUM (Grant et al., 2011). • We want to construct a learning model that can boost prediction performance in a way complementary to alignment-based methods. • We propose a learning model based on (multilayer) RBMs and its training scheme. 43/81
  • 44. Contrastive Divergence (CD) for Training RBMs • Train the weights to minimize the negative log-likelihood of the data; the intractable model expectation is approximated by a k-step Markov chain. • Run the MCMC chain $\mathbf{v}^{(0)} = \mathbf{v}, \mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(k)}$ (with hidden states $\mathbf{h}^{(0)}, \ldots, \mathbf{h}^{(k)}$) for $k$ steps. • The CD-$k$ update after seeing example $\mathbf{v}$ is $\Delta w_{ij} \propto \langle v_i h_j \rangle_{0} - \langle v_i h_j \rangle_{k}$ (and similarly for the biases); a minimal CD-1 sketch follows below. 44/81
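A minimal numpy sketch of one CD-1 update for a binary RBM on a single example (sizes and learning rate are illustrative, not the dissertation's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_v, n_h, lr = 20, 8, 0.1
W = 0.01 * rng.normal(size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)            # visible and hidden biases
v0 = (rng.random(n_v) < 0.5).astype(float)     # a toy binary input vector

# positive phase: p(h | v0), then sample h0
ph0 = sigmoid(b + v0 @ W)
h0 = (rng.random(n_h) < ph0).astype(float)

# negative phase: one Gibbs step v0 -> h0 -> v1 -> h1
pv1 = sigmoid(a + W @ h0)
v1 = (rng.random(n_v) < pv1).astype(float)
ph1 = sigmoid(b + v1 @ W)

# CD-1 updates: <v h>_data - <v h>_reconstruction
W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
a += lr * (v0 - v1)
b += lr * (ph0 - ph1)
```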
  • 45. • Boosting is a meta-algorithm which converts weak learners to strong ones. • Most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. • The main variation between many boosting algorithms: • The method of weighting training data points and hypotheses. • AdaBoost, LPBoost,TotalBoost, … What Boosting Is from lecture notes @ UCIrvine CS 271 Fall 2007 45/81
  • 46. Boosted Contrastive Divergence (1/2) • Contrastive divergence training is looped over all mini-batches and is known to be stable. • However, for a class-imbalanced distribution, we need to assign higher weights to rare samples so that the Gibbs chains can jump to unseen examples (assign lower weights to ordinary samples and higher weights to rare samples in hardly observed regions). 46/81
  • 47. Boosted Contrastive Divergence (2/2) • If we assign the same weight to all the data, the performance of Gibbs sampling degrades in regions that are hardly observed. • Whenever sampling, we therefore re-weight each observation by the energy of its reconstruction $E(\mathbf{v}_n^{(k)}, \mathbf{h}_n^{(k)})$; a hedged sketch follows below. • (Figures: relative locations of samples and the corresponding Markov chains under CD, PT, and the proposed method, highlighting hardly observed regions.) 47/81
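A heavily hedged numpy sketch of the boosted-CD idea described above: each sample in a mini-batch is re-weighted by the energy of its reconstruction, so poorly modelled (high-energy, typically rare) samples contribute more to the update. The exponential, batch-normalized weighting and all sizes are assumptions for illustration; the dissertation's exact weighting scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def energy(v, h, W, a, b):
    """RBM energy E(v, h) = -a'v - b'h - v'Wh, computed row-wise for a batch."""
    return -(v @ a) - (h @ b) - np.einsum('ni,ij,nj->n', v, W, h)

n, n_v, n_h, lr = 16, 20, 8, 0.1
W, a, b = 0.01 * rng.normal(size=(n_v, n_h)), np.zeros(n_v), np.zeros(n_h)
V0 = (rng.random((n, n_v)) < 0.3).astype(float)          # a toy mini-batch

PH0 = sigmoid(b + V0 @ W)
H0 = (rng.random((n, n_h)) < PH0).astype(float)
PV1 = sigmoid(a + H0 @ W.T)
V1 = (rng.random((n, n_v)) < PV1).astype(float)
PH1 = sigmoid(b + V1 @ W)

w = np.exp(energy(V1, PH1, W, a, b))                      # higher reconstruction energy -> larger weight
w = n * w / w.sum()                                       # normalize so weights average to 1

W += lr * ((w[:, None] * V0).T @ PH0 - (w[:, None] * V1).T @ PH1) / n
a += lr * (w @ (V0 - V1)) / n
b += lr * (w @ (PH0 - PH1)) / n
```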
  • 48. Relationship between Boosting and Importance Sampling • Importance sampling (target distribution f, proposal distribution g): (a) samples cannot be drawn conveniently from f; (b) the importance sampler draws samples from g; (c) a sample of f is obtained by multiplying by f/g. • Correspondingly, in boosted CD: 1. samples are drawn from g; 2. a sample of f is obtained by multiplying by α. 48/81
  • 49. Balance Equations for Restricted Boltzmann Machines • Balance equations: a set of equations that can always be solved to give the equilibrium distribution of a Markov chain (when such a distribution exists); global balance (or full balance) vs. local balance (or detailed balance). • For a restricted Boltzmann machine (Im et al., ICLR 2015): • For a restricted Boltzmann machine with boosted CD: • On the convergence properties of contrastive divergence (Sutskever et al., AISTATS 2010): “The CD update is not the gradient of any objective function.”; “The CD update is shown to have at least one fixed point when used with L2 regularization.” • Boosted contrastive divergence inherits the properties of contrastive divergence. 49/81
  • 50. Categorical Gradient • For biological sequences, 1-hot encoding is widely used (Baldi & Brunak, 2001). • A, C, G, and T are encoded by 1000, 0100, 0010, and 0001, respectively (see the sketch below). • In the encoded binary vectors, 75% of the elements are zero. • To handle the sparsity of 1-hot encoded vectors, we devise a new regularization technique that incorporates prior knowledge on the sparsity (figure: the sparsity term and reconstructions with and without it). 50/81
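A small sketch of this encoding (illustrative only):

```python
# A, C, G, T map to 1000, 0100, 0010, 0001, so 75% of the encoded elements are zero.
import numpy as np

CODE = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot(seq):
    """Encode a DNA string as a flat binary vector of length 4 * len(seq)."""
    x = np.zeros((len(seq), 4))
    x[np.arange(len(seq)), [CODE[c] for c in seq]] = 1.0
    return x.ravel()

v = one_hot("ACGTGT")
print(v.reshape(-1, 4))        # one row per nucleotide
print(1.0 - v.mean())          # fraction of zeros: 0.75
```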
  • 52. Results: Effects of Boosting • To simulate a class-imbalance situation, we randomly dropped samples with different drop rates for different classes. • Comparison of RBM training methods (description / training cost / noise handling / class-imbalance handling): CD (Hinton, Neural Comp. 2002): standard and widely used; Persistent CD (Tieleman, ICML 2008): uses a single Markov chain; Parallel tempering (Cho et al., IJCNN 2010): generates simultaneous Markov chains; Proposed boosted CD: reweights samples. 52/81
  • 53. Experimental Setup for Junction Prediction • Data preparation: real human DNA sequences with known boundary information. • GWH dataset: 2-class (boundary or not). • UCSC dataset: 3-class (acceptor, donor, or non-boundary). • Experiments: effects of the categorical gradient, effects of boosting, and effects on splicing prediction. • (Figure: an example sequence annotated with true acceptors/donors, a non-canonical true donor, and false acceptor/donor sites.) 53/81
  • 54. Results: Effects of Categorical Gradient • The proposed method shows the best performance in terms of reconstruction error for both training and testing. • Compared to the softmax approach, the proposed regularized RBM achieves lower error by slightly sacrificing the probability-sum constraint. • Data: chromosome 19 in GWH-donor; sequence length: 200 nt (800 dimensions); # of iterations: 500; learning rate: 0.1; L2-decay: 0.001. 54/81
  • 55. Results: Improved Performance and Robustness 2-class classification performance 3-class classification Runtime Insensitivity to sequence lengths Robustness to negative samples 55/81
  • 56. Results: Identification of Non-Canonical Splice Sites • (Important biological finding) Non-canonical splicing can arise if: • Introns contain GCA or NAA sequences at their boundaries. • Exons include contiguous A's around the boundaries. • We used 162,951 examples excluding canonical splice sites. 56/81
  • 57. Summary of Topic 2 • Significant boosts in splicing prediction performance • Robustness to high-dimensional class-imbalanced data • A new RBM training method called boosted CD • A new penalty term to handle the sparsity of DNA sequences • The ability to detect subtle non-canonical splicing signals 57/81
  • 58. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 58/81
  • 59. Fused Lasso Regression • In this paper, we consider the fused Lasso regression (FLR), an important special case of ℓ1-penalized regression for structured sparsity. • The matrix D is the difference matrix on the undirected and unweighted graph of adjacent variables. • Adjacency of the variables is determined by the application. • For graphs with a 2-D grid, the objective function adds a fusion penalty over both grid directions (a standard form is sketched below). • The second penalty function is non-smooth and non-separable. 59/81
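A standard way to write the FLR objective consistent with the slide's description (the dissertation's exact notation may differ):

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;\; \tfrac{1}{2}\,\|y - X\beta\|_2^2
  \;+\; \lambda_1 \|\beta\|_1 \;+\; \lambda_2 \|D\beta\|_1 ,
```

and, for coefficients $\beta_{i,j}$ on a 2-D grid, the fusion penalty expands over adjacent pairs in both directions:

```latex
\tfrac{1}{2}\,\|y - X\beta\|_2^2
  + \lambda_1 \sum_{i,j} |\beta_{i,j}|
  + \lambda_2 \Big( \sum_{i,j} |\beta_{i+1,j} - \beta_{i,j}|
                  + \sum_{i,j} |\beta_{i,j+1} - \beta_{i,j}| \Big).
```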
  • 60. • We want to solve the 2-dimensional fused Lasso regression on multi-GPU. Overview of Proposed Method fused Lasso 60/81
  • 61. • We want to solve the 2-dimensional fused Lasso regression on multi-GPU. Overview of Proposed Method approximating due to the ℓ1-norm fused Lasso fused Lasso + split Bregman algorithm 61/81
  • 62. • We want to solve the 2-dimensional fused Lasso regression on multi-GPU. Overview of Proposed Method approximating due to the ℓ1-norm fused Lasso fused Lasso + split Bregman algorithm accelerating for solving a linear system fused Lasso + split Bregman algorithm + PCGLS 62/81
  • 63. • We want to solve the 2-dimensional fused Lasso regression on multi-GPU. Overview of Proposed Method approximating due to the ℓ1-norm fused Lasso fused Lasso + split Bregman algorithm accelerating for solving a linear system fused Lasso + split Bregman algorithm + PCGLS replacing a linear system solver with FFT fused Lasso + split Bregman algorithm + PCGLS + FFT 63/81
  • 64. Split Bregman Algorithm for Fused Lasso • Split Bregman algorithm for the ℓ1-norm: introducing an auxiliary variable and approximating (one standard form is sketched below). • Because of the ℓ1-norm, the objective function is non-differentiable. 64/81
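One standard form of the split Bregman iteration for this objective, based on the general Goldstein-Osher scheme; the auxiliary variables, penalty parameters $\mu_1, \mu_2$, and the exact update order are illustrative and may differ from the dissertation's formulation. Introduce $a \approx \beta$ and $d \approx D\beta$ and iterate:

```latex
\beta^{(t+1)} = \arg\min_{\beta}\; \tfrac12\|y - X\beta\|_2^2
   + \tfrac{\mu_1}{2}\|a^{(t)} - \beta - b_1^{(t)}\|_2^2
   + \tfrac{\mu_2}{2}\|d^{(t)} - D\beta - b_2^{(t)}\|_2^2 ,
```

which is the linear system $(X^\top X + \mu_1 I + \mu_2 D^\top D)\,\beta = X^\top y + \mu_1(a^{(t)} - b_1^{(t)}) + \mu_2 D^\top(d^{(t)} - b_2^{(t)})$ (the part solved by PCGLS or the FFT), followed by soft-thresholding and Bregman updates:

```latex
a^{(t+1)} = \mathrm{soft}\!\big(\beta^{(t+1)} + b_1^{(t)},\, \lambda_1/\mu_1\big),\quad
d^{(t+1)} = \mathrm{soft}\!\big(D\beta^{(t+1)} + b_2^{(t)},\, \lambda_2/\mu_2\big),
```
```latex
b_1^{(t+1)} = b_1^{(t)} + \beta^{(t+1)} - a^{(t+1)},\qquad
b_2^{(t+1)} = b_2^{(t)} + D\beta^{(t+1)} - d^{(t+1)} .
```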
  • 65. PCGLS Algorithm • The conjugate gradient (CG) method aims to iteratively solve linear systems of the form $Ax = b$ when $A$ is symmetric and positive definite. • For least-squares problems, it is well known that (9) is equivalent to solving the normal equation, $x = (A^\top A)^{-1} A^\top b$. • The CG algorithm for least squares is often referred to as CGLS, and its preconditioned counterpart as PCGLS (in this case the scaling amounts to $A^\top A \to M^{-\top} A^\top A M^{-1}$), which makes it acceleratable. 65/81
  • 66. Poisson's Equation • In mathematics, Poisson's equation is a partial differential equation of elliptic type with broad utility in electrostatics, mechanical engineering, and theoretical physics. • Poisson's equation is frequently written as $\nabla^2 \varphi = f$. http://en.wikipedia.org/wiki/Poisson's_equation http://people.rit.edu/~pnveme/ExplictSolutions2/2Dim/Linear/PoissonDisk/PoissonDisk.html 66/81
  • 67. Poisson's Equation in 2 Dimensions • In two-dimensional Cartesian coordinates, it takes the form $\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\varphi(x, y) = f(x, y)$. • Discretizing on a grid yields a block tri-diagonal system. 67/81
  • 68. Poisson's Equation using the FFT • Mathematical background: $\nabla^2 v = f \leftrightarrow -(k_x^2 + k_y^2)\hat{v} = \hat{f}$, so $\hat{v} = -\hat{f} / (k_x^2 + k_y^2)$. • Apply the 2D forward FFT to $f$ to obtain $\hat{f}(k)$, where $k$ is the wave number. • Apply the inverse of the Laplace operator to $\hat{f}(k)$ to obtain $\hat{v}(k)$: a simple element-wise division in Fourier space. • Apply the 2D inverse FFT to $\hat{v}(k)$ to obtain $v$. http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/3-CUDA_libraries_+_Matlab.pdf 68/81
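A hedged single-CPU numpy sketch of the three steps above, assuming periodic boundary conditions and a square domain for illustration (not the multi-GPU implementation):

```python
# Solve laplacian(v) = f via: FFT(f) -> divide by -(kx^2 + ky^2) -> inverse FFT.
import numpy as np

def poisson_fft(f, Lx=2 * np.pi, Ly=2 * np.pi):
    """Zero-mean solution of the Poisson equation on a periodic [0,Lx)x[0,Ly) grid."""
    ny, nx = f.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=Lx / nx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=Ly / ny)
    KX, KY = np.meshgrid(kx, ky)
    k2 = KX ** 2 + KY ** 2
    f_hat = np.fft.fft2(f)
    v_hat = np.zeros_like(f_hat)
    nonzero = k2 > 0
    v_hat[nonzero] = -f_hat[nonzero] / k2[nonzero]   # element-wise division in Fourier space
    return np.real(np.fft.ifft2(v_hat))              # zero mode set to 0

# quick self-check: laplacian(sin(x) sin(y)) = -2 sin(x) sin(y)
x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
X, Y = np.meshgrid(x, x)
v_true = np.sin(X) * np.sin(Y)
v = poisson_fft(-2.0 * v_true)
print(np.max(np.abs(v - v_true)))                    # near machine precision
```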
  • 69. Split Bregman Algorithm for Fused Lasso (1/2) • Pseudo-code for the two iterative methods (the figure highlights the FFT step). 69/81
  • 70. Split Bregman Algorithm for Fused Lasso (2/2) • Multi-GPU operations for the matrix-vector computations. 70/81
  • 71. Experiments • The computation times are measured in CPU time with • CPU: Intel Xeon E5-4620 (2.2 GHz) and 16 GB RAM • GPU: NVIDIA GTX Titan (2688 cores, 6 GB GDDR5) • We set the regularization parameters $(\lambda_1, \lambda_2) = (1, 1)$, and the stopping criterion is: • We generate $n$ samples from a $p$-dimensional $N(0, I_p)$, and the response variable $y$ is generated by $y = X\beta + \epsilon$ with $\epsilon \sim N(0, I_n)$, where $\beta$ is the coefficient image defined in each scenario (see the sketch below). 71/81
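A small sketch of this synthetic setup; the particular β pattern (a piecewise-constant block) and the sizes are illustrative assumptions, not the dissertation's exact scenarios:

```python
# Rows of X drawn from a p-dimensional N(0, I_p); y = X beta + eps with eps ~ N(0, I_n).
import numpy as np

rng = np.random.default_rng(0)
n, side = 1000, 32                       # side*side coefficients on a square grid
p = side * side

beta = np.zeros((side, side))
beta[8:16, 8:16] = 2.0                   # assumed piecewise-constant block pattern
beta = beta.ravel()

X = rng.normal(size=(n, p))              # n samples from N(0, I_p)
y = X @ beta + rng.normal(size=n)        # y = X beta + eps
```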
  • 72. Runtime Comparison for Piecewise-Constant Blocks Cases • We first considered scenarios with synthetic regression problems where the coefficients were defined on a square grid. • For the very large cases, the average speed-up ranged from 409.19x to 433.23x. 72/81
  • 73. Runtime Comparison for Circular Gaussian Cases • For the other cases (n = 12000-24000), the average speed-up was 26.67x-47.47x. • The circular Gaussian cases are formulated by: 73/81
  • 74. • Image-based regression of the behavioral fMRI data. • Regression coefficients were overlaid and color-coded on the brain map as described in the text. Structured Sparsity Regression Example 74/81
  • 75. • Image-based regression of the behavioral fMRI data. • Regression coefficients were overlaid and color-coded on the brain map as described in the text. Structured Sparsity Regression Example 75/81
  • 76. Summary of Topic 3 • By applying the proposed method extensively to various large-scale datasets, we have successfully demonstrated the following: • Feasibility of highly parallelizable computational algorithms for high-dimensional structured sparse regression problems, • A use case of direct-communicating multiple GPUs for speed-up and scalability, • The promise of FFT-based preconditioners for solving a family of linear systems in parallel. • That the highest speed-up (433x) occurred on the highest-dimensional problems clearly indicates where the merit of the multi-GPU scheme lies. • Future work: connecting the dots to deep neural networks (fused autoencoder, multi-layer fused Lasso, ...). 76/81
  • 77. • Achievements • Preliminary • Deep neural networks • Dissertation overview • Adversarial example handling • Manifold regularized deep neural networks using adversarial examples • Class-imbalance handling • Boosted contrastive divergence • Spatial dependency handling • Structured sparsity via parallel fused Lasso • Conclusion • Limitations and future work Outline 77/81
  • 78. Conclusion • This dissertation proposed a set of robust feature learning schemes that can learn meaningful representations underlying large-scale genomic and image datasets using deep networks. 1. MRnet can be applied in a complementary way, alongside traditional techniques such as L2 decay, to help neural networks generalize. 2. We propose a novel method for training RBMs for class-imbalanced prediction; our proposal includes a deep belief network-based methodology for computational splice junction prediction. 3. The parallel fused Lasso can be applied to data with structured sparsity, such as images, to exploit more prior knowledge than convolutional or recurrent operations. 78/81
  • 79. Limitations and Future Work • We need to make the proposed schemes more universal and general. • Several directions of future work are possible for the proposed methodologies. • First, we can extend MRnet to extract scaling- and translation-invariant features by replacing the synthetic perturbations with nearest training samples. • Second, it would also be interesting to alter the objective function of MRnet in order to generalize the whole MRnet procedure. • Lastly, the proposed three schemes (manifold loss, boosting, and the L1 fusion penalty) can be applied within the framework of recurrent neural networks. 79/81