Robust Feature Learning with Deep Neural Networks
Learn the fundamentals of Deep Learning, Machine Learning, and AI, how they've impacted everyday technology, and what's coming next in Artificial Intelligence technology.
This is a presentation on handwritten digit recognition using Convolutional Neural Networks, which give better results than conventional Artificial Neural Networks.
An overview of deep learning with neural networks: use cases of deep learning and its development, and a basic introduction to the layers of neural networks.
"Mainstream access to deep learning technology will greatly impact most industries over the next three to five years."
So what exactly is deep learning? How does it work? And most importantly, why should you even care?
Deep learning is used in the research community and in industry to help solve many big data problems such as computer vision, speech recognition, and natural language processing.
Practical examples include:
-Vehicle, pedestrian and landmark identification for driver assistance
-Image recognition
-Speech recognition and translation
-Natural language processing
-Life sciences
What You Will Learn
-Understand the intuition behind Artificial Neural Networks
-Apply Artificial Neural Networks in practice
-Understand the intuition behind Convolutional Neural Networks
-Apply Convolutional Neural Networks in practice
-Understand the intuition behind Recurrent Neural Networks
-Apply Recurrent Neural Networks in practice
-Understand the intuition behind Self-Organizing Maps
-Apply Self-Organizing Maps in practice
-Understand the intuition behind Boltzmann Machines
-Apply Boltzmann Machines in practice
-Understand the intuition behind AutoEncoders
-Apply AutoEncoders in practice
It was about 30 years ago that AI was not only a topic for science-fiction writers but also a major research field surrounded by huge hopes and investments. The over-inflated expectations ended in a crash, followed by a period of absent funding and interest: the so-called AI winter. The last three years, however, changed everything again. Deep learning, a machine learning technique inspired by the human brain, successfully crushed one benchmark after another, and tech companies like Google, Facebook, and Microsoft started to invest billions in AI research. "The pace of progress in artificial general intelligence is incredible fast" (Elon Musk, CEO of Tesla & SpaceX), leading to an AI that "would be either the best or the worst thing ever to happen to humanity" (Stephen Hawking, physicist).
What sparked this new hype? How is deep learning different from previous approaches? Are the advancing AI technologies really a threat to humanity? Let's look behind the curtain and unravel the reality. This talk will explore why Sundar Pichai (CEO of Google) recently announced that "machine learning is a core transformative way by which Google is rethinking everything they are doing" and explain why "Deep Learning is probably one of the most exciting things that is happening in the computer industry" (Jen-Hsun Huang, CEO of NVIDIA).
Either a new AI "winter is coming" (Ned Stark, House Stark), or this new wave of innovation might turn out to be the "last invention humans ever need to make" (Nick Bostrom, AI philosopher). Or maybe it's just another great technology helping humans achieve more.
Machine Learning and Real-World Applications (MachinePulse)
This presentation was created by Ajay, Machine Learning Scientist at MachinePulse, to present at a Meetup on Jan. 30, 2015. These slides provide an overview of widely used machine learning algorithms. The slides conclude with examples of real world applications.
Ajay Ramaseshan is a Machine Learning Scientist at MachinePulse. He holds a Bachelor's degree in Computer Science from NITK Surathkal and a Master's in Machine Learning and Data Mining from Aalto University School of Science, Finland. He has extensive experience in the machine learning domain and has dealt with various real-world problems.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/09/an-introduction-to-data-augmentation-techniques-in-ml-frameworks-a-presentation-from-amd/
Rajy Rawther, PMTS Software Architect at AMD, presents the “Introduction to Data Augmentation Techniques in ML Frameworks” tutorial at the May 2021 Embedded Vision Summit.
Data augmentation is a set of techniques that expand the diversity of data available for training machine learning models by generating new data from existing data. This talk introduces different types of data augmentation techniques as well as their uses in various training scenarios.
Rawther explores some built-in augmentation methods in popular ML frameworks like PyTorch and TensorFlow. She also discusses tips and tricks commonly used to randomly select augmentation parameters so that the model does not overfit to a particular dataset.
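Frameworks like PyTorch ship such transforms built in (e.g. torchvision's RandomHorizontalFlip and RandomCrop). As a framework-free illustration of the idea, here is a minimal NumPy sketch of two common augmentations with randomly selected parameters; the function names are hypothetical, not any library's API:

```python
import numpy as np

def random_flip(img, rng, p=0.5):
    """Horizontally flip an H x W image with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_crop(img, size, rng):
    """Crop a random size x size window from an H x W image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

rng = np.random.default_rng(0)
img = np.arange(16.0).reshape(4, 4)
crop = random_crop(random_flip(img, rng), 3, rng)
print(crop.shape)  # (3, 3)
```

Each call draws fresh random parameters, so the model sees a slightly different version of the image every epoch.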
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (Joonhyung Lee)
A presentation introducing DeepLab V3+, a state-of-the-art architecture for semantic segmentation. It also includes detailed descriptions of how 2D multi-channel convolutions function, as well as a detailed explanation of depth-wise separable convolutions.
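Depth-wise separable convolutions factor a standard convolution into a per-channel (depthwise) step and a 1×1 (pointwise) mixing step; the parameter saving can be checked with a little arithmetic. The layer sizes below are arbitrary illustrative choices:

```python
# Parameter counts for a k x k convolution mapping c_in -> c_out channels.
def standard_params(k, c_in, c_out):
    # one k x k x c_in filter per output channel
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    # depthwise: one k x k filter per input channel;
    # pointwise: a 1 x 1 convolution mixing channels
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128
print(standard_params(k, c_in, c_out))   # 73728
print(separable_params(k, c_in, c_out))  # 8768
```

For this (hypothetical) layer the separable form uses roughly 8x fewer parameters, which is the efficiency argument behind architectures like the DeepLab V3+ decoder.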
Object Classification using CNN & VGG16 Model (Keras and TensorFlow) (Lalit Jain)
Using a CNN with Keras and TensorFlow, we have deployed a solution that can train on any image on the fly. The code uses the Google API to fetch new images, trains with the VGG16 model, and is deployed using the Python Django framework.
This talk will cover various medical applications of deep learning, including tumor segmentation in histology slides, MRI, CT, and X-ray data. It also covers more complicated tasks such as cell counting, where the challenge is to count how many objects are in an image, as well as generative adversarial networks and how they can be used in medical applications. This presentation is accessible to non-doctors and non-computer scientists.
Deep Learning: Overview of my work II (Mohamed Loey)
Keywords: deep learning, machine learning, MNIST, CIFAR-10, Residual Network, AlexNet, VGGNet, GoogLeNet, NVIDIA. Deep learning (DL) uses hierarchically structured networks that simulate the structure of the human brain to extract features from the input data.
The next phase of Smart Network Convergence could be putting deep learning systems on the Internet. Deep learning and blockchain technology might be combined in the smart networks of the future for automated identification (deep learning) and automated transactions (blockchain). Blockchain deep learning nets could serve as an advanced computational infrastructure for large-scale, future-class problems such as million-member genome banks, energy storage markets, global financial risk assessment, real-time voting, and asteroid mining.
Blockchain Deep Learning nets and Smart Networks more generally are computing networks with intelligence built in such that identification and transfer is performed by the network itself through sophisticated protocols that automatically identify (deep learning), and validate, confirm, and route transactions (blockchain) within the network.
Recurrent Neural Networks have been shown to be very powerful models, as they can propagate context over several time steps. They can therefore be applied effectively to several problems in natural language processing, such as language modelling, tagging, and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short-Term Memory) and Gated Recurrent Units (GRU), and discuss bidirectional RNNs with an example. RNN architectures can be considered deep learning systems in which the number of time steps plays the role of the depth of the network. It is also possible to build an RNN with multiple hidden layers, each with recurrent connections from the previous time steps, representing abstraction in both time and space.
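The vanishing gradient problem mentioned above can be observed numerically. In this toy tanh RNN (a sketch with arbitrary sizes, not the presentation's code), backpropagation through time multiplies the gradient by Wᵀ·diag(1 − h²) at every step, so with modest recurrent weights the gradient norm decays geometrically as it travels back in time:

```python
import numpy as np

# Toy RNN: h_t = tanh(W h_{t-1} + x_t), with small random recurrent weights.
rng = np.random.default_rng(1)
n = 8
W = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
h = np.zeros(n)
states = []
for t in range(30):
    h = np.tanh(W @ h + rng.standard_normal(n))
    states.append(h)

# Backpropagate dL/dh_T = 1 through time and record the gradient norm.
grad = np.ones(n)
norms = []
for h in reversed(states):
    norms.append(np.linalg.norm(grad))
    grad = W.T @ (grad * (1 - h ** 2))  # chain rule through tanh and W

print(norms[0] > norms[-1])  # True: the gradient shrinks going back in time
```

LSTM and GRU cells mitigate exactly this decay by routing the gradient through additive gated paths.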
Invited talk at Tsinghua University on "Applications of Deep Neural Network". As the technical lead of the deep learning task force at NIO USA Inc., I was invited to give this colloquium talk on general applications of deep neural networks.
Deep learning is now bringing artificial intelligence closer to human-level performance. Machine learning and deep artificial neural networks model aspects of the human brain. This success is due to large storage and computation, combined with efficient algorithms, that can handle more behavioral and cognitive problems.
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef... (cscpconf)
This paper presents an ensemble of neocognitron neural network base classifiers to enhance the accuracy of the system, along with experimental results. The method requires less computational preprocessing than other ensemble techniques, as it avoids a separate feature-extraction step before feeding the data into the base classifiers. This is possible by the basic nature of the neocognitron: it is a multilayer feed-forward neural network. The ensemble of such base classifiers gives a class label for each pattern, and these labels are in turn combined to give the final class label for that pattern. The purpose of this paper is not only to exemplify the learning behaviour of the neocognitron as a base classifier, but also to propose a better way to combine neural-network-based ensemble classifiers.
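The label-combination step described above can be sketched as simple majority voting over the base classifiers' predicted labels (a generic illustration, not necessarily the paper's exact combination scheme):

```python
import numpy as np

def majority_vote(labels):
    """Return the most frequent label among base-classifier predictions."""
    vals, counts = np.unique(labels, return_counts=True)
    return vals[np.argmax(counts)]

# Three hypothetical base classifiers vote on one pattern:
print(majority_vote(np.array([3, 7, 3])))  # 3
```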
Recurrent Neural Networks (RNNs) are the reference class of deep learning models for learning from sequential data. Despite their widespread success, a major downside of RNNs and the commonly derived 'gating' variants (LSTM, GRU) is the high cost of the training algorithms involved. In this context, an increasingly popular alternative is the Reservoir Computing (RC) approach, which limits the training algorithm to operating only on a restricted set of (output) parameters. RC is appealing for several reasons, including its amenability to implementation on low-power edge devices, enabling adaptation and personalization in IoT and cyber-physical systems applications.
This webinar will introduce Reservoir Computing from scratch, covering all the fundamental design topics as well as good practices. It is targeted at both researchers and practitioners interested in setting up quickly trained deep learning models for sequential data.
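The RC idea of training only the output parameters can be made concrete with an echo state network: a fixed random reservoir whose linear readout is fit by ridge regression. This is a minimal sketch with arbitrary sizes and a toy next-step prediction task, not any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in = 50, 1
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # spectral radius < 1 (echo state)

def run(u):
    """Drive the fixed reservoir with input sequence u; collect states."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x)
    return np.array(states)

# Toy task: predict the next value of a sine wave.
t = np.linspace(0, 8 * np.pi, 400)
u, y = np.sin(t[:-1]), np.sin(t[1:])
X = run(u)

# Only the readout W_out is trained (ridge regression) -- the RC principle.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
pred = X @ W_out
mse = float(np.mean((pred - y) ** 2))
print(round(mse, 6))
```

Because the reservoir weights are never updated, training reduces to a single linear solve, which is what makes RC attractive for low-power devices.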
Deep learning lecture, part 1 (basics, CNN) (SungminYou)
This presentation is a lecture based on the Deep Learning book (Goodfellow, Bengio, and Courville, MIT Press, 2016). It contains the basics of deep learning and the theory of convolutional neural networks.
Artificial Neural Network and its Applications (shritosh kumar)
Abstract
This report is an introduction to Artificial Neural Networks. The various types of neural networks are explained and demonstrated, applications of neural networks like ANNs in medicine are described, and a detailed historical background is provided. The connection between the artificial and the real thing is also investigated and explained. Finally, the mathematical models involved are presented and demonstrated.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method, where all vertices are processed in each iteration. It comes, however, with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
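For reference, the standard ("monolithic") PageRank baseline that the report compares against can be sketched as a plain power iteration, with dead-end rank redistributed uniformly (the graph and parameters below are illustrative, not the report's benchmark setup):

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-10):
    """Monolithic PageRank: every vertex is processed in each iteration."""
    n = len(adj)
    r = np.full(n, 1.0 / n)
    out = np.array([len(nbrs) for nbrs in adj])
    while True:
        new = np.full(n, (1 - d) / n)
        new += d * r[out == 0].sum() / n  # dead-end rank spread uniformly
        for u, nbrs in enumerate(adj):
            for v in nbrs:
                new[v] += d * r[u] / out[u]
        if abs(new - r).sum() < tol:
            return new
        r = new

adj = [[1, 2], [2], [0]]  # edges: 0->1, 0->2, 1->2, 2->0
r = pagerank(adj)
print(np.isclose(r.sum(), 1.0))  # True: ranks form a probability distribution
```

Levelwise PageRank avoids the all-vertices-per-iteration loop by running this iteration one strongly connected component level at a time.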
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method Bias (Harman Single Factor Test) (2023240532)
Quantitative Data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Opendatabay - Open Data Marketplace (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. The Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
It is the first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they are working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms like PageRank typically operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
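Why the storage type matters for a vector element sum can be demonstrated without CUDA. NumPy has no bfloat16, so float16 stands in here to illustrate the same effect: a low-precision running sum stagnates once the elements round away against the large accumulator, while a wide accumulator keeps the result accurate:

```python
import numpy as np

x = np.full(20_000, 0.1, dtype=np.float16)  # 0.1 rounds to ~0.09998 in f16

acc = np.float16(0.0)
for v in x:                    # running sum kept in float16
    acc = np.float16(acc + v)  # each add is rounded back to float16

wide = float(x.sum(dtype=np.float64))  # same data, wide accumulator

print(float(acc))      # 256.0: adds of ~0.1 round away once spacing > 0.2
print(round(wide, 1))  # 1999.5
```

This is why reductions over low-precision storage (bfloat16 included) typically accumulate in a wider type.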
2. Outline
• Achievements
• Preliminary: deep neural networks; dissertation overview
• Adversarial example handling: manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling: boosted contrastive divergence
• Spatial dependency handling: structured sparsity via parallel fused Lasso
• Conclusion: limitations and future work
3. Research Areas
Deep neural networks are able to learn hierarchical representations.
[Figure: research areas at the intersection of machine learning and deep learning theory, applied to image, time series, and bioinformatics data]
• Main theories: machine learning, deep learning, statistical learning
• Main applications: computer vision, bioinformatics
• Main skills: parallel computing
4. Publications
• Byunghan Lee, Taehoon Lee, and Sungroh Yoon, "DNA-Level Splice Junction Prediction using Deep Recurrent Neural Networks," in Proceedings of the NIPS Workshop on Machine Learning in Computational Biology, Montreal, Canada, December 2015.
• Seungmyung Lee, Hanjoo Kim, Siqi Tan, Taehoon Lee, Sungroh Yoon, and Rhiju Das, "Automated band annotation for RNA structure probing experiments with numerous capillary electrophoresis profiles," Bioinformatics, vol. 31, no. 17, pp. 2808-2815, September 2015.
• Taehoon Lee and Sungroh Yoon, "Boosted Categorical Restricted Boltzmann Machine for Computational Prediction of Splice Junctions," in Proceedings of the International Conference on Machine Learning (ICML), Lille, France, July 2015.
• Donghyeon Yu, Joong-Ho Won, Taehoon Lee, Johan Lim, and Sungroh Yoon, "High-dimensional Fused Lasso Regression using Majorization-Minimization and Parallel Processing," Journal of Computational and Graphical Statistics, vol. 24, no. 1, pp. 121-153, March 2015.
• Taehoon Lee, Sungmin Lee, Woo Young Sim, Yu Mi Jung, Sunmi Han, Chanil Chung, Jay Junkeun Chang, Hyeyoung Min, and Sungroh Yoon, "Robust Classification of DNA Damage Patterns in Single Cell Gel Electrophoresis," in Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, July 2013.
• Taehoon Lee, Hyeyoung Min, Seung Jean Kim, and Sungroh Yoon, "Application of maximin correlation analysis to classifying protein environments for function prediction," Biochemical and Biophysical Research Communications, vol. 400, no. 2, pp. 219-224, September 2010.
• Hyeyoung Min, Seunghak Yu, Taehoon Lee, and Sungroh Yoon, "Support vector machine based classification of 3-dimensional protein physicochemical environments for automated function annotation," Archives of Pharmacal Research, vol. 33, no. 9, pp. 1451-1459, September 2010.
• Taehoon Lee, Seung Jean Kim, Eui-Young Chung, and Sungroh Yoon, "K-maximin Clustering: A Maximin Correlation Approach to Partition-Based Clustering," IEICE Electronics Express, vol. 6, no. 17, pp. 1205-1211, September 2009.
• Taehoon Lee, Taesup Moon, Seung Jean Kim, and Sungroh Yoon, "Regularization and Kernelization of the Maximin Correlation Approach" (under review)
• Taehoon Lee, Minsuk Choi, and Sungroh Yoon, "Manifold Regularized Deep Networks using Adversarial Examples" (under review)
• Taehoon Lee, Joong-Ho Won, Johan Lim, and Sungroh Yoon, "Large-scale Fused Lasso on multi-GPU using FFT-Based Split Bregman Method" (under review)
• Taehoon Lee et al., "HiComet: High-Throughput Comet Analysis Tool for Large-Scale DNA Damage Assessment Studies" (in preparation)
• Published: 5 SCI-indexed journal papers and 3 conference papers (4 as first author)
• Under review: 3 SCI-indexed journal papers and 1 conference paper (all as first author)
• Domestic journals and conferences: 12 papers (6 as first author)
7. What Do Deep Neural Networks Learn
• A deep neural network (DNN) learns an effective hierarchical representation.
• A DNN automatically learns representations and features from data.
[Figure: levels of abstraction by domain. Image: edge → motif → part → object. Language: word → clause → sentence → story. Speech: sound → phone → phoneme → word.]
[Figure: rule-based systems map input to output with a hand-crafted program; traditional machine learning uses hand-crafted features plus a trainable classifier; deep learning uses trainable features plus a trainable classifier (e.g., image → "tiger"), reaching a higher level of abstraction.]
8. Why Do Deep Neural Networks Work So Well
• Factorization is the decomposition of an object into a product of factors: 3 × 2 + 3 × 5 + 3 × 7 → 3 × (2 + 5 + 7).
• As the number of layers grows larger, the effect of factorization gets higher: compared with a shallow network (weights 𝑊(1), 𝑊(2) between 𝑥 and 𝑦), a deep network (𝑊(1) through 𝑊(4)) has more paths sharing the same number of weight values.
• Many data, complex models, various priors, and high-end hardware altogether are enabling deep learning to prosper.
9. History of Artificial Neural Networks
Early models:
• Rosenblatt, 1958: Perceptron [R58]
• Minsky and Papert, 1969: "Perceptrons" (limits of perceptrons) [M69]
• Fukushima, 1975: Cognitron (autoencoder) [F75]
Basic models:
• Fukushima, 1980: Neocognitron (convolutional NN) [F80]
• Hinton, 1983: Boltzmann machine [H83]
• Mid 1980s: back-propagation
• Hinton, 1986: RBM, Restricted Boltzmann machine [H86]
• LeCun, 1998: Revisit of CNN [L98]
Breakthrough:
• Hinton, 2006: Deep Belief Networks [H06]
• Lee, 2009: Convolutional RBM [L09]
• Le, 2012: Training of 1 billion parameters [L12]
http://www.technologyreview.com/featuredstory/513696/deep-learning/
10. Deep Learning Techniques
Regularization helps the network avoid overfitting.
Traditional: early stopping, weight decay, sparse connectivity, parameter sharing (CNN, RNN) (LeCun et al., Proc. IEEE 1998).
Trendy: dropout (Srivastava et al., JMLR 2014), exploiting sparsity, and related techniques:
• Deconv nets (Zeiler et al., CVPR 2010)
• Normalized initialization (Glorot et al., AISTATS 2010)
• DropConnect (Wan et al., ICML 2013)
• Batch normalization (Ioffe et al., ICML 2015)
• Inception (Szegedy et al., CVPR 2015)
• Adversarial training (Goodfellow et al., ICLR 2015)
11. Applications of Deep Learning
[Figure: current main applications are speech recognition and image recognition; rising applications include natural language processing, natural image understanding (from Karpathy et al., NIPS 2014), and natural language understanding (from Google I/O 2013 highlights).]
12. Restricted Boltzmann Machines
• An RBM is a type of logistic belief network whose structure is a bipartite graph.
• Nodes: an input (visible) layer v and a hidden layer h.
• Probability of a configuration (v, h): P(v, h) = exp(−E(v, h)) / Z, with energy E(v, h) = −aᵀv − bᵀh − vᵀWh.
• Each node is a stochastic binary unit: P(h_j = 1 | v) = σ(b_j + Σ_i W_ij v_i).
• The hidden activation P(h | v) can be used as a feature.
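RBMs of this kind are commonly trained with contrastive divergence. The following is a minimal CD-1 sketch in NumPy (sizes, learning rate, and the single training pattern are arbitrary illustrative choices, not the dissertation's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 6, 4
W = 0.01 * rng.standard_normal((n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)  # visible and hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1(v0, lr=0.1):
    """One CD-1 update: positive phase on data, negative phase on a sample."""
    global W, a, b
    ph0 = sigmoid(v0 @ W + b)                  # P(h = 1 | v0)
    h0 = (rng.random(n_h) < ph0).astype(float)  # sample hidden units
    pv1 = sigmoid(h0 @ W.T + a)                # P(v = 1 | h0)
    v1 = (rng.random(n_v) < pv1).astype(float)  # one-step reconstruction
    ph1 = sigmoid(v1 @ W + b)
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)

v = np.array([1, 1, 0, 0, 1, 0], dtype=float)
for _ in range(200):
    cd1(v)

# Mean-field reconstruction error should be small after training.
recon = sigmoid(sigmoid(v @ W + b) @ W.T + a)
err = float(np.mean(np.abs(recon - v)))
print(round(err, 3))
```

After training, the hidden probabilities sigmoid(v @ W + b) are exactly the kind of learned features the slide refers to.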
13. Convolutional Neural Networks
• A CNN is a type of feed-forward artificial neural network where the individual neurons respond to overlapping regions in the visual field.
• Key components are convolutional and subsampling layers (LeCun et al., Proc. IEEE 1998).
• C-layer: convolution between a kernel and an image to extract features.
• S-layer: aggregation of the statistics of local features at various locations.
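The C-layer and S-layer pair can be sketched in a few lines of NumPy: a 'valid' 2-D convolution followed by 2 × 2 average pooling. The kernel below is an illustrative horizontal edge detector, not a learned filter:

```python
import numpy as np

def conv2d(img, kernel):
    """C-layer: 'valid' 2-D convolution of an image with one kernel."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool(x, s=2):
    """S-layer: aggregate local statistics by s x s average pooling."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h * s, :w * s].reshape(h, s, w, s).mean(axis=(1, 3))

img = np.arange(36.0).reshape(6, 6)   # toy image: values increase by 1 per column
edge = np.array([[1.0, -1.0]])        # horizontal edge detector
feat = avg_pool(conv2d(img, edge))
print(feat.shape)  # (3, 2)
```

On this toy ramp image every horizontal difference is −1, so the pooled feature map is constant, which is exactly the translation-tolerant summarization the S-layer is meant to provide.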
16. • Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
1
2
3
16/81
20. • As deep neural networks learn a large number of parameters, there have been many attempts to obtain reasonable solutions over a wide search space. In this dissertation, the following three issues in deep learning are discussed.
• First, deep neural networks expose intrinsic blind spots known as adversarial perturbations.
Dissertation Overview
• Second, training restricted Boltzmann machines shows limited performance when sampling minority-class examples in class-imbalanced datasets.
• Lastly, although convolutional neural networks are known to learn spatial dependencies well, handling spatial dependency calls for more sophisticated techniques.
20/81
21. • Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
21/81
22. • Desired behaviors and practical issues of deep learning and manifold learning:
• Deep learning discriminates different classes; however, it may result in
wiggly boundaries vulnerable to adversarial perturbations.
• Manifold learning preserves geodesic distances; however, it may result in
poor embedding.
Motivation
22/81
23. Szegedy et al., Intriguing Properties of Neural Networks, ICLR 2014.
Goodfellow et al., Explaining and Harnessing Adversarial Examples, ICLR 2015.
• We can generate an adversarial input 𝑥_𝑎𝑑𝑣 = 𝑥 + ∆𝑥.
• We expect the classifier to assign the same class to 𝑥 and 𝑥_𝑎𝑑𝑣 so long as ‖∆𝑥‖∞ < 𝜖.
• However, a very small perturbation can cause correctly classified images to be misclassified.
Adversarial Example
Figure: original example + small perturbation → adversarial example (fooling networks; Goodfellow, ICLR 2015).
23/81
24. • Consider the dot product between a weight vector 𝑤 and an adversarial example 𝑥_𝑎𝑑𝑣:
𝑤ᵀ𝑥_𝑎𝑑𝑣 = 𝑤ᵀ𝑥 + 𝑤ᵀ∆𝑥
• The adversarial perturbation causes the activation to grow by 𝑤ᵀ∆𝑥.
• We can maximize this increase subject to a max-norm constraint on ∆𝑥 by assigning ∆𝑥 = 𝜀 sign(𝑤).
How Can We Fool Neural Networks?
𝑥_𝑎𝑑𝑣 = 𝑥 − 𝜀𝑤 if 𝑥 is positive; 𝑥_𝑎𝑑𝑣 = 𝑥 + 𝜀𝑤 if 𝑥 is negative (example: 𝑤 = [8.28, 10.03]).
24/81
25. Nguyen et al., Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, CVPR 2015.
• We can maximize this increase subject to a max-norm constraint on ∆𝑥 by assigning ∆𝑥 = 𝜀 sign(𝛻𝑥 𝐽(𝜃, 𝑥, 𝑦)).
• We can also fool neural networks by using an evolutionary algorithm.
Deep Neural Networks Can Also Be Fooled
25/81
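The fast-gradient-sign perturbation can be sketched for a logistic-regression classifier (a stand-in model; the 2-D weight vector reuses the earlier slide's example, and the data point, bias, and 𝜀 are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturbation(w, b, x, y, eps):
    """Fast gradient sign perturbation for logistic regression:
    the gradient of J = -log p(y|x) w.r.t. x is (p - y) * w,
    so dx = eps * sign(grad)."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return eps * np.sign(grad_x)

w = np.array([8.28, 10.03])  # weight vector from the earlier 2-D example
b = 0.0
x = np.array([0.5, -0.5])    # illustrative input
y = 1.0
x_adv = x + fgsm_perturbation(w, b, x, y, eps=0.25)
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))  # model confidence drops
```

Each coordinate moves by exactly ±𝜀, yet the model's confidence in the true label decreases.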
26. • Adversarial examples can be explained as a property of high-dimensional
dot products.
• The direction of perturbation, rather than the specific point in space, matters
most. Space is not full of pockets of adversarial examples that finely tile the
reals like the rational numbers.
• Because it is the direction that matters most, adversarial perturbations
generalize across different clean examples.
• Linear models lack the capacity to resist adversarial perturbation; only
structures with a hidden layer (where the universal approximator theorem
applies) should be trained to resist adversarial perturbation.
Important Observations (Goodfellow et al., ICLR 2015)
26/81
27. • How can we defend against adversarial examples?
• Simply train on all the noisy examples (Loosli et al., Large Scale Kernel Machines 2007: INFINITE MNIST dataset).
• Exponential cost.
• Include an adversarial term in the objective function (Goodfellow et al., ICLR 2015):
• 𝐽̃(𝜃, 𝑥, 𝑦) = 𝛼 𝐽(𝜃, 𝑥, 𝑦) + (1 − 𝛼) 𝐽(𝜃, 𝑥_𝑎𝑑𝑣, 𝑦)
• 1.14% → 0.77% error rate on the 10,000 test examples.
• It is commonly expected that elastic distortion can resist adversarial examples.
Related Work
27/81
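The adversarial objective above can be sketched, again using a logistic-regression stand-in with FGSM perturbations (weights, data point, 𝜀, and 𝛼 are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, b, x, y):
    """Binary cross-entropy J(theta, x, y) for a logistic model."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def adversarial_objective(w, b, x, y, eps=0.25, alpha=0.5):
    """J~ = alpha * J(theta, x, y) + (1 - alpha) * J(theta, x_adv, y),
    with x_adv built by the fast gradient sign method."""
    p = sigmoid(w @ x + b)
    x_adv = x + eps * np.sign((p - y) * w)  # FGSM step
    return alpha * nll(w, b, x, y) + (1 - alpha) * nll(w, b, x_adv, y)

w, b = np.array([1.0, -2.0]), 0.0
x, y = np.array([0.3, 0.1]), 1.0
print(adversarial_objective(w, b, x, y))
```

Because the FGSM point has at least the clean point's loss on a linear model, the combined objective upper-bounds the clean loss when 𝛼 = 0.5.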
28. What is a Manifold?
• A closed manifold may need to be represented in a dimension higher than its intrinsic one.
http://www.lib.utexas.edu/maps/world_maps/world_rel_803005AI_2003.jpg
• In the real world, many observations form manifolds; that is why we learn manifolds. The pictures show a 2-D manifold and a 3-D manifold.
28/81
29. • The manifold term minimizes the difference between the activations of samples from the same class.
• This helps disentangle the factors of variation.
Manifold Regularization Term
𝒂(1): input representation
𝒂(5): manifold representation
𝒂(6): softmax layer
29/81
Manifold Regularization Term (cont.)
• For each training sample 𝒙_𝒏, a neighbour 𝒙′_𝒏 = 𝒙_𝒏 + 𝜷(𝜵_{𝒙_𝒏} 𝑳(𝜽; 𝒙_𝒏, 𝒚_𝒏)) is generated.
• The term penalizes the distance between the manifold representations 𝒂_𝒏^(5) and 𝒂′_𝒏^(5) of 𝒙_𝒏 and 𝒙′_𝒏.
32/81
33. • The proposed methodology learns both a classifier and a manifold embedding that are robust to adversarial perturbations.
• Forward and backward operations of MRnet:
• The first forward operation is the same as in a standard neural network.
• The following backward 𝑎𝑑𝑣 pass is the same as standard back-propagation, except that an adversarial perturbation is generated from the gradient.
Proposed Regularized Networks
33/81
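As a minimal sketch, assuming the manifold term is a squared Euclidean distance between the manifold-layer activations of a sample and its adversarial neighbour (the activations, classification loss, and weight `lam` below are illustrative, not the dissertation's values):

```python
import numpy as np

def manifold_loss(a_x, a_x_adv):
    """Manifold regularization term: squared distance between the manifold-layer
    activations a^(5) of a clean sample and its adversarial neighbour."""
    return np.sum((a_x - a_x_adv) ** 2)

def total_loss(ce_loss, a_x, a_x_adv, lam=0.1):
    """Combined objective: classification loss + lam * manifold term."""
    return ce_loss + lam * manifold_loss(a_x, a_x_adv)

a_clean = np.array([0.2, 0.9, 0.1])   # a^(5) of x_n (illustrative)
a_adv = np.array([0.25, 0.8, 0.15])   # a^(5) of x'_n (illustrative)
print(total_loss(1.2, a_clean, a_adv))
```

Minimizing the second term pulls the adversarial neighbour's embedding back toward the clean sample's, flattening the decision boundary around the data manifold.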
34. • Three datasets we tested:
• (a) MNIST
• (b, c) The raw data and its normalized version (LCN) of CIFAR-10
• (d, e) The raw data and its normalized version (ZCA) of SVHN
Experimental Results
(Krizhevsky et al., 2009)
(LeCun et al., 1998)
(Netzer et al., 2011)
34/81
35. • We chose 𝛽 in the range that did not violate class information.
• (a-c) Distributions of Euclidean distances between training samples on
individual datasets.
• (d-f) Different perturbation levels on individual datasets.
Generation of Adversarial Examples
35/81
36. MNIST Results
Bar: statistics of 10 runs.
Circle: single runs reported in the literature.
• Fully connected models have two hidden layers.
• Convolutional models have more than two
convolutional layers.
• All the results are without data augmentation.
• The proposed model shows the best
performance among the alternatives.
36/81
38. • Data: CIFAR-10 test set.
• (a) Pairwise distance matrix of 𝒂(L) without Φ.
• (b) 2-D visualization of the manifold embedding through t-SNE without Φ.
• (c) Query images and top-10 nearest images without Φ.
• (d-f) Pairwise distance matrix, t-SNE plot, and query images with Φ.
Embedding Results
38/81
39. • We have proposed a novel methodology, unifying deep learning and manifold
learning, called manifold regularized networks (MRnet).
• We tested MRnet and confirmed its improved generalization performance
underpinned by the proposed manifold loss term on deep architectures.
• By exploiting the characteristics of blind spots, the proposed MRnet can be
extended to the discovery of true representations on manifolds in various
learning tasks.
Summary of Topic 1
39/81
40. • Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
40/81
41. • Deep Neural Networks (DNN) show human level performance on many
recognition tasks.
• We focus on class-imbalanced prediction.
• Insufficient samples to represent the true distribution of a class.
• Q. How can we learn minor but important features using neural networks?
• We propose a new RBM training method called boosted CD.
• We also devise a regularization term for sparsity of DNA sequences.
Motivation
41/81
42. • Genetic information flows through the gene expression process.
• DNA: a sequence of four types of nucleotides (A,G,T,C).
• Gene: a segment of DNA (the basic unit of heredity).
(Splice) Junction Prediction: An Extremely Class-Imbalanced Problem
• Gene expression: DNA → RNA → protein.
• Candidate exon/intron boundaries are marked by GT (or AG) dinucleotides; among 76M candidate sites, only 160K (0.21%) are true splice sites.
42/81
43. • Two approaches:
• Machine learning-based:
• ANN (Stormo et al., 1982; Noordewier et al., 1990; Brunak et al., 1991),
• SVM (Degroeve et al., 2005; Huang et al., 2006; Sonnenburg et al., 2007),
• HMM (Reese et al., 1997; Pertea et al., 2001; Baten et al., 2006).
• Sequence alignment-based:
• TopHat (Trapnell et al., 2010), MapSplice (Wang et al., 2010),
RUM (Grant et al., 2011).
Previous Work on Junction Prediction
We want to construct a learning model which can boost prediction performance in a way complementary to alignment-based methods.
We propose a learning model based on (multilayer) RBMs and its training scheme.
43/81
44. • Training weights to minimize the negative log-likelihood of the data.
• Run the MCMC chain 𝒗(0), 𝒗(1), … , 𝒗(𝑘) for 𝑘 steps.
• The CD-𝑘 update after seeing example 𝒗:
Contrastive Divergence (CD) for Training RBMs
The model expectation is approximated by the 𝑘-step Markov chain 𝒗(0) = 𝒗 → 𝒉(0) → 𝒗(1) → 𝒉(1) → ⋯ → 𝒗(𝑘) → 𝒉(𝑘).
44/81
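A compact CD-𝑘 update for a binary RBM can be sketched as follows (sizes, learning rate, and the input vector are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_update(W, b, c, v0, k=1, lr=0.1):
    """One CD-k weight update for a binary RBM: run the Gibbs chain
    v(0) -> h(0) -> v(1) -> ... -> v(k), then use the difference of
    positive and negative statistics as an approximate gradient."""
    ph0 = sigmoid(c + v0 @ W)          # positive phase: p(h | v(0))
    v, ph = v0, ph0
    for _ in range(k):
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(b + h @ W.T)
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(c + v @ W)        # negative phase: p(h | v(k))
    W += lr * (np.outer(v0, ph0) - np.outer(v, ph))
    b += lr * (v0 - v)
    c += lr * (ph0 - ph)
    return W, b, c

W = rng.normal(scale=0.01, size=(6, 3))
b, c = np.zeros(6), np.zeros(3)
v0 = np.array([1., 1., 0., 0., 1., 0.])
W, b, c = cd_k_update(W, b, c, v0, k=1)
print(W.shape)
```

In practice the update is averaged over a mini-batch rather than applied per example.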
45. • Boosting is a meta-algorithm which converts weak learners to strong ones.
• Most boosting algorithms consist of iteratively learning weak classifiers with
respect to a distribution and adding them to a final strong classifier.
• The main variation among boosting algorithms is the method of weighting training data points and hypotheses:
• AdaBoost, LPBoost, TotalBoost, …
What Boosting Is
from lecture notes @ UCIrvine CS 271 Fall 2007
45/81
46. • Contrastive divergence training loops over all mini-batches and is known to be stable.
• However, for a class-imbalanced distribution, we need to assign higher weights to rare samples so that the Gibbs chains can jump to unseen examples.
Boosted Contrastive Divergence (1/2)
Idea: assign lower weights to ordinary samples and higher weights to rare samples from hardly observed regions.
46/81
47. • If we assign the same weight to all the data, the performance of Gibbs sampling would degrade in the regions that are hardly observed.
• Whenever sampling, we therefore re-weight each observation by the energy of its reconstruction 𝐸(𝒗_𝒏^(𝑘), 𝒉_𝒏^(𝑘)).
Boosted Contrastive Divergence (2/2)
Figure: relative locations of samples and corresponding Markov chains under CD, PT, and the proposed method; only the proposed method covers the hardly observed regions.
47/81
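The re-weighting step can be sketched as below; the softmax-over-energies rule is a hypothetical simplification of my own, chosen only to illustrate that poorly reconstructed (high-energy) samples receive larger weights:

```python
import numpy as np

def boosted_weights(energies):
    """Hypothetical reweighting step: samples whose reconstructions have high
    energy (poorly modelled, e.g. rare-class samples) get larger weights.
    A softmax over reconstruction energies is one simple choice."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(e - e.max())  # numerically stable exponential weighting
    return w / w.sum()

# Three ordinary samples (low energy) and one rare sample (high energy).
recon_energy = [-4.0, -3.8, -4.1, 1.5]
w = boosted_weights(recon_energy)
print(w)  # the rare, poorly reconstructed sample dominates the weights
```

The dissertation's exact weighting rule may differ; the point is that the Gibbs chains are steered toward hardly observed regions.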
48. Relationship between Boosting and Importance Sampling
Importance sampling (target distribution 𝑓, proposal distribution 𝑔):
(a) Samples cannot be drawn conveniently from 𝑓.
(b) The importance sampler draws samples from 𝑔.
(c) A sample of 𝑓 is obtained by weighting with 𝑓/𝑔.
Correspondingly, in boosted CD:
1. Samples are drawn from 𝑔.
2. A sample of 𝑓 is obtained by multiplying by α.
48/81
49. • Balance equations:
• a set of equations that can always be solved to give the equilibrium
distribution of a Markov chain (when such a distribution exists).
• For a restricted Boltzmann machine (Im et al., ICLR 2015):
• For a restricted Boltzmann machine with boosted CD:
• On the convergence properties of contrastive divergence (Sutskever et al., AISTATS 2010):
• “TheCD update is not the gradient of any objective function.”; “The CD update
is shown to have at least one fixed point when used with L2 regularization.”
Balance Equations for Restricted Boltzmann Machine
global balance
(or full balance)
local balance
(or detailed balance)
Boosted contrastive divergence inherits the properties of contrastive divergence.
49/81
50. • For biological sequences, 1-hot encoding is widely used (Baldi & Brunak, 2001).
• A, C, G, and T are encoded by 1000, 0100, 0010, and 0001, respectively.
• In the encoded binary vectors, 75% of the elements are zero.
• To resolve the sparsity of 1-hot encoding vectors, we devise a new regularization technique that incorporates prior knowledge of the sparsity.
Categorical Gradient
50/81
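The 1-hot encoding and its 75% sparsity can be sketched directly:

```python
import numpy as np

ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}

def encode(seq):
    """1-hot encode a DNA sequence; exactly one of every four entries is 1,
    so 75% of the resulting elements are zero."""
    return np.array([ONE_HOT[nt] for nt in seq]).ravel()

x = encode("ACGT")
print(x, 1.0 - x.mean())  # encoded vector and its fraction of zeros (0.75)
```

A 200nt sequence thus becomes an 800-dimensional binary vector, matching the dimensions used in the experiments.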
52. • For simulating a class-imbalance situation, we randomly dropped samples with different drop rates for different classes.
Results: Effects of Boosting
Comparison of RBM training methods (training cost / noise handling / class-imbalance handling):
• CD (Hinton, Neural Comp. 2002): standard and widely used.
• Persistent CD (Tieleman, ICML 2008): uses a single persistent Markov chain.
• Parallel tempering (Cho et al., IJCNN 2010): generates simultaneous Markov chains.
• Proposed boosted CD: reweights samples for class-imbalance handling.
52/81
53. • Data preparation:
• Real human DNA sequences with known boundary information.
• GWH dataset: 2-class (boundary or not).
• UCSC dataset: 3-class (acceptor, donor, or non-boundary).
Experimental Setup for Junction Prediction
The experiments examine the effects of the categorical gradient, the effects of boosting, and the effects on splicing prediction.
Figure: an example DNA sequence annotated with true/false acceptor and donor sites, including a non-canonical true donor.
53/81
54. • The proposed method shows the best performance in terms of reconstruction error for both training and testing.
• Compared to the softmax approach, the proposed regularized RBM achieves lower error by slightly sacrificing the probability-sum constraint.
Results: Effects of Categorical Gradient
Data: chromosome 19 in GWH-donor. Sequence length: 200nt (800 dimensions). Iterations: 500. Learning rate: 0.1. L2-decay: 0.001.
54/81
55. Results: Improved Performance and Robustness
Panels: 2-class classification performance; 3-class classification; runtime; insensitivity to sequence lengths; robustness to negative samples.
55/81
• (Important biological finding) Non-canonical splicing can arise if:
• Introns contain GCA or NAA sequences at their exon/intron boundaries.
• Exons include contiguous A's around the boundaries.
Results: Identification of Non-Canonical Splice Sites
We used 162,951 examples, excluding canonical splice sites.
56/81
Summary of Topic 2
Significant boosts in splicing
prediction performance
Robustness to high-dimensional
class-imbalanced data
New RBM training methods
called boosted CD
New penalty term to handle
sparsity of DNA sequences
The ability to detect subtle non-canonical splicing signals
57/81
58. • Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
58/81
59. • Here we consider the fused Lasso regression (FLR), an important special case of ℓ1-penalized regression for structured sparsity: min_𝛽 ½‖𝑦 − 𝑋𝛽‖² + 𝜆₁‖𝛽‖₁ + 𝜆₂‖𝐷𝛽‖₁.
• The matrix 𝐷 is the difference matrix on the undirected and unweighted graph of adjacent variables.
• Adjacency of the variables is determined by the application.
• For graphs with a 2-D grid structure, the second penalty sums |𝛽ᵢ − 𝛽ⱼ| over horizontally and vertically adjacent pairs.
• The second penalty function is non-smooth and non-separable.
Fused Lasso Regression
59/81
63. • We want to solve the 2-dimensional fused Lasso regression on multi-GPU.
Overview of Proposed Method
fused Lasso
→ + split Bregman algorithm (approximating due to the ℓ1-norm)
→ + PCGLS (accelerating the linear system solve)
→ + FFT (replacing the linear system solver)
63/81
64. • Split Bregman algorithm for the ℓ1-norm:
• Because of the ℓ1-norm, the objective function is non-differentiable.
• The algorithm introduces an auxiliary variable for each ℓ1 term and alternates between a differentiable least-squares subproblem and an elementwise shrinkage step.
Split Bregman Algorithm for Fused Lasso
64/81
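The elementwise shrinkage step that handles the ℓ1 terms in split Bregman iterations is the standard soft-thresholding operator; the input vector and threshold below are illustrative:

```python
import numpy as np

def shrink(z, gamma):
    """Soft-thresholding (shrinkage) operator used in split Bregman iterations:
    shrink(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

z = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])
print(shrink(z, 0.5))  # entries within the threshold are zeroed; others shrink
```

Entries with |z| ≤ γ are set exactly to zero, which is what produces the sparsity in the fused Lasso solution.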
65. • The conjugate gradient (CG) method iteratively solves linear systems of the form 𝐴𝑥 = 𝑏 when 𝐴 is symmetric and positive definite.
PCGLS Algorithm
• For least-squares problems, it is well known that (9) is equivalent to solving the normal equations 𝐴ᵀ𝐴𝑥 = 𝐴ᵀ𝑏.
• The CG algorithm for least squares is often referred to as CGLS, and its preconditioned counterpart as PCGLS (in this case the scaling amounts to 𝐴ᵀ𝐴 → 𝑀⁻ᵀ𝐴ᵀ𝐴𝑀⁻¹).
65/81
66. • In mathematics, Poisson's equation is a partial differential equation of elliptic type with broad utility in electrostatics, mechanical engineering, and theoretical physics.
• Poisson's equation is frequently written as 𝛻²𝑣 = 𝑓.
Poisson's Equation
http://en.wikipedia.org/wiki/Poisson's_equation
http://people.rit.edu/~pnveme/ExplictSolutions2/2Dim/Linear/PoissonDisk/PoissonDisk.html
66/81
67. • In two-dimensional Cartesian coordinates, it takes the form 𝜕²𝑣/𝜕𝑥² + 𝜕²𝑣/𝜕𝑦² = 𝑓(𝑥, 𝑦).
Poisson's Equation in 2-Dimensions
• Discretization yields a block tri-diagonal system.
67/81
68. • Mathematical background
• Apply 2D forward FFT to 𝑓 to obtain 𝑓(𝑘), where 𝑘 is the wave number
• Apply the inverse of the Laplace operator to 𝑓(𝑘) to obtain 𝑣(𝑘): simple
element-wise division in Fourier space
• Apply 2D inverse FFT to 𝑣(𝑘) to obtain 𝑣
Poisson’s Equation using the FFT
𝛻²𝑣 = 𝑓 ↔ −(𝑘ₓ² + 𝑘ᵧ²)𝑣 = 𝑓, hence 𝑣 = −𝑓 / (𝑘ₓ² + 𝑘ᵧ²).
http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/3-CUDA_libraries_+_Matlab.pdf
68/81
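The three FFT steps above can be sketched in numpy, assuming periodic boundary conditions on a square domain (grid size and the manufactured test function are my own):

```python
import numpy as np

def solve_poisson_fft(f, Lx=2*np.pi, Ly=2*np.pi):
    """Solve the periodic 2-D Poisson equation lap(v) = f via the FFT:
    forward FFT of f, elementwise division by -(kx^2 + ky^2), inverse FFT."""
    n, m = f.shape
    kx = np.fft.fftfreq(n, d=Lx/n) * 2*np.pi   # wave numbers along x
    ky = np.fft.fftfreq(m, d=Ly/m) * 2*np.pi   # wave numbers along y
    KX, KY = np.meshgrid(kx, ky, indexing="ij")
    denom = -(KX**2 + KY**2)
    denom[0, 0] = 1.0                  # avoid dividing the zero (mean) mode by 0
    v_hat = np.fft.fft2(f) / denom
    v_hat[0, 0] = 0.0                  # fix the undetermined additive constant
    return np.real(np.fft.ifft2(v_hat))

# Check against a known solution: v = sin(x)cos(y), so lap(v) = -2 v.
n = 64
x = np.linspace(0, 2*np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
v_true = np.sin(X) * np.cos(Y)
v = solve_poisson_fft(-2 * v_true)
print(np.max(np.abs(v - v_true)))  # near machine precision
```

For band-limited right-hand sides the FFT solve is spectrally exact, which is why replacing the iterative linear solver with the FFT pays off at large grid sizes.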
69. • Pseudo-code for the two iterative methods (with the FFT step).
Split Bregman Algorithm for Fused Lasso (1/2)
69/81
70. • Multi-GPU operations for matrix-vector computations.
Split Bregman Algorithm for Fused Lasso (2/2)
70/81
71. • The computation times are measured in CPU time with
• CPU: Intel Xeon E5-4620 (2.2GHz) and 16GB RAM
• GPU: NVIDIA GTX Titan (2688 cores, 6GB GDDR5)
• We set the regularization parameters (𝜆₁, 𝜆₂) = (1, 1); the stopping criterion is
• We generate 𝑛 samples from a 𝑝-dimensional 𝑁(0, 𝐼ₚ), and the response variable 𝑦 is generated by 𝑦 = 𝑋𝛽 + 𝜖, 𝜖 ~ 𝑁(0, 𝐼ₙ), where 𝛽 = .
Experiments
71/81
72. • We first considered scenarios with synthetic regression problems where the coefficients were defined on a square grid.
• For the very large cases, the average speed-up was 409.19× to 433.23×.
Runtime Comparison for Piecewise-Constant Block Cases
72/81
73. • For the other cases (n = 12000–24000), the average speed-up was 26.67×–47.47×.
• The circular Gaussian cases are formulated by:
Runtime Comparison for Circular Gaussian Cases
73/81
74. • Image-based regression of the behavioral fMRI data.
• Regression coefficients were overlaid and color-coded on the brain map as
described in the text.
Structured Sparsity Regression Example
74/81
76. • By applying the proposed method extensively to various large-scale datasets, we have successfully demonstrated the following:
• Feasibility of highly parallelizable computational algorithms for high-dimensional structured sparse regression problems,
• A use case of direct-communicating multiple GPUs for speed-up and scalability,
• The promise of FFT-based preconditioners for solving a family of linear systems in parallel.
• The fact that the highest speed-up (433×) occurred on the highest-dimensional problems clearly indicates where the merit of the multi-GPU scheme lies.
• Future work: connecting the dots to deep neural networks
• Fused autoencoder, multi-layer fused Lasso, …
Summary of Topic 3
76/81
77. • Achievements
• Preliminary
• Deep neural networks
• Dissertation overview
• Adversarial example handling
• Manifold regularized deep neural networks using adversarial examples
• Class-imbalance handling
• Boosted contrastive divergence
• Spatial dependency handling
• Structured sparsity via parallel fused Lasso
• Conclusion
• Limitations and future work
Outline
77/81
78. 1. The MRnet can be applied in a complementary way, together with traditional techniques such as L2 decay, to generalize neural networks.
2. We propose a novel method for training RBMs for class-imbalanced prediction. Our proposal includes a deep belief network-based methodology for computational splice junction prediction.
3. The parallel fused Lasso can be applied to data with structured sparsity, such as images, to exploit more prior knowledge than convolutional or recurrent operations.
Conclusion
This dissertation proposed a set of robust feature learning schemes that can learn meaningful representations underlying large-scale genomic and image datasets using deep networks.
78/81
79. • Several directions of future work are possible for the proposed methodologies.
• First, we can extend MRnet to extract scaling- and translation-invariant features by replacing the synthetic neighbours with nearest training samples.
• Second, it would also be interesting to modify the objective function of MRnet in order to generalize the whole MRnet procedure.
• Lastly, the three proposed schemes (manifold loss, boosting, and ℓ1 fusion penalty) can be applied within the framework of recurrent neural networks.
Limitations and Future Work
We need to make the proposed schemes
more universal and general.
79/81