This document provides an overview of deep learning algorithms, including deep neural networks, convolutional neural networks, deep belief networks, and restricted Boltzmann machines. It discusses key concepts such as learning in deep neural networks, the evolution timeline of deep learning approaches, deep architectures, and restricted Boltzmann machines. It also covers training restricted Boltzmann machines using contrastive divergence, constructing deep belief networks by stacking restricted Boltzmann machines, and practical considerations for pre-training and fine-tuning deep belief networks.
“Automatically learning multiple levels of representations of the underlying distribution of the data to be modelled”
Deep learning algorithms have shown superior learning and classification performance in areas such as transfer learning, speech and handwritten character recognition, and face recognition, among others.
(I have referred to many articles and experimental results provided by Stanford University.)
1. deep learning
Algorithms and Applications
Bernardete Ribeiro, bribeiro@dei.uc.pt
University of Coimbra, Portugal
INIT/AERFAI Summer School on Machine Learning, Benicassim 22-26 June 2015
6. learning in deep neural networks
1. No general learning algorithm (no-free-lunch theorem, Wolpert 1996)
2. Learning algorithms for specific tasks: perception, control, prediction, planning, reasoning, language understanding
3. Limitations of backpropagation (BP): local minima, optimization challenges for non-convex objective functions
4. Hinton’s deep belief networks (DBNs) as stacks of RBMs
5. LeCun’s energy-based learning for DBNs
9. from brain-like computing to deep learning
∙ New empirical and theoretical results have brought deep architectures into the focus of Machine Learning (ML) researchers [Larochelle et al., 2007].
∙ Theoretical results suggest that deep architectures are fundamental to learning the kind of complicated, brain-like functions that can represent high-level abstractions (e.g. vision, speech, language) [Bengio, 2009].
11. deep neural networks
∙ Convolutional Neural Networks (CNNs) [LeCun et al., 1989]
∙ Deep Belief Networks (DBNs) [Hinton et al., 2006]
∙ AutoEncoders (AEs) [Bengio et al., NIPS 2006]
∙ Sparse AutoEncoders [Ranzato et al., NIPS 2006]
12. convolutional neural networks (cnns)
∙ A Convolutional Neural Network consists of two basic operations:
∙ convolution
∙ pooling
∙ Convolutional and pooling layers are arranged alternately until high-level features are obtained
∙ Several feature maps in each convolutional layer
∙ Weights within the same map are shared
[Figure: input → C1 → S2 → C3 → S4 → NN, alternating convolutional (C) and subsampling (S) layers]
1 I Arel, D Rose & T Karnowski, Deep Machine Learning — A New Frontier in Artificial Intelligence Research, IEEE CIM, 2010
13. convolutional neural networks (cnns)
∙ Convolution: suppose the size of the layer is d × d and the size of the receptive fields is r × r, and let γ and x denote respectively the values of the convolutional layer and of the previous layer:
\gamma_{ij} = g\Big(\sum_{m=1}^{r} \sum_{n=1}^{r} x_{i+m-1,\,j+n-1}\, w_{m,n} + b\Big), \quad i, j = 1, \dots, (d - r + 1)
where g is a nonlinear function.
∙ Pooling follows convolution to reduce the dimensionality of the features and to introduce translational invariance into the CNN.
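The two operations above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the slides; the 8 × 8 input, 3 × 3 receptive field, g = tanh, and non-overlapping max pooling are illustrative assumptions:

```python
import numpy as np

def conv_layer(x, w, b, g=np.tanh):
    """Valid convolution: a shared r x r kernel w slides over the d x d input,
    producing a (d - r + 1) x (d - r + 1) feature map (Eq. on slide 13)."""
    d, r = x.shape[0], w.shape[0]
    out = np.empty((d - r + 1, d - r + 1))
    for i in range(d - r + 1):
        for j in range(d - r + 1):
            out[i, j] = g(np.sum(x[i:i + r, j:j + r] * w) + b)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling for dimensionality reduction."""
    d = x.shape[0] // s
    return x[:d * s, :d * s].reshape(d, s, d, s).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.random((8, 8))             # input "image", d = 8
w = rng.normal(0, 0.1, (3, 3))     # shared 3 x 3 receptive field, r = 3
fmap = conv_layer(x, w, b=0.0)     # 6 x 6 feature map
pooled = max_pool(fmap)            # 3 x 3 map after pooling
```

In a real CNN each convolutional layer holds several such feature maps, each with its own shared kernel, and the conv/pool pair is stacked several times.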
14. deep belief networks (dbns)
∙ Probabilistic generative models, contrasting with the discriminative nature of other NNs
∙ Generative models provide a joint probability distribution of data and labels
∙ Unsupervised greedy layer-wise pre-training followed by final fine-tuning
[Figure: a 28 × 28 pixel image feeding a stack of RBM layers (visible/hidden pairs), with top-level units connected to label units and a detection layer]
2 Based on I Arel, D Rose & T Karnowski, Deep Machine Learning — A New Frontier in Artificial Intelligence Research, IEEE CIM, 2010
15. autoencoders (aes)
∙ The auto-encoder has two components:
∙ the encoder f (mapping x to h), and
∙ the decoder g (mapping h to r)
∙ An auto-encoder is a neural network that tries to reconstruct its input at its output
[Figure: input x → encoder f → code h → decoder g → reconstruction r]
3 Based on Y Bengio, I Goodfellow and A Courville, Deep Learning, an MIT Press book (in preparation), www.iro.umontreal.ca/~bengioy/dbook
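A minimal sketch of the encoder/decoder pair with sigmoid units. The dimensions and the random, untrained weights are illustrative assumptions; training would adjust them to minimize the reconstruction error:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_code = 6, 3                      # input and code dimensions (illustrative)
W1, b1 = rng.normal(0, 0.1, (n_code, n_in)), np.zeros(n_code)
W2, b2 = rng.normal(0, 0.1, (n_in, n_code)), np.zeros(n_in)

f = lambda x: sigmoid(W1 @ x + b1)       # encoder f: x -> h
g = lambda h: sigmoid(W2 @ h + b2)       # decoder g: h -> r

x = rng.random(n_in)                     # input x
h = f(x)                                 # code h
r = g(h)                                 # reconstruction r
loss = np.mean((x - r) ** 2)             # reconstruction error to minimize
```

With n_code < n_in the network is forced to learn a compressed representation of its input.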
17. deep architectures versus shallow architectures
∙ Deep architectures can be exponentially more efficient than shallow architectures [Roux and Bengio, 2010].
∙ Functions that can be compactly represented with a Neural Network (NN) of depth d may require an exponential number of computational elements in a network of depth d − 1 [Bengio, 2009].
∙ Since the number of computational elements depends on the number of training samples available, using shallow architectures may result in poorly generalizing models [Bengio, 2009].
∙ As a result, deep architecture models tend to outperform shallow models such as SVMs [Larochelle et al., 2007].
20. restricted boltzmann machines (rbms)
[Figure: bipartite RBM — visible units v1 … vI plus bias, hidden units h1 … hJ plus bias; the upward pass acts as encoder, the downward pass as decoder]
21. restricted boltzmann machines (rbms)
∙ Unsupervised: finds complex regularities in the training data
∙ Bipartite graph: a visible and a hidden layer
∙ Binary stochastic units: on/off with some probability
∙ One iteration: update the hidden units, then reconstruct the visible units
∙ Maximum likelihood of the training data
22. restricted boltzmann machines (rbms)
∙ Training goal: the most probable reproduction of the input
∙ unsupervised data
∙ find the latent factors of the data set
∙ Adjust the weights to maximize the probability of the input data
23. restricted boltzmann machines (rbms)
Given an observed state, the energy of the joint configuration of the visible and hidden units (v, h) is given by:
E(v, h) = -\sum_{i=1}^{I} c_i v_i - \sum_{j=1}^{J} b_j h_j - \sum_{j=1}^{J} \sum_{i=1}^{I} W_{ji} v_i h_j , \quad (1)
where W is the matrix of weights, and b and c are the biases of the hidden and visible layers, respectively.
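Equation (1) translates directly into code. A minimal NumPy sketch of the energy of one joint configuration (the layer sizes and random parameters are illustrative assumptions):

```python
import numpy as np

def energy(v, h, W, b, c):
    """E(v, h) = -c.v - b.h - h.(W v)  (Eq. 1); W is J x I,
    b the hidden biases, c the visible biases."""
    return -c @ v - b @ h - h @ (W @ v)

rng = np.random.default_rng(1)
I, J = 4, 3                      # visible and hidden layer sizes (illustrative)
W = rng.normal(0, 0.1, (J, I))
b, c = np.zeros(J), np.zeros(I)
v = rng.integers(0, 2, I)        # a binary visible configuration
h = rng.integers(0, 2, J)        # a binary hidden configuration
E = energy(v, h, W, b, c)
```

Low-energy configurations are the ones the model will assign high probability to, as the next slides make explicit.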
24. restricted boltzmann machines (rbms)
The Restricted Boltzmann Machine (RBM) assigns a probability to each configuration (v, h) using:
p(v, h) = \frac{e^{-E(v,h)}}{Z} , \quad (2)
where Z is a normalization constant called the partition function, obtained by summing over the energies of all possible (v, h) configurations [Bengio, 2009, Hinton, 2010, Carreira-Perpiñán and Hinton, 2005]:
Z = \sum_{v,h} e^{-E(v,h)} . \quad (3)
25. restricted boltzmann machines (rbms)
Since there are no connections between any two units within the same layer, given a particular random input configuration v, all the hidden units are independent of each other and the probability of h given v becomes:
p(h \mid v) = \prod_{j} p(h_j = 1 \mid v) , \quad (4)
where
p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{I} v_i W_{ji}\Big) . \quad (5)
26. restricted boltzmann machines (rbms)
Similarly, given a specific hidden state h, the probability of v given h is obtained by (6):
p(v \mid h) = \prod_{i} p(v_i = 1 \mid h) , \quad (6)
where:
p(v_i = 1 \mid h) = \sigma\Big(c_i + \sum_{j=1}^{J} h_j W_{ji}\Big) . \quad (7)
27. restricted boltzmann machines (rbms)
Given a random training vector v, the state of a given hidden
unit j is set to 1 with probability:

p(h_j = 1 | v) = \sigma\Big(b_j + \sum_{i} v_i W_{ji}\Big)

Similarly:

p(v_i = 1 | h) = \sigma\Big(c_i + \sum_{j} h_j W_{ji}\Big)

where \sigma(x) is the sigmoid squashing function 1/(1 + e^{-x}).
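Because the conditionals factorize, a whole layer can be sampled in one vectorized step. A minimal NumPy sketch (names are my own):

```python
import numpy as np

def sigmoid(x):
    """Squashing function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, b, rng):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ji); draw binary states."""
    p = sigmoid(b + W @ v)
    return (rng.random(p.shape) < p).astype(float), p

def sample_v_given_h(h, W, c, rng):
    """p(v_i = 1 | h) = sigmoid(c_i + sum_j h_j W_ji)."""
    p = sigmoid(c + W.T @ h)
    return (rng.random(p.shape) < p).astype(float), p

rng = np.random.default_rng(0)
W = np.zeros((3, 2))           # with zero weights and biases ...
h, p = sample_h_given_v(np.array([1.0, 0.0]), W, np.zeros(3), rng)
print(p)                       # ... every unit is on with probability 0.5
```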
28. restricted boltzmann machines (rbms)
The marginal probability assigned to a visible vector v is
given by (8):

p(v) = \sum_{h} p(v, h) = \frac{1}{Z} \sum_{h} e^{-E(v,h)} .  (8)

Hence, given a specific training vector v, its probability can be
raised by adjusting the weights and the biases in order to
lower the energy of that particular vector while raising the
energy of all the others.
29. restricted boltzmann machines (rbms)
To this end, we can perform a stochastic gradient ascent
procedure on the log-likelihood of the training data
vectors, using (9):

\frac{\partial \log p(v)}{\partial \theta}
  = \underbrace{-\sum_{h} p(h \mid v)\, \frac{\partial E(v,h)}{\partial \theta}}_{\text{positive phase}}
  \;+\; \underbrace{\sum_{v,h} p(v,h)\, \frac{\partial E(v,h)}{\partial \theta}}_{\text{negative phase}}  (9)
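For the particular choice θ = W_ji, the energy (1) gives ∂E(v,h)/∂W_ji = −v_i h_j, so the two phases of (9) reduce to a difference of pairwise correlations:

```latex
\frac{\partial \log p(v)}{\partial W_{ji}}
  = \sum_{h} p(h \mid v)\, v_i h_j \;-\; \sum_{v,h} p(v,h)\, v_i h_j
  = \langle v_i h_j \rangle_{0} - \langle v_i h_j \rangle_{\infty}
```

which is exactly the learning rule (10) used below: the data-driven correlation minus the model-driven correlation.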
31. training an rbm
The learning rule for performing stochastic steepest ascent in
the log probability of the training data:

\frac{\partial \log p(v)}{\partial W_{ji}} = \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_\infty  (10)

where \langle \cdot \rangle_0 denotes expectations under the data distribution
(p_0 = p(h | v)) and \langle \cdot \rangle_\infty denotes expectations under the
model distribution p_\infty(v, h) = p(v, h) [Roux and Bengio, 2008].
32. mcmc using alternating gibbs sampling
[Figure: the alternating Gibbs sampling chain. Start the chain at a data
vector, v(0) = x; sample the hidden layer from
p(h_j = 1 | v) = \sigma(b_j + \sum_{i=1}^{I} v_i W_{ji}), then reconstruct
the visible layer from p(v_i = 1 | h) = \sigma(c_i + \sum_{j=1}^{J} h_j W_{ji}),
and keep alternating to obtain (v(1), h(1)), (v(2), h(2)), ...,
(v(\infty), h(\infty)). The statistic \langle v_i h_j \rangle_0 is measured
at the start of the chain, and \langle v_i h_j \rangle_\infty at equilibrium.]
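The alternating chain above can be sketched in a few lines by reusing the layer-wise conditionals; a sketch under my own naming, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_chain(x, W, b, c, steps, rng):
    """Alternating Gibbs sampling v(0) -> h(0) -> v(1) -> h(1) -> ...,
    started at a data vector x; returns the final (v, h) sample."""
    v = x.astype(float)
    for _ in range(steps):
        ph = sigmoid(b + W @ v)                       # p(h_j = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(c + W.T @ h)                     # p(v_i = 1 | h)
        v = (rng.random(pv.shape) < pv).astype(float)
    ph = sigmoid(b + W @ v)
    h = (rng.random(ph.shape) < ph).astype(float)
    return v, h

rng = np.random.default_rng(1)
W = rng.normal(0.0, 0.1, size=(4, 6))   # J = 4 hidden, I = 6 visible
v, h = gibbs_chain(np.ones(6), W, np.zeros(4), np.zeros(6), steps=3, rng=rng)
```

In exact maximum likelihood the chain would have to run to equilibrium (steps → ∞) to estimate ⟨v_i h_j⟩_∞, which motivates the truncated scheme on the next slides.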
38. contrastive divergence (cd–k)
∙ Running the Gibbs chain to equilibrium to estimate \langle v_i h_j \rangle_\infty is impractical. To solve this problem, Hinton proposed the Contrastive Divergence algorithm.
∙ CD–k replaces \langle \cdot \rangle_\infty by \langle \cdot \rangle_k for small values of k:

\Delta W_{ji} = \eta (\langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_k)  (11)
39. contrastive divergence (cd–k)
∙ v(0) ← x
∙ Compute the binary (feature) states of the hidden units, h(0), using v(0)
∙ for n ← 1 to k
  ∙ Compute the “reconstruction” states of the visible units, v(n), using h(n−1)
  ∙ Compute the “reconstruction” states of the hidden units, h(n), using v(n)
∙ end for
∙ Update the weights and biases according to:

\Delta W_{ji} = \eta (\langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_k)  (12)
\Delta b_j = \eta (\langle h_j \rangle_0 - \langle h_j \rangle_k)  (13)
\Delta c_i = \eta (\langle v_i \rangle_0 - \langle v_i \rangle_k)  (14)
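The CD–k loop above translates almost line for line into NumPy. A minimal sketch for a single training vector (η, k, and the shapes are illustrative); following the common practice recommended in [Hinton, 2010], the statistics use the probabilities for the hidden units rather than their sampled binary states:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(x, W, b, c, k=1, eta=0.1, rng=None):
    """One CD-k update of an RBM, Eqs. (12)-(14). W has shape (J, I)."""
    if rng is None:
        rng = np.random.default_rng()
    v0 = x.astype(float)
    ph0 = sigmoid(b + W @ v0)                 # p(h_j = 1 | v(0))
    h = (rng.random(ph0.shape) < ph0).astype(float)
    for _ in range(k):
        pv = sigmoid(c + W.T @ h)             # "reconstruction" v(n)
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(b + W @ v)               # "reconstruction" h(n)
        h = (rng.random(ph.shape) < ph).astype(float)
    # <v_i h_j>_0 from the data, <v_i h_j>_k from the reconstruction
    dW = eta * (np.outer(ph0, v0) - np.outer(ph, v))
    db = eta * (ph0 - ph)
    dc = eta * (v0 - v)
    return W + dW, b + db, c + dc

W, b, c = np.zeros((3, 4)), np.zeros(3), np.zeros(4)
W, b, c = cd_k_update(np.ones(4), W, b, c, k=1,
                      rng=np.random.default_rng(0))
```

In practice the updates are averaged over a mini-batch of training vectors rather than applied per vector.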
42. deep belief networks (dbns)
∙ Start with a training vector on the visible units
∙ Update all the hidden units in parallel
∙ Update all the visible units in parallel to get a “reconstruction”
∙ Update the hidden units again
[Figure: greedy layer-wise construction of a DBN. First an RBM is trained
between x and h1 (p(h1|x), p(x|h1)); its hidden activations then serve as
data for an RBM between h1 and h2, and those in turn for an RBM between
h2 and h3.]
43. pre-training and fine tuning
[Figure: DBN pre-training and fine-tuning. Four RBMs are trained greedily,
one on top of the other: data → 500 hidden units → 300 hidden units →
100 hidden units → 10 hidden units. The resulting stack (the DBN model)
is then fine-tuned with backpropagation (BP), updating the weights until
error < 0.001.]
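The pre-training pipeline in the figure (data → 500 → 300 → 100 → 10) can be sketched as a loop over layer sizes, where each RBM is trained on the hidden activations of the previous one. Here train_rbm is a stand-in for the CD–k procedure of a single RBM; all names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, rng):
    """Stand-in for CD-k training of one RBM; here it only
    initializes small random weights so the pipeline runs."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_hidden, n_visible))
    b = np.zeros(n_hidden)
    return W, b

def pretrain_dbn(data, layer_sizes, rng):
    """Greedy layer-wise pre-training: train an RBM on the data,
    propagate p(h | v) upward, and feed it to the next RBM."""
    weights, x = [], data
    for n_hidden in layer_sizes:      # e.g. [500, 300, 100, 10]
        W, b = train_rbm(x, n_hidden, rng)
        weights.append((W, b))
        x = sigmoid(x @ W.T + b)      # hidden activations become input
    return weights

rng = np.random.default_rng(0)
stack = pretrain_dbn(rng.random((32, 784)), [500, 300, 100, 10], rng)
```

After this loop, the stacked weights initialize a feed-forward network that is fine-tuned end to end with backpropagation.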
55. deep models characteristics
∙ Biological plausibility
∙ DBNs are effective in a wide range of ML problems.
∙ Creating a Deep Belief Network (DBN) model is a time-consuming and computationally expensive task: it involves training several Restricted Boltzmann Machines (RBMs) and demands considerable effort.
∙ Incorporating an adaptive step-size procedure for tuning the learning rate into the learning model has produced excellent results.
∙ Graphics Processing Units (GPUs) can significantly reduce the convergence time of the data-intensive tasks in DBNs.
56. references
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127.
Carreira-Perpiñán, M. A. and Hinton, G. E. (2005). On contrastive divergence learning. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), pages 33–40.
Hinton, G. E. (2010). A practical guide to training restricted Boltzmann machines. Technical report, Department of Computer Science, University of Toronto.
Larochelle, H., Erhan, D., Courville, A., Bergstra, J., and Bengio, Y. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pages 473–480. ACM.
Roux, N. L. and Bengio, Y. (2008). Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6):1631–1649.
Roux, N. L. and Bengio, Y. (2010). Deep belief networks are compact universal approximators. Neural Computation, 22(8):2192–2207.
59. deep learning
Algorithms and Applications
Bernardete Ribeiro, bribeiro@dei.uc.pt
June 24, 2015
University of Coimbra, Portugal
INIT/AERFAI Summer School on Machine Learning, Benicassim 22-26 June 2015