Feature selection using Deep Neural Networks
March 18, 2016
CSI 991
Kevin Ham
2 Neural Networks Basics
• Perceptron: equivalent performance to the least-mean-squares algorithm (linear regression)
• Activation function: sigmoid, hyperbolic tangent
• Multi-layer perceptrons: chains of perceptrons that perform feature extraction
• Training the network: training set, validation set, generalization set; back-propagation
3 Perceptron and Activation Function
• The basic building block of neural networks (1)
• Summation of weighted inputs; the bias shifts the y-intercept
• An output is produced when the activation threshold is overcome
• The activation function must be differentiable
[Diagram: Input 1, Input 2, Input 3 with weights w1, w2, w3, plus a bias, feed an activation function that produces the output.]
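As a concrete illustration of this slide, here is a minimal sketch of a perceptron forward pass in Python; the sigmoid choice and all names are illustrative, not taken from the talk:

    import numpy as np

    def sigmoid(z):
        # Differentiable activation: squashes the weighted sum into (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    def perceptron(x, w, b):
        # Weighted sum of inputs plus a bias that shifts the y-intercept.
        return sigmoid(np.dot(w, x) + b)

    # Three inputs with weights w1, w2, w3, as in the diagram.
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.8, 0.2, -0.4])
    print(perceptron(x, w, b=0.1))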
4 Multi-Layer Perceptrons and Training
[Figures: classification with a 20-node MLP neural network (4); feature extraction with a 5-layer convolutional neural network (2); feature extraction with an MLP neural network (4).]
5 Article Objectives
• "… we propose a supervised approach for task-aware selection of features using Deep Neural Networks (DNN) in the context of action recognition (e.g. walking, running, jumping)." (1)
• "… selected features are found to give better classification performance than the original high-dimensional features." (1)
• "It is also shown that the classification performance of the proposed feature selection technique is superior to the low-dimensional representation obtained by principal component analysis (PCA)." (1)
6 Methodology
• "… analyze the contribution of each of the input dimensions to identify the features (inputs) important for classification" (1)
• "… to correctly analyze the contribution of an input feature, we study its activation potential (averaged over all training values of the input and hidden neurons) relative to the total activation potential" (1)
• "The higher the activation potential contribution of an input dimension, the more likely is its participation in hidden neuronal activity and consequently, classification." (1)
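A rough sketch of that idea in Python. This is only one plausible reading of the description above; the paper's exact definition of activation potential may differ, and the data and weight names here are hypothetical:

    import numpy as np

    def input_contributions(X, W):
        # X: (n_samples, n_features) training inputs.
        # W: (n_features, n_hidden) first-layer weights.
        # Average each input's absolute weighted contribution to the hidden
        # pre-activations over all training samples and hidden neurons,
        # then normalize relative to the total activation potential.
        potential = np.abs(X[:, :, None] * W[None, :, :]).mean(axis=(0, 2))
        return potential / potential.sum()

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 8))        # hypothetical training data
    W_first_layer = rng.normal(size=(8, 16))   # hypothetical trained weights
    scores = input_contributions(X_train, W_first_layer)
    selected = np.argsort(scores)[::-1][:4]    # keep the 4 strongest inputs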
1. CS294-129: Designing, Visualizing and Understanding Deep Neural Networks
John Canny
Fall 2016
CCN: 34652
With considerable help from Andrej Karpathy, Fei-Fei Li, Justin Johnson, Ian Goodfellow and several others
7. Milestones: Language Translation
Sequence-to-sequence models with LSTMs and attention.
Source: Luong, Cho, Manning, ACL Tutorial 2016.
8. Milestones: Deep Reinforcement Learning
In 2013, DeepMind's arcade player bests a human expert on six Atari games. DeepMind was acquired by Google in 2014.
In 2016, DeepMind's AlphaGo defeats former world champion Lee Sedol.
11. Other Deep Learning courses at Berkeley:
Fall 2016:
CS294-43: Special Topics in Deep Learning: Trevor Darrell, Alexei Efros, Marcus Rohrbach. Mondays 10-12 in Newton room (250 SDH). Emphasis on computer vision research. Readings, presentations.
CS294-YY: Special Topics in Deep Learning: Trevor Darrell, Sergey Levine, and Dawn Song. Weds 10-12. https://people.eecs.berkeley.edu/~dawnsong/cs294-dl.html Emphasis on security and other emerging applications. Readings, presentations, project.
12. Other Deep Learning courses at Berkeley:
Spring XX:
CS294-YY: Deep Reinforcement Learning, John Schulman and Sergey Levine, http://rll.berkeley.edu/deeprlcourse/. Should happen in Spring 2017. Broad/introductory coverage, assignments, project.
Stat 212B: Topics in Deep Learning: Joan Bruna (Statistics), last offered Spring 2016. Next ?? Paper reviews and course project.
13. Learning about Deep Neural Networks
Yann LeCun quote: DNNs require "an interplay between intuitive insights, theoretical modeling, practical implementations, empirical studies, and scientific analyses"
i.e. there isn't a framework or core set of principles that explains everything (cf. graphical models for machine learning).
We try to cover the ground in LeCun's quote.
14. This Course (please interrupt with questions)
Goals:
• Introduce deep learning to a broad audience.
• Review principles and techniques (incl. visualization) for
understanding deep networks.
• Develop skill at designing networks for applications.
Materials:
• Book(s)
• Notes
• Lectures
• Visualizations
15. The role of Animation
From A. Karpathy’s cs231n notes.
16. This Course
Work:
• Class Participation: 10%
• Midterm: 20%
• Final Project (in groups): 40%
• Assignments: 30%
Audience: EECS grads and undergrads.
In the long run it will probably be "normalized" into a 100/200-numbered "mezzanine" offering.
17. Prerequisites
• Knowledge of calculus and linear algebra, Math 53/54 or
equivalent.
• Probability and Statistics, CS70 or Stat 134. CS70 is bare
minimum preparation, a stat course is better.
• Machine Learning, CS189.
• Programming, CS61B or equivalent. Assignments will
mostly use Python.
18. Logistics
• Course Number: CS 294-129 Fall 2016, UC Berkeley
• On bCourses, publicly readable.
• CCN: 34652
• Instructor: John Canny lastname@berkeley.edu
• Time: MW 1pm - 2:30pm
• Location: 306 Soda Hall
• Discussion: Join Piazza for announcements and to ask
questions about the course
• Office hours:
• John Canny - M 2:30-3:30, in 637 Soda
19. Course Project
• More info later
• Can be combined with other course projects
• Encourage "open-source" projects that can be archived somewhere.
22. Some History
• Reading: the Deep Learning Book, Introduction
23. Phases of Neural Network Research
• 1940s-1960s: Cybernetics: brain-like electronic systems; morphed into modern control theory and signal processing.
• 1960s-1980s: Digital computers, automata theory, computational complexity theory: simple shallow circuits are very limited…
• 1980s-1990s: Connectionism: complex, non-linear networks, back-propagation.
• 1990s-2010s: Computational learning theory, graphical models: learning is computationally hard, simple shallow circuits are very limited…
• 2006: Deep learning: end-to-end training, large datasets, explosion in applications.
24. Citations of the "LeNet" paper
• Recall that LeNet was a modern visual classification network that recognized digits for zip codes. Its citations look like this:
[Citation-count chart, with the second and third phases and the deep learning "winter" marked.]
• The 2000s were a golden age for machine learning, and marked the ascent of graphical models. But not so for neural networks.
25. Why the success of DNNs is surprising
• From both complexity and learning theory perspectives, simple networks are very limited.
• Can't compute parity with a small network.
• It is NP-hard to learn "simple" functions like 3-SAT formulae, and in particular training a DNN is NP-hard.
26. Why the success of DNNs is surprising
• The most successful DNN training algorithm is a version of gradient descent, which will only find local optima. In other words, it's a greedy algorithm. Backprop:
loss = f(g(h(y)))
d loss/dy = f'(g(h(y))) × g'(h(y)) × h'(y)
• Greedy algorithms are even more limited in what they can represent and how well they learn.
• If a problem has a greedy solution, it's regarded as an "easy" problem.
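A tiny sketch of that chain rule in code; the particular functions f, g, h below are arbitrary illustrative choices, not from the lecture:

    import numpy as np

    # Illustrative composition: loss = f(g(h(y))).
    h = lambda y: y ** 2           # h(y)
    g = lambda u: np.sin(u)        # g(h(y))
    f = lambda v: np.exp(v)        # f(g(h(y)))

    dh = lambda y: 2 * y           # h'(y)
    dg = lambda u: np.cos(u)       # g'(.)
    df = lambda v: np.exp(v)       # f'(.)

    y = 0.3
    u = h(y)
    v = g(u)
    loss = f(v)
    # Backprop multiplies the local derivatives from the outside in:
    dloss_dy = df(v) * dg(u) * dh(y)

    # Sanity check against a finite-difference approximation.
    eps = 1e-6
    numeric = (f(g(h(y + eps))) - f(g(h(y - eps)))) / (2 * eps)
    print(dloss_dy, numeric)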
27. Why the success of DNNs is surprising
• In graphical models, values in a network represent random variables, and have a clear meaning. The network structure encodes dependency information, i.e. you can represent rich models.
• In a DNN, node activations encode nothing in particular, and the network structure only encodes (trivially) how they derive from each other.
28. Why the success of DNNs is obvious
• Hierarchical representations are ubiquitous in AI. Computer
vision:
29. Why the success of DNNs is obvious
• Natural language:
30. Why the success of DNNs is obvious
• Human learning is deeply layered.
[Figure: "Deep expertise"]
31. Why the success of DNNs is obvious
• What about greedy optimization?
• Less obvious, but it looks like many learning problems (e.g. image classification) are actually "easy", i.e. they have reliable steepest-descent paths to a good model.
Source: Ian Goodfellow, ICLR 2015 Tutorial.
Editor's Notes
• Around 5% error rate. Good enough to be useful.
• Founders: Demis Hassabis, Shane Legg, Mustafa Suleyman.
• Why was this milestone particularly worrying from an ethics point of view?