These slides accompanied a demo of Deeplearning4j at the SF Data Mining Meetup hosted by Trulia.
http://www.meetup.com/Data-Mining/events/212445872/
Deep learning is useful for detecting similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices.
Deeplearning4j is a scalable deep-learning architecture suitable for Hadoop and other big-data structures. It includes both a distributed deep-learning framework and a single-machine framework; i.e. it also runs on a single thread. Training takes place on the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce, and they are equally compatible with Java, Scala and Clojure. The distributed deep-learning framework is built for data input and neural-net training at scale, and its output should be highly accurate predictive models.
The framework's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets and recursive neural tensor networks.
Finally, Deeplearning4j integrates with GPUs. A stable version was released in October.
1. Deep Learning
{
Machine Perception and Its Applications
Adam Gibson // deeplearning4j.org // skymind.io // zipfian
2. DL, a Subset of AI
Deep Learning = subset of Machine Learning
Machine Learning = subset of AI
AI = Algorithms that repeatedly optimize themselves.
Deep learning = pattern recognition
Machines classify data and improve over time.
3. Why Is DL Hard?
We see this… Machines see this… (Where’s the cat?)
(Hat tip to Andrew Ng)
4. What Can It Handle?
Anything digitized
Raw media: MP3’s, JPEG’s, text, video
Sensor output: temperature, pressure, motion and chemical composition
Time-series data: Prices and their movement; e.g. the stock market, real estate, weather and economic indicators
It’s setting new accuracy records everywhere
5. What’s It Good For?
Recommendation engines: Anticipate what you will buy or click.
Anomaly detection: Bad outcomes signal themselves in advance: fraud in e-commerce; tumors in X-rays; loans likely to default.
Signal processing: Deep learning can estimate customer lifetime value, necessary inventory or an approaching market crash.
Facial and image recognition
8. How Did It Do That?
Nets need training data.
You know what training sets contain.
Nets learn training-set faces by repeated reconstruction.
Reconstruction = finding which facial features are indicative of larger forms.
When a net can rebuild the training set, it is ready to work with new, unlabeled data.
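That reconstruction test can be made concrete: one common yardstick is mean squared error between an input and the net's rebuilt version of it, and training drives it down. A minimal plain-Java sketch (no DL4J API; the vectors are made-up stand-ins for facial-feature activations):

```java
public class ReconstructionError {
    // Mean squared error between an input vector and its reconstruction.
    public static double mse(double[] input, double[] reconstruction) {
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            double diff = input[i] - reconstruction[i];
            sum += diff * diff;
        }
        return sum / input.length;
    }

    public static void main(String[] args) {
        double[] face  = {0.9, 0.1, 0.4};
        double[] early = {0.5, 0.5, 0.5};   // reconstruction before training
        double[] late  = {0.85, 0.15, 0.4}; // reconstruction after training
        System.out.println(mse(face, early)); // error shrinks as the net learns
        System.out.println(mse(face, late));
    }
}
```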
9. Technical Explanation
Nets measure the difference between their results and a benchmark = loss function
They minimize that difference with an optimization algorithm.
They optimize by altering their parameters and testing how the changes affect results.
Gradient descent, Conjugate gradient, L-BFGS
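Those steps — measure the loss, follow its gradient, repeat — can be sketched in a few lines of plain Java. The one-parameter squared loss and the 0.1 learning rate here are illustrative choices, not DL4J defaults:

```java
public class GradientDescentDemo {
    // Loss: squared difference between the model's output and the benchmark.
    static double loss(double param, double target) {
        double diff = param - target;
        return diff * diff;
    }

    // Derivative of the loss with respect to the parameter.
    static double gradient(double param, double target) {
        return 2.0 * (param - target);
    }

    // Gradient descent: repeatedly nudge the parameter against the gradient.
    public static double descend(double param, double target,
                                 double learningRate, int steps) {
        for (int i = 0; i < steps; i++) {
            param -= learningRate * gradient(param, target);
        }
        return param;
    }

    public static void main(String[] args) {
        double fitted = descend(0.0, 3.0, 0.1, 100);
        System.out.println(fitted); // converges toward the benchmark, 3.0
    }
}
```

Conjugate gradient and L-BFGS refine the same idea: they use more information about the loss surface to take better steps.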
11. Representation Learning
Through pre-training, nets learn to locate signal in a world of noise
Generic priors initialize the weights
Reconstructions = representations
Feature hierarchies give nets an intuition about complex, abstract features
12. Facial Recognition’s Uses
Facebook engages us more. (95-97% accuracy)
Government agencies identify persons of interest.
Video game makers build more realistic (and stickier) worlds.
Stores identify customers and track behavior, prevent churn and encourage spending.
13. Sentiment Analysis & Text
Sentiment analysis ~ NLP
Software classifies sentences by emotional tone, bias and intensity
Positive or negative - object-specific…
Rank movies, books, consumer goods, politicians, celebrities
Predict social unrest, gauge reputations, PR…
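To make "classifying sentences by tone" tangible, here is a deliberately crude non-neural baseline: a hand-built word lexicon whose scores a deep net would instead learn from data. The lexicon entries are invented for illustration:

```java
import java.util.Map;

public class ToySentiment {
    // A tiny hand-built lexicon stands in for a learned model here;
    // a real deep-learning classifier would learn these weights from data.
    static final Map<String, Integer> LEXICON = Map.of(
        "great", 1, "love", 1, "excellent", 1,
        "terrible", -1, "hate", -1, "boring", -1);

    // Sum word scores: positive total = positive tone, negative = negative.
    public static int score(String sentence) {
        int total = 0;
        for (String word : sentence.toLowerCase().split("\\W+")) {
            total += LEXICON.getOrDefault(word, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(score("A great movie, I love it")); // 2
        System.out.println(score("Boring and terrible"));      // -2
    }
}
```

A lexicon can't handle negation or context ("not great"); that gap is exactly what recursive nets over sentence structure aim to close.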
15. Deep-Belief Net (DBN)
A stack of RBMs.
1st RBM’s hidden layer -> 2nd RBM’s input layer
Feature hierarchy
A DBN classifies data.
Buckets images: e.g. sunset, elephant, flower.
Useful in search.
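The stacking itself is simple to write down: each layer turns its input into feature activations, and those activations become the next layer's input. A plain-Java sketch with made-up weights (not the DL4J API):

```java
public class StackedLayers {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // One layer: hidden[j] = sigmoid(sum_i visible[i] * weights[i][j])
    public static double[] forward(double[] visible, double[][] weights) {
        double[] hidden = new double[weights[0].length];
        for (int j = 0; j < hidden.length; j++) {
            double sum = 0.0;
            for (int i = 0; i < visible.length; i++) {
                sum += visible[i] * weights[i][j];
            }
            hidden[j] = sigmoid(sum);
        }
        return hidden;
    }

    public static void main(String[] args) {
        double[] input = {1.0, 0.0, 1.0};
        double[][] w1 = {{0.5, -0.2}, {0.1, 0.3}, {-0.4, 0.6}}; // 3 -> 2
        double[][] w2 = {{0.7}, {-0.5}};                        // 2 -> 1
        // The 1st layer's hidden activations are the 2nd layer's input.
        double[] features = forward(input, w1);
        double[] output = forward(features, w2);
        System.out.println(output[0]);
    }
}
```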
16. Deep Autoencoder
Two DBNs.
The first DBN encodes data into a vector of 10-30 numbers.
The second DBN decodes the vector back to the original data.
Reduce any document/image to a highly compact vector.
QA and information retrieval: Watson
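The compact-vector retrieval idea can be sketched without training anything. Below, a fixed random projection stands in for the learned encoder — a real deep autoencoder learns this mapping from data — and cosine similarity compares the resulting codes:

```java
import java.util.Random;

public class CompactCodes {
    // Project a long document vector down to a short code. A trained deep
    // autoencoder learns this mapping; a seeded random projection stands in
    // here just to show the shape of the idea.
    public static double[] encode(double[] doc, int codeSize, long seed) {
        Random rng = new Random(seed);
        double[] code = new double[codeSize];
        for (int j = 0; j < codeSize; j++) {
            for (int i = 0; i < doc.length; i++) {
                code[j] += doc[i] * rng.nextGaussian();
            }
        }
        return code;
    }

    // Cosine similarity between two codes: close to 1 = similar documents.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] docA = {1, 0, 2, 0, 3, 0};
        double[] docB = {1, 0, 2, 0, 3, 0.1}; // nearly the same document
        double[] codeA = encode(docA, 3, 42L);
        double[] codeB = encode(docB, 3, 42L); // same seed = same projection
        System.out.println(cosine(codeA, codeB)); // similar docs, similar codes
    }
}
```

Retrieval then becomes cheap: compare short codes instead of whole documents.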
19. Convolutional Net
Good with images.
ConvNets learn data like images in patches.
Each patch learned is then woven together into the whole.
Yann LeCun’s baby, now at Facebook.
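Learning "in patches" comes from the convolution operation: a small filter slides across the image, and each output cell summarizes one patch. A plain-Java sketch with a made-up vertical-edge filter:

```java
public class PatchConvolution {
    // Slide a small filter over the image; each output cell scores one patch.
    public static double[][] convolve(double[][] image, double[][] filter) {
        int outRows = image.length - filter.length + 1;
        int outCols = image[0].length - filter[0].length + 1;
        double[][] out = new double[outRows][outCols];
        for (int r = 0; r < outRows; r++) {
            for (int c = 0; c < outCols; c++) {
                double sum = 0.0;
                for (int fr = 0; fr < filter.length; fr++) {
                    for (int fc = 0; fc < filter[0].length; fc++) {
                        sum += image[r + fr][c + fc] * filter[fr][fc];
                    }
                }
                out[r][c] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] image = {
            {0, 0, 1, 1},
            {0, 0, 1, 1},
            {0, 0, 1, 1}
        };
        // A vertical-edge detector: responds where dark meets light.
        double[][] edge = {{-1, 1}, {-1, 1}};
        double[][] response = convolve(image, edge);
        System.out.println(response[0][1]); // strong response at the edge: 2.0
    }
}
```

A ConvNet learns many such filters from data rather than hand-coding them.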
20. Recursive Neural Tensor Net
Top-down, hierarchical nets rather than feed-forward like DBNs.
Sequence-based classification, windows of several events, entire scenes rather than images.
Features = vectors.
A tensor = a multi-dimensional matrix, or multiple matrices of the same size.
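That "multiple matrices of the same size" picture maps directly to code: each tensor slice scores one interaction between two feature vectors via the bilinear form a^T * T[k] * b. A sketch with invented numbers (not the RNTN's trained parameters):

```java
public class TensorSlices {
    // One output value per tensor slice: out[k] = a^T * T[k] * b
    public static double[] bilinear(double[] a, double[][][] tensor, double[] b) {
        double[] out = new double[tensor.length];
        for (int k = 0; k < tensor.length; k++) {
            for (int i = 0; i < a.length; i++) {
                for (int j = 0; j < b.length; j++) {
                    out[k] += a[i] * tensor[k][i][j] * b[j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] left  = {1.0, 2.0};  // feature vector for one child phrase
        double[] right = {0.5, -1.0}; // feature vector for the other child
        double[][][] tensor = {       // two 2x2 slices of the same size
            {{1, 0}, {0, 1}},
            {{0, 1}, {1, 0}}
        };
        double[] combined = bilinear(left, tensor, right);
        System.out.println(combined[0] + " " + combined[1]);
    }
}
```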
If there’s one word you remember from tonight, make it REPRESENTATION LEARNING.
2006 – Geoff Hinton – the deep-learning comeback
RBMs are not deep, but they’re components of deep nets.
Two layers of neuron-like nodes: the first is the visible/input layer; the second is the hidden layer, which identifies features.
Symmetrically connected.
“Restricted” means there are no visible-visible or hidden-hidden connections; i.e. all connections happen between layers.
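That bipartite, symmetric wiring can be written out directly: the visible-to-hidden and hidden-to-visible passes share one weight matrix, and no weights connect nodes within a layer. A plain-Java sketch with made-up weights (probabilities only; a full RBM would also sample and use biases):

```java
public class RbmPass {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Visible -> hidden: connections only run between the two layers.
    public static double[] hiddenFromVisible(double[] v, double[][] w) {
        double[] h = new double[w[0].length];
        for (int j = 0; j < h.length; j++) {
            double sum = 0.0;
            for (int i = 0; i < v.length; i++) sum += v[i] * w[i][j];
            h[j] = sigmoid(sum);
        }
        return h;
    }

    // Hidden -> visible reuses the SAME weights: symmetric connections.
    public static double[] visibleFromHidden(double[] h, double[][] w) {
        double[] v = new double[w.length];
        for (int i = 0; i < v.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < h.length; j++) sum += h[j] * w[i][j];
            v[i] = sigmoid(sum);
        }
        return v;
    }

    public static void main(String[] args) {
        double[] visible = {1.0, 0.0, 1.0};
        double[][] weights = {{0.4, -0.3}, {0.2, 0.1}, {-0.5, 0.6}}; // made up
        double[] hidden = hiddenFromVisible(visible, weights);
        double[] reconstruction = visibleFromHidden(hidden, weights);
        System.out.println(reconstruction.length); // same size as the input: 3
    }
}
```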
QA systems like Watson are a step beyond search engines…
They classify the question type.
Several data sources can answer different kinds of questions.
They compile a list of QA candidates, or “documents most likely to succeed”.