Multimedia data mining using deep learning

Multimedia Data Mining
using deep learning
Peter Wlodarczak
wlodarczak@gmail.com

Agenda
 Aims
 Multimedia Data Mining
 Artificial Neural Networks
 Deep learning
 Challenges
 Discussion

Aims
 Analyze multimedia data for:
 Object/face recognition
 Voice commands
 Natural Language Processing
 Classification
 Automatic caption generation
 Record linkage (entity resolution)

Multimedia Data Mining I
 Multimedia data mining:
 Unprecedented amount of Multimedia data
since Web 2.0 and Social Media
 Prosumer data
 Uses algorithms to extract useful patterns
and relations from image, audio and video
data
 Traditional methods often not satisfactory
 Unsuitable for high dimensionality

Multimedia Data Mining II
 Multimedia data mining has been
improved using deep learning in:
 Visual data mining
 Natural Language Processing
 Deep learners are:
 Machine Learning schemes
 Usually multi-layered artificial neural
networks

Artificial Neural Networks I
 Artificial Neural Networks:
 Suitable to give good approximations for
complex problems
 Consist of perceptrons (the neurons)
and weighted connections (the axons)

Artificial Neural Networks II
 Perceptron (Neuron)
 Linear classifier
 Data linearly separable using a hyperplane
 Hyperplane: $w_0 a_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k = 0$
 where w = weights, a = real-valued feature vector,
a0 = bias input
 Binary classifier f(a) that maps its input
vector a to a single, binary output value

Artificial Neural Networks III
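[Figure: perceptron with a bias input fixed to 1 (weight w0) and attribute inputs a1, a2, a3 (weights w1, w2, w3)]
$f(a) = \sum_k w_k a_k + b$, with b the bias term (w0 times the bias input 1)
Output class depends on whether f(a) > 0 or f(a) < 0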

Artificial Neural Networks III
Training data
name      sex     mask  cape  tie  ears  smokes  class
Batman    male    yes   yes   no   yes   no      Good
Robin     male    yes   yes   no   no    no      Good
Alfred    male    no    no    yes  no    no      Good
Penguin   male    no    no    yes  no    yes     Bad
Catwoman  female  yes   no    no   yes   no      Bad
Joker     male    no    no    no   no    no      Bad
Test data
name      sex     mask  cape  tie  ears  smokes  class
Batgirl   female  yes   yes   no   yes   no      ?
Riddler   male    yes   no    no   no    no      ?
 Supervised learning
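
The training table can be fed to a single perceptron. The following is a minimal sketch, not the presenter's code: the 0/1 encoding of the attributes, the names `TRAIN`, `TEST`, `f`, `predict` and `train_perceptron`, and the use of the classic perceptron learning rule are all assumptions made for illustration.

```python
# Feature order: sex (male=1, female=0), mask, cape, tie, ears, smokes (yes=1, no=0).
# This 0/1 encoding is an assumption made for illustration.
TRAIN = [
    ("Batman",   [1, 1, 1, 0, 1, 0], +1),  # Good
    ("Robin",    [1, 1, 1, 0, 0, 0], +1),  # Good
    ("Alfred",   [1, 0, 0, 1, 0, 0], +1),  # Good
    ("Penguin",  [1, 0, 0, 1, 0, 1], -1),  # Bad
    ("Catwoman", [0, 1, 0, 0, 1, 0], -1),  # Bad
    ("Joker",    [1, 0, 0, 0, 0, 0], -1),  # Bad
]
TEST = [
    ("Batgirl", [0, 1, 1, 0, 1, 0]),
    ("Riddler", [1, 1, 0, 0, 0, 0]),
]


def f(weights, a):
    """Weighted sum w0*a0 + w1*a1 + ... + wk*ak with a0 = 1 as bias input."""
    inputs = [1] + a
    return sum(w * x for w, x in zip(weights, inputs))


def predict(weights, a):
    """Return +1 (Good) if f(a) > 0, otherwise -1 (Bad)."""
    return 1 if f(weights, a) > 0 else -1


def train_perceptron(examples, max_epochs=1000):
    """Perceptron learning rule: add or subtract each misclassified input."""
    weights = [0.0] * (len(examples[0][1]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for _, a, label in examples:
            if predict(weights, a) != label:
                inputs = [1] + a
                weights = [w + label * x for w, x in zip(weights, inputs)]
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights


weights = train_perceptron(TRAIN)
for name, a in TEST:
    print(name, "Good" if predict(weights, a) == 1 else "Bad")
```

Under this encoding the six training rows are linearly separable, so the learning rule stops once all of them are classified correctly. The labels printed for Batgirl and Riddler depend on the learned weights and should not be read as ground truth.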

Artificial Neural Networks IV
 Not all data is linearly separable

Artificial Neural Networks V
 Multilayer Perceptron
 Perceptrons organized in several layers
 A layer is fully interconnected with the next
layer
 All nodes except the input nodes are perceptrons
 Feedforward neural network
 Uses backpropagation for training
 Error propagated back to minimize loss function
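
Backpropagation can be sketched on a very small network. The example below assumes a 2-2-1 feedforward network with sigmoid activations and a squared-error loss; the weights, input and target are arbitrary illustrative values, and all function names are hypothetical. It computes the gradient by propagating the output error backwards and compares one entry against a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass of a 2-2-1 network; returns hidden activations and output."""
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(2)]
    y = sigmoid(sum(w2[j] * h[j] for j in range(2)) + b2)
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Propagate the output error backwards to get a gradient for every weight."""
    W1, b1, w2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - target) * y * (1.0 - y)  # error signal at the output node
    grad_w2 = [delta_out * h[j] for j in range(2)]
    grad_b2 = delta_out
    # Error signal at each hidden node, passed back through w2.
    delta_hidden = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(2)]
    grad_W1 = [[delta_hidden[j] * x[i] for i in range(2)] for j in range(2)]
    grad_b1 = delta_hidden
    return grad_W1, grad_b1, grad_w2, grad_b2


# Arbitrary illustrative weights, input and target.
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.2)
x, target = [1.0, 0.5], 1.0

grad_W1, _, _, _ = backprop(params, x, target)

# Finite-difference check for one weight, W1[0][0].
eps = 1e-5
W1_plus = [[0.1 + eps, -0.2], [0.4, 0.3]]
W1_minus = [[0.1 - eps, -0.2], [0.4, 0.3]]
numeric = (loss((W1_plus,) + params[1:], x, target)
           - loss((W1_minus,) + params[1:], x, target)) / (2 * eps)
print(grad_W1[0][0], numeric)
```

The two printed numbers should agree to many decimal places, which is a common sanity check for a hand-written backward pass.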

Artificial Neural Networks VI
 Multilayer perceptron can be used for
non-linear, multiclass classification
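
A small illustration of the non-linear part: XOR is not linearly separable, yet two layers of threshold perceptrons can compute it. The weights below are set by hand (an assumption for illustration, not learned), and the names `step` and `xor_mlp` are hypothetical. The sketch covers only a binary case, not multiclass output.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    # Hidden layer: two perceptrons, each a linear classifier.
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output perceptron combines them: OR but not AND.
    return step(h_or - h_and - 0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))
```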

Artificial Neural Networks VII
 Gradient descent optimization method
for learning weights
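
As a minimal sketch of gradient descent for learning weights, the example below fits a single linear unit y ≈ w·x + b to toy data by repeatedly stepping against the gradient of the mean squared error. The data, learning rate, step count and the name `gradient_descent` are assumptions chosen for illustration.

```python
# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w*x + b by following the negative gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```

The learned values should approach the weight 2 and bias 1 used to generate the data. In a multilayer network the same update is applied to every weight, with the gradients supplied by backpropagation.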

Artificial Neural Networks VIII
 Model complexity has to be appropriate
(Occam’s razor)
Schapire 2004

Artificial Neural Networks IX
Schapire 2004

Artificial Neural Networks X
 For building an accurate classifier:
 Enough training examples
 Good performance on training set
 Classifier that is not too complex
(avoids overfitting)
 Allows approximate solutions to
very complex problems
 Support Vector Machines (SVMs) are a
much simpler alternative to ANNs

Deep learning I
 Deep learning
 No clear distinction from shallow learners
 Multiple layers of non-linear processing
units
 Each layer represents features at a higher
level
 Forms a hierarchical representation
 Majority of deep learners are ANNs

Deep learning II
 Deep learning neural networks
 Use Rectified Linear Units (ReLU)
 Learn faster
 Half-wave rectifier
f(z) = max(z, 0)
 Use backpropagation for adjusting the
weights
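
The rectifier and the derivative that backpropagation needs can be sketched in a few lines. The function names are hypothetical, and setting the derivative to 0 at z = 0 is a common convention assumed here.

```python
def relu(z):
    """Half-wave rectifier: f(z) = max(z, 0)."""
    return max(z, 0.0)


def relu_grad(z):
    """Derivative used during backpropagation: 1 for z > 0, else 0."""
    return 1.0 if z > 0 else 0.0


print([relu(z) for z in (-2.0, 0.0, 3.0)])  # [0.0, 0.0, 3.0]
```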

Deep learning III - ConvNet
LeNet 2015

Deep learning IV - ConvNet
 Convolutional neural networks
 Inspired by the animal visual cortex
 Visual cortex is the most powerful visual
processing system in existence
 Typically two stages:
 Convolutional stage
 Pooling stage
 Characterized by
 sparse connectivity
 shared weights

Deep learning V - ConvNet
 Shared weights
 Subsets share weights and bias to form
feature map
 Replicated across entire visual field
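
Shared weights and sparse connectivity can be sketched as one small kernel slid over the whole image. This is a plain-Python illustration under assumed values (the image, the kernel and the name `conv2d_valid` are hypothetical); like most ConvNet libraries it computes a cross-correlation without flipping the kernel.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Sparse connectivity: each output sees only a kh x kw patch.
            total = bias
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Image with a vertical edge between the second and third column.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# The same four weights are reused at every position.
kernel = [
    [1, -1],
    [1, -1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0.0, -2.0, 0.0], [0.0, -2.0, 0.0]]
```

The feature map is non-zero only where the patch straddles the edge, which is the sense in which a filter activates on a specific feature.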

Deep learning VI - ConvNet
 Each layer accepts a 3D input volume and
transforms it into a 3D output volume
 Filters activate when a specific feature is
detected
CS231n 2015

Deep learning VII - ConvNet
 Receptive field spans all feature maps
LeNet 2015

Deep learning VIII - ConvNet
 MaxPooling
 Non-linear down-sampling
 Partitions input into non-overlapping
rectangles
 Outputs maximum value for each sub-region
 Reduces computation for next layer
 Reduces dimensionality of intermediate
representations
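
A minimal sketch of max-pooling over non-overlapping rectangles, assuming a 2 x 2 window and an input whose sides are multiples of the window size. The input values and the name `max_pool` are hypothetical.

```python
def max_pool(feature_map, size=2):
    """Take the maximum over non-overlapping size x size rectangles."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[r + i][c + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4 x 4 input shrinks to 2 x 2, so the next layer processes a quarter of the values.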

Deep learning IX - ConvNet
 Convolutional and sampling sublayers
UFLDL 2015

Deep learning X - ConvNet
 Image after cascading max-pooling with a
convolutional layer
 Similar to edge detector

Deep learning XI - RNN
 Recurrent neural networks
 Contain directed cycles
 Take sequences as input, no fixed-size
input and output vectors, e.g. natural
speech

Deep learning XII - RNN
 No fixed size of computations
 Much simpler than ConvNets
 Maintain inner state exhibiting dynamic
temporal behavior
 Optimized through backpropagation
 Can be extended with long-term memory
extensions (e.g. LSTM)
 Don’t necessarily need sequences of inputs
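
The inner state and the absence of a fixed input length can be sketched with a tiny Elman-style recurrence, h_t = tanh(W_x x_t + W_h h_{t-1} + b). The weights below are fixed illustrative values rather than learned ones, and the names `rnn_step` and `run_rnn` are hypothetical.

```python
import math

HIDDEN = 2
# Illustrative fixed weights; in practice these are learned.
W_X = [0.5, -0.3]                  # input-to-hidden
W_H = [[0.1, 0.2], [0.3, 0.4]]     # hidden-to-hidden (the directed cycle)
B = [0.0, 0.0]


def rnn_step(h_prev, x):
    """One time step: the new state depends on the input and the previous state."""
    return [
        math.tanh(W_X[j] * x
                  + sum(W_H[j][k] * h_prev[k] for k in range(HIDDEN))
                  + B[j])
        for j in range(HIDDEN)
    ]


def run_rnn(sequence):
    """Feed a sequence of any length through the same step function."""
    h = [0.0] * HIDDEN
    for x in sequence:
        h = rnn_step(h, x)
    return h


print(run_rnn([1.0]))
print(run_rnn([1.0, 0.0, 2.0, 1.0]))
```

Both calls return a state of the same size even though the sequences differ in length, and the state depends on the order of the inputs, not only on their values.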

Deep learning XIII - RNN
 Training RNN is a non-linear global
optimization problem
 Trained using stochastic gradient descent
 Non-linear, differentiable activation
function, e.g. rectifier
 Trained through backpropagation through
time (BPTT)
 Genetic algorithms can be used for training

Deep learning XIV - RNN
 Many different architectures for RNN
[Figures: Elman SRN; spiking neural network]

Deep learning XV - RNN
[Figures: RNN learns to read house numbers; RNN learns to paint house numbers]
Karpathy 2015

Deep learning XVI - RNN
 RNN used for
 Speech-to-text transcription
 Speech synthesis
 Machine translation

Deep learning XVII
 Combining ConvNets and RNN for
image descriptions
 Regions described using language as label space using ConvNet
 Language synthesis using RNN
Karpathy & Fei-Fei 2014

Deep learning XVIII
 ConvNet and RNN can be combined
 Automated caption generation

Deep learning XIX
 Automatic feature extraction
 No closed vocabulary set
 Alignment of sentence segments to
regions in the image
Karpathy & Fei-Fei 2014

Deep learning XX
 Other applications
 Object recognition
 Movie classification
 Handwriting recognition
 Record linkage

Challenges I
 Main disadvantage: large volumes of
training data needed
 Overfitting if not enough training data
 Optimization difficult
 Finding relevant information
 Privacy-preserving data mining

Challenges II
 Describing actions

Discussion
 Future research in
 Attention based models
 Finding relevant information
 Data democratization and Internet of
Things
 Unsupervised learning
 Semantic data modeling
 Reasoning

Thank you for your attention
 Questions?

References
 Zhao, X, Li, X & Zhang, Z 2015, 'Multimedia Retrieval via Deep Learning to Rank', IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1487-91, <http://ieeexplore.ieee.org.ezproxy.usq.edu.au/xpls/abs_all.jsp?arnumber=7054452>.
 Yu, W, Zhuang, F, He, Q & Shi, Z 2015, 'Learning deep representations via extreme learning machines', Neurocomputing, vol. 149, Part A, pp. 308-15, <http://www.sciencedirect.com/science/article/pii/S0925231214011461>.
 Xu, K, Ba, J, Kiros, R, Cho, K, Courville, A, Salakhutdinov, R, Zemel, R & Bengio, Y 2015, 'Show, Attend and Tell: Neural Image Caption Generation with Visual Attention', Proceedings of the 32nd International Conference on Machine Learning from Data: Artificial Intelligence and Statistics, vol. 37.
 Xin, J, Wang, Z, Qu, L & Wang, G 2015, 'Elastic extreme learning machine for big data classification', Neurocomputing, vol. 149, Part A, pp. 464-71, <http://www.sciencedirect.com/science/article/pii/S0925231214011503>.
 Weston, J, Chopra, S & Bordes, A 2015, 'Memory Networks', in 3rd International Conference on Learning Representations: proceedings of the 3rd International Conference on Learning Representations San Diego, viewed <http://arxiv.org/pdf/1410.3916v10.pdf>.
 Weilong, H, Xinbo, G, Dacheng, T & Xuelong, L 2015, 'Blind Image Quality Assessment via Deep Learning', Neural Networks and Learning Systems, IEEE Transactions on, vol. 26, no. 6, pp. 1275-86.
 Wang, Y, Li, D, Du, Y & Pan, Z 2015, 'Anomaly detection in traffic using L1-norm minimization extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 415-25, <http://www.sciencedirect.com/science/article/pii/S0925231214011382>.
 Vinyals, O, Toshev, A, Bengio, S & Erhan, D 2015, 'Show and Tell: A Neural Image Caption Generator', Google, <http://arxiv.org/pdf/1411.4555v1.pdf>.
 Noda, K, Yamaguchi, Y, Nakadai, K, Okuno, H & Ogata, T 2015, 'Audio-visual speech recognition using deep learning', Applied Intelligence, vol. 42, no. 4, pp. 722-37, <http://dx.doi.org/10.1007/s10489-014-0629-7>.
 Mao, W, Zhao, S, Mu, X & Wang, H 2015, 'Multi-dimensional extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 160-70, <http://www.sciencedirect.com/science/article/pii/S0925231214011540>.
 Liu, X, Wang, L, Huang, G-B, Zhang, J & Yin, J 2015, 'Multiple kernel extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 253-64, <http://www.sciencedirect.com/science/article/pii/S0925231214011199>.
 LeCun, Y, Bengio, Y & Hinton, G 2015, 'Deep learning', Nature, vol. 521, no. 7553, pp. 436-44, <http://dx.doi.org/10.1038/nature14539>.
 Srivastava, N, Hinton, G, Krizhevsky, A, Sutskever, I & Salakhutdinov, R 2014, 'Dropout: a simple way to prevent neural networks from overfitting', J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929-58.
 Karpathy, A & Fei-Fei, L 2014, 'Deep visual-semantic alignments for generating image descriptions', arXiv preprint arXiv:1412.2306.
