Wild Data - The Data Science Meetup


On July 19th, we got together at Google Campus to talk about how to increase and complete existing data to improve Machine Learning Models.

Fernando Velasco, Data Scientist at Stratio, and Raúl de la Fuente, Presales at Stratio, talked about image-processing techniques such as Data Augmentation, along with more recent approaches that rely on Deep Learning models.

More info: http://www.stratio.com/blog/events/planet-data-scientist-live-meet-the-wild-data/


1. Wild Data. Fernando Velasco @fer_maat, Raúl de la Fuente @neurozetta
2. Introduction
3. (image slide)
4. Confusion Matrix
5. Layers, layers, layers
6. CNN Stuff
7. Building the structures: how can we define a neuron?
8. Backpropagation Basics
   Forward Propagation: get a result. Backward Propagation: who's to blame? Error Estimation: evaluate performance.
   ● A cost function C is defined.
   ● Every parameter has its own impact on the cost, given some training examples.
   ● Impacts are computed in terms of derivatives.
   ● The chain rule is used to propagate the error backwards.
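The chain-rule mechanics are easier to see in code. Below is a minimal sketch of forward and backward propagation for a tiny two-layer network; the layer sizes, data and learning rate are invented for illustration, not taken from the slides.

```python
import numpy as np

# Minimal 2-layer network trained with hand-rolled backprop (illustrative sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))          # 32 training examples, 4 features
y = rng.integers(0, 2, size=(32, 1))  # binary targets

W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)
lr = 0.1

for step in range(200):
    # Forward propagation: get a result
    h = np.maximum(0.0, X @ W1 + b1)            # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output
    cost = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Backward propagation: chain rule, layer by layer ("who's to blame?")
    dlogits = (p - y) / len(X)                  # dC/d(pre-sigmoid output)
    dW2, db2 = h.T @ dlogits, dlogits.sum(0)
    dh = dlogits @ W2.T
    dh[h <= 0] = 0.0                            # ReLU gate blocks the gradient
    dW1, db1 = X.T @ dh, dh.sum(0)

    # Every parameter moves against its own impact on the cost
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```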
9. Image Data
   ● Images are composed of pixels.
   ● Grayscale images can be seen as matrices.
   ● Coloured images are usually represented as a mix of three colours: Red, Green and Blue.
   ● Each channel can be seen as a grayscale-like filter.
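As a quick illustration, this is what the slide's point looks like as arrays; the file name is a placeholder, not an asset from the talk.

```python
import numpy as np
from PIL import Image  # pip install pillow

# An RGB image is just a (height, width, 3) array of pixel intensities.
img = np.asarray(Image.open("photo.jpg").convert("RGB"))
print(img.shape, img.dtype)        # e.g. (480, 640, 3) uint8

red = img[..., 0]                  # each colour channel is a 2-D matrix...
gray = img.mean(axis=-1)           # ...and averaging them gives a grayscale view
print(red.shape, gray.shape)       # (480, 640) (480, 640)
```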
10. Convolutions
11. Introducing Keras: convolutional representations
12. Convolution Examples (I): Edge Detection, Edge Enhance (right)
13. Convolution Examples (II): Blur, Sharpen, Emboss
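For concreteness, here is a sketch of slides 12-13 as code. The kernel values are the commonly used textbook ones (assumed, since the deck shows the results as images), applied with SciPy.

```python
import numpy as np
from scipy.ndimage import convolve  # pip install scipy

# Classic hand-crafted convolution kernels.
kernels = {
    "edge_detect": np.array([[-1, -1, -1],
                             [-1,  8, -1],
                             [-1, -1, -1]]),
    "blur":        np.ones((3, 3)) / 9.0,
    "sharpen":     np.array([[ 0, -1,  0],
                             [-1,  5, -1],
                             [ 0, -1,  0]]),
    "emboss":      np.array([[-2, -1, 0],
                             [-1,  1, 1],
                             [ 0,  1, 2]]),
}

gray = np.random.rand(64, 64)  # stand-in for a grayscale image matrix
filtered = {name: convolve(gray, k) for name, k in kernels.items()}
```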
14. ReLU activation
   - Avoids the vanishing gradient
   - Efficient computation
   - Sparsity
   - Adaptability
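A two-line illustration of why those properties hold: the zero region is what produces sparsity, and the constant unit slope on the positive side is what keeps gradients from vanishing.

```python
import numpy as np

relu = lambda x: np.maximum(0.0, x)
relu_grad = lambda x: (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # -> [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # -> [0. 0. 0. 1. 1.]  (no vanishing on the active side)
```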
15. Putting it all together: Backprop to the rescue!
   ● Forward propagation is performed the usual way.
   ● So is the loss (remember, in most cases we are performing a classification).
   ● Backprop allows the filter parameters to be computed (Conv layers).
   ● Pooling is (mostly) not affected by backprop.
   (Figure: a LeNet-like network.)
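Since the deck introduces Keras, here is a sketch of a LeNet-like network in Keras. The layer sizes are the classic LeNet-5 ones under MNIST-style assumptions (28×28 grayscale, 10 classes), not necessarily those shown on the slide.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(6, kernel_size=5, activation="relu",
                  input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2),   # pooling: no trainable parameters
    layers.Conv2D(16, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
# Backprop through the conv filters and dense weights is handled by fit():
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```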
16. Classic Data Augmentation
17. CNN: Are Convolutional Neural Networks invariant to…
   ● Scale?
   ● Rotation?
   ● Translation?
18. CNN: Are Convolutional Neural Networks invariant to…
   ● Scale? No
   ● Rotation? No
   ● Translation? Partially
19. Be careful!!!
20. What will I need?
   1. Data
   2. Data
   3. and more Data
21. Top 10 ways to do data augmentation, ranked:
   1. You
   2. Can
   3. Not
   4. Rank
   5. Them
   6. Without
   7. Knowing
   8. The
   9. Data
   10. Distribution
22. Change labeling
23. Weighting the loss
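In Keras this is one argument away. A sketch with an invented 9:1 class imbalance (none of these numbers are from the talk):

```python
import numpy as np
from tensorflow import keras

# Invented imbalanced data: 900 negatives, 100 positives.
y_train = np.array([0] * 900 + [1] * 100)
x_train = np.random.rand(1000, 20)

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Errors on the minority class cost 9x more than on the majority class.
model.fit(x_train, y_train, epochs=5, class_weight={0: 1.0, 1: 9.0})
```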
24. Ignore Sampling
25. Over- or undersample
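A naive oversampling sketch using scikit-learn's resample; the library choice and the data are assumptions for illustration, the deck does not prescribe either.

```python
import numpy as np
from sklearn.utils import resample  # pip install scikit-learn

X = np.random.rand(1000, 20)
y = np.array([0] * 900 + [1] * 100)

# Duplicate minority-class rows (sampling with replacement) until balanced.
X_min, X_maj = X[y == 1], X[y == 0]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_min_up))
```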
26. Augmentation: “porg”
27. Augmentation: “porg”
28. Get creative! Mix of:
   ● translation
   ● rotation
   ● stretching
   ● shearing
   ● random erasing
   ● adding noise
   ● lens distortion, … (go crazy)
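A sketch of such a mix with Keras' ImageDataGenerator. The parameter values are illustrative; random erasing is not built in and would need a custom preprocessing_function.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,          # rotation
    width_shift_range=0.1,      # translation
    height_shift_range=0.1,
    zoom_range=0.2,             # stretching
    shear_range=10.0,           # shearing (degrees)
    channel_shift_range=30.0,   # colour noise
    horizontal_flip=True,
)

x = np.random.rand(8, 64, 64, 3)          # stand-in batch of images
batches = datagen.flow(x, batch_size=8)   # yields augmented batches on the fly
augmented = next(batches)
```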
29. DATA AUGMENTATIOOOON!!!
30. Test Time Augmentation: while augmentation helped give us a better model, prediction accuracy can be further improved by TTA.
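A minimal TTA sketch: predict on several randomly augmented copies of each test image and average. `model` and `datagen` are assumed to be objects like the ones in the earlier sketches, with compatible input shapes.

```python
import numpy as np

def tta_predict(model, datagen, x_test, rounds=5):
    preds = []
    for _ in range(rounds):
        # one randomly transformed copy of every test image per round
        x_aug = np.stack([datagen.random_transform(img) for img in x_test])
        preds.append(model.predict(x_aug))
    return np.mean(preds, axis=0)   # averaged predictions over all rounds
```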
31. Data Augmentation: Takeaway
   • Simple to implement, can be done on the fly!
   • Especially useful for small datasets
32. Style Transfer
33. IDEA / SOLUTION
   ● Let's extract content from the original photo.
   ● Let's extract style from the reference photo.
   ● Now combine content and style together to get a new “styled” result.
34. RECREATION
35. I HAVE GOT AN IDEA!!!
36. I HAVE GOT AN IDEA!!!
37. I HAVE GOT AN IDEA!!!
38. Content Loss
   - Layer complexity increases with depth.
   - The responses in a layer l can be stored in a matrix F(l), where F(l; i, j) is the activation of the i-th filter at position j in layer l.
   - Let p and x be the original image and the generated one, and P(l) and F(l) their feature representations in layer l. We define the squared-error loss between the representations:
     L_content(p, x, l) = 1/2 · Σ_{i,j} (F(l; i, j) − P(l; i, j))²
   - When this content loss is minimized, the mixed image has feature activations in the given layers that are very similar to the activations of the content image.
   - The input image is transformed into representations increasingly sensitive to the content of the image, but relatively invariant to its precise appearance.
   - Higher layers capture the high-level content in terms of objects and their arrangement in the input image, but do not constrain the exact pixel values of the reconstruction very much.
39. Style Loss
   - Which features in the style layers activate simultaneously for the style image? To obtain a representation of the style, we use the correlations between the different filter responses. These feature correlations are given by the Gram matrix:
     G(l; i, j) = Σ_k F(l; i, k) · F(l; j, k)
   - If an entry in the Gram matrix has a value close to zero, the two features in the given layer do not activate simultaneously for the given style image, and vice versa.
   - We can construct an image that matches the style representation of a given input image by minimising the distance between the Gram matrices of the original image and the generated one (A(l) and G(l)):
     E_l = 1/(4 · N_l² · M_l²) · Σ_{i,j} (G(l; i, j) − A(l; i, j))²
   - And we can define a style loss, weighted depending on the layers to boost:
     L_style(a, x) = Σ_l w_l · E_l
40. Gradient of an image
   - To transfer the style of an artwork a onto a photograph p, we jointly minimise the distance of the feature representations of a white-noise image from the content representation of the photograph in one layer and the style representation of the painting (where α and β are the weighting factors for content and style reconstruction, respectively):
     L_total(p, a, x) = α · L_content(p, x) + β · L_style(a, x)
   - Note that both the style and content loss functions are differentiable with respect to the activations F(l; i, j). We can therefore differentiate the loss with respect to the pixel values x to obtain a gradient, which can be used as input for a numerical optimisation strategy.
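Putting slides 38-40 together in code: a sketch of the three losses with TensorFlow. The feature tensors are assumed to come from a pretrained CNN (e.g. VGG); the extraction step is omitted, and α, β and the layer weights are placeholder values.

```python
import tensorflow as tf

def content_loss(P, F):
    # squared-error between content and generated representations in a layer
    return 0.5 * tf.reduce_sum(tf.square(F - P))

def gram_matrix(F):
    # F has shape (height, width, channels): flatten positions, correlate filters
    F = tf.reshape(F, (-1, F.shape[-1]))            # (positions, filters)
    return tf.matmul(F, F, transpose_a=True)        # (filters, filters)

def style_loss(A_feats, F_feats, weights):
    total = 0.0
    for A, F, w in zip(A_feats, F_feats, weights):
        N = A.shape[-1]                 # number of filters N_l
        M = A.shape[0] * A.shape[1]     # feature-map size M_l
        G_a, G_x = gram_matrix(A), gram_matrix(F)
        total += w * tf.reduce_sum(tf.square(G_x - G_a)) / (4.0 * N**2 * M**2)
    return total

# Joint objective: its gradient flows back to the *pixels* of the generated image.
def total_loss(P, F_content, A_feats, F_style, weights, alpha=1.0, beta=1e3):
    return (alpha * content_loss(P, F_content)
            + beta * style_loss(A_feats, F_style, weights))
```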
41. Generative Models
42. Let's make an experiment. (Diagram: Concept, Image, Knowledge, Residual Ideas.)
43. Autoencoders (Idea)
   ● Supervised neural networks try to predict labels from input data.
   ● It is not always possible to obtain labels.
   ● Unsupervised learning can help capture the data structure.
   ● What if we turn the output into the input?
44. Autoencoders (Idea)
   “This is not the Generative Model you are looking for.”
   ● It tries to predict x from x, but no labels are needed.
   ● The idea is to learn an approximation of the identity function.
   ● Along the way, some restrictions are placed: typically the hidden layers compress the data.
   ● The original input is represented at the output, even if it comes from noisy or corrupted data.
45. Autoencoders (Encoder and decoder)
   ● The latent space is commonly a narrow hidden layer between encoder and decoder.
   ● It learns the data structure.
   ● Encoder and decoder can share the same (mirrored) structure or be different.
   ● Each one can have its own depth (number of layers) and complexity.
46. Autoencoders: Backpropagation
   ● A cost function can be defined from the differences between the input and Decode(Encode(input)).
   ● This allows backprop to be carried out along the encoder and the decoder.
   ● To prevent the composed function from being the identity, some regularization can be applied.
   ● One of the most common options is simply reducing the latent space dimension (i.e., compressing the data in the encoding).
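A minimal dense autoencoder sketch in Keras for slides 43-46. The 784 → 32 → 784 sizes are invented (e.g. flattened 28×28 images), with the narrow layer playing the latent space.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
latent = layers.Dense(32, activation="relu", name="latent_space")(inputs)  # encoder
outputs = layers.Dense(784, activation="sigmoid")(latent)                  # decoder

autoencoder = keras.Model(inputs, outputs)
# The target *is* the input: backprop runs through decoder and encoder
# using the reconstruction error as the cost.
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(x_train, x_train, epochs=10)   # x is both input and target
```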
47. Generative Models (Idea)
   “What I cannot create, I do not understand.” (Richard Feynman)
48. Generative Models (Idea 2)
   ● They model how the data was generated in order to categorize a signal.
   ● Instead of modeling P(y|x), as the usual discriminative models do, the distribution under the hood is P(x, y).
   ● The number of parameters is significantly smaller than the amount of data on which they are trained.
   ● This forces the models to discover the essence of the data.
   ● What the model does is understand the world around the data and provide good representations of it.
49. Generative Models: Applications
   ● Generate potentially unfeasible examples for Reinforcement Learning.
   ● Denoising / pretraining.
   ● Structured prediction exploration in RL.
   ● Entirely plausible generation of images to depict image/video.
   ● Feature understanding.
50. Variational Autoencoder: Idea
   (Diagram: Input image => Encoder Network => Latent Space (Mean Vector, Standard Deviation Vector) => Decoder Network => Output image.)
   Sampling on the Latent Space from the prior distribution => generate new representations.
51. Introducing Keras. (Demogorgon smile generation is beyond the state of the art.)
52. Latent Space Distribution (I)
   (Diagram: Encoder Network => Latent Space (Mean Vector, Standard Deviation Vector) => Decoder Network.)
53. Latent Space Distribution (II): VAE Loss function
   ● Encoder and decoder can be denoted as conditional probability representations of the data: q(z|x) for the encoder, p(x|z) for the decoder.
   ● Typically the encoder reduces dimensions while the decoder increases them, so some information is lost when reconstructing the inputs. This information loss can be measured using the reconstruction log-likelihood log p(x|z).
   ● In order to keep the latent distribution under control, we can introduce a regularizer into the loss function: the Kullback-Leibler divergence KL(q(z|x) ‖ N(0, I)) between the encoder distribution and a given, known distribution, such as the standard Gaussian.
   ● With this penalty in the loss, encoder outputs are forced to be sufficiently diverse: similar inputs are kept (smoothly) close together in the latent space.
54. (Diagram: VAE loss = Reconstruction Loss + Distribution Divergence (K-L); ReLU activations.)
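A VAE sketch in Keras matching slides 50-54, following the classic functional-API VAE example pattern (TF 2.x; newer Keras versions may require a custom train_step instead of add_loss). The encoder outputs a mean and a log-variance vector, z is drawn with the reparameterization trick, and the loss adds the K-L divergence to the reconstruction term. Sizes (784-dimensional inputs, 2-D latent space) are illustrative.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 2
inputs = keras.Input(shape=(784,))
h = layers.Dense(256, activation="relu")(inputs)
z_mean = layers.Dense(latent_dim)(h)       # mean vector
z_log_var = layers.Dense(latent_dim)(h)    # log of the std-dev vector

def sample(args):
    mu, log_var = args
    eps = tf.random.normal(shape=tf.shape(mu))   # noise from the prior
    return mu + tf.exp(0.5 * log_var) * eps      # reparameterization trick

z = layers.Lambda(sample)([z_mean, z_log_var])
h_dec = layers.Dense(256, activation="relu")(z)
outputs = layers.Dense(784, activation="sigmoid")(h_dec)

vae = keras.Model(inputs, outputs)

# Loss = reconstruction log-likelihood term + KL(q(z|x) || N(0, I))
recon = keras.losses.binary_crossentropy(inputs, outputs) * 784
kl = -0.5 * tf.reduce_sum(
    1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
vae.add_loss(tf.reduce_mean(recon + kl))
vae.compile(optimizer="adam")
# vae.fit(x_train, epochs=10)   # no explicit target: the loss is built in
```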
55. Latent Space Distribution (III): Probability overview
   ● The VAE contains a specific probability model of the data x and the latent variables z.
   ● We can write the joint probability of the model as p(x, z): “how likely is observation x under the joint distribution?”
   ● By definition, p(x, z) = p(x|z) p(z).
   ● In order to generate the data, the process is as follows. For each datapoint i: draw latent variables z_i ∼ p(z), then draw a datapoint x_i ∼ p(x|z).
   ● We need to figure out p(z) and p(x|z).
   ● The likelihood is the representation to be learnt by the decoder.
   ● The encoder likelihood can be used to estimate the parameters of the prior.
56. Generative Adversarial Networks (GAN)
57. Generator
   ● The generator is trained to fool the discriminator.
   ● It creates samples that are intended to come from the same distribution as the training data.
   ● We define the generator as a function G that takes z as input and uses θ(G) as parameters: simply a differentiable function G. When z is sampled from some simple prior distribution, G(z) yields a sample of x drawn from p_model.
   ● The generator wishes to minimize a cost J(G)(θ(D), θ(G)) and must do so while controlling only θ(G).
58. Discriminator
   ● The discriminator examines samples to determine whether they are real or fake.
   ● It learns using traditional supervised learning techniques, dividing inputs into two classes (real or fake).
   ● The discriminator is a function D that takes x as input and uses θ(D) as parameters.
   ● The discriminator wishes to minimize a cost J(D)(θ(D), θ(G)) and must do so while controlling only θ(D).
59. Come together: Generate n fake images -> Get n training examples -> Train Discriminator -> Train Generator -> Repeat
   ● The generator must learn to cheat the discriminator, learning to create samples from the same distribution as the training data.
   ● Players are represented by two functions, each differentiable both with respect to its inputs and with respect to its parameters.
   ● The training process consists of simultaneous SGD. On each step, two minibatches are sampled: a minibatch of x values from the dataset and a minibatch of z values from the model's prior over latent variables. Then both cost functions are updated simultaneously.
   ● Each player's cost depends on the other player's parameters, but each player cannot control the other player's parameters. This is a game, not an optimization!
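The loop on slide 59 as a Keras sketch. Architectures, sizes and data are invented placeholders; freezing the discriminator inside a combined model is one common way to train the generator through it, not necessarily the speakers' setup.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

z_dim, x_dim = 100, 784

generator = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(z_dim,)),
    layers.Dense(x_dim, activation="sigmoid"),
])
discriminator = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(x_dim,)),
    layers.Dense(1, activation="sigmoid"),      # real (1) vs fake (0)
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: generator followed by a frozen discriminator. Training it
# updates only the generator, which tries to get fakes labeled as "real".
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

x_train = np.random.rand(1000, x_dim)   # stand-in for real data
n = 64
for step in range(1000):
    # 1) Generate n fake images; 2) get n training examples
    z = np.random.normal(size=(n, z_dim))
    fake = generator.predict(z, verbose=0)
    real = x_train[np.random.randint(0, len(x_train), n)]
    # 3) Train the discriminator on both minibatches
    discriminator.train_on_batch(
        np.vstack([real, fake]),
        np.vstack([np.ones((n, 1)), np.zeros((n, 1))]))
    # 4) Train the generator (through the frozen discriminator) to fool D
    gan.train_on_batch(np.random.normal(size=(n, z_dim)), np.ones((n, 1)))
    # 5) Repeat: a game, not a single optimization
```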
60. Google AutoAugment
61. So far, so good
   ● Network architectures can also be used to hard-code invariances: convolutional networks bake in translation invariance, whereas physics models bake in invariance to translations, rotations, and permutations of atoms.
   ● Elastic distortions, scaling, translation, and rotation during training make an effective data augmentation method on MNIST, due to the symmetries present in the dataset.
   ● On natural image datasets, such as CIFAR-10 and ImageNet, random cropping, image mirroring and colour shifting / whitening are more common.
   ● Common data augmentation methods for image recognition have been designed manually, and the best augmentation policies are dataset-specific.
62. Reformulating the problem
   ● Finding the best augmentation policy can be formulated as a discrete search problem: two operations to be applied in sequence, each with two hyperparameters:
     1) the probability of applying the operation;
     2) the magnitude of the operation.
   ● A policy has 5 sub-policies drawn from 16 operations. Magnitudes are discretized into 10 values and probabilities into 11.
   ● For every image in a mini-batch, a sub-policy is chosen uniformly at random to train the neural network: stochasticity.
   ● The search space with 5 sub-policies then has roughly (16 × 10 × 11)^10 ≈ 2.9 × 10^32 possibilities.
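A quick sanity check of that count:

```python
# Each sub-policy has 2 operations, each chosen from 16 ops x 10 magnitudes
# x 11 probabilities; a policy stacks 5 sub-policies, i.e. 10 choices total.
choices_per_operation = 16 * 10 * 11            # 1760
print(float(choices_per_operation ** 10))       # ~2.9e32, as on the slide
```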
63. Algorithm highlights
   ● The search algorithm has two components: a controller (an RNN), fed by its subsequent predictions, and the training algorithm, Proximal Policy Optimization (RL).
   ● In total, the controller makes 30 softmax predictions in order to predict 5 sub-policies (2 operations each, with a magnitude and a probability).
   ● The controller is trained with a reward signal: how good the policy is at improving the generalization of a “child model” (a neural network trained as part of the search process) trained on augmented data generated by applying the 5 sub-policies to the training set.
   ● For each example in the mini-batch, one of the 5 sub-policies is chosen randomly to augment the image.
   ● On each dataset, the controller samples about 15,000 policies.
   ● At the end of the search, the sub-policies from the best 5 policies are concatenated into a single policy (with 25 sub-policies), which is used to train the models for each dataset.
64. Results: ImageNet, Fine-Grained Visual Classification datasets, CIFAR-10, CIFAR-100.
65. Analytical Index
   1. Introduction: why combine models?
   2. Boosting & Bagging basics
   3. Demo:
      ○ AdaBoost implementation with binary trees
      ○ Feature Selection with Random Forest
   Not all that wander are lost. Any Questions? Fernando Velasco @fer_maat, Raúl de la Fuente @neurozetta
66. THANK YOU! Fernando Velasco @fer_maat, Raúl de la Fuente @neurozetta
67. I.A.
68. BE AWARE!
69. Let me introduce you to my friend Cajal. He knew something about neurons. (Diagram labels: dendrite; axon; synapses: impulse transmission.)
