
Deep learning image classification applied to the fashion world


Slides from the talk given at Codemotion 2016
http://2016.codemotion.es/agenda.html#5732408326356992/86464003



  1. Deep Learning Image Classification. Robert Figiel, Co-Founder & CTO; Javier Abadía, Lead Developer
  2. WHAT DO WE DO AT STYLESAGE? Collect Data: web-crawling of 100M+ e-commerce products daily. Analyze Products: text analysis, machine learning, image recognition. Visualize Insights: for fashion brands & retailers.
  3. CHALLENGE: CLASSIFY PRODUCTS FROM IMAGES • Category: Dress
  4. SOLUTION: CONVOLUTIONAL NEURAL NETWORKS (CNN). Input (Image Data) → Convolutional Neural Network, a BLACK BOX (for now) → Output (Probability Vector): Dress 94.8%, Skirt 4.1%, Jacket 1.2%, Pant 0.1%, Socks 0.01%, ...
  5. TRADITIONAL COMPUTING: input → algorithm → output
  6. MACHINE LEARNING: training takes inputs and known outputs and produces a model; the trained model then plays the role of the algorithm, mapping new input → new output
  7. MACHINE LEARNING – CLASSIFICATION: supervised learning maps features → classes
  8. MACHINE LEARNING – CLASSIFICATION • Supervised Learning – Decision Trees – Bayesian Algorithms – Regression – Clustering (strictly speaking, an unsupervised method) – Neural Networks – … http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
  9. LETTER RECOGNITION: 28x28 gray-level images → 784 input features
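To make the 784 concrete, here is a tiny numpy sketch of flattening one 28x28 gray-level image into a feature vector (the all-zeros image is just a stand-in):

    import numpy as np

    image = np.zeros((28, 28))   # stand-in for one gray-level letter image
    features = image.flatten()   # 28 * 28 = 784 input features
    print(features.shape)        # (784,)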
  10. LOGISTIC CLASSIFIER: WX + b = Y, i.e. the 784 input features X are multiplied by a 784 x 35 weight matrix W, a 35-element bias b is added, and the result is a vector of 35 scores Y; the 35 class probabilities are P = softmax(Y)
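The softmax step that turns the 35 scores into probabilities is easy to show in numpy; this is a minimal sketch (the three score values are made up, not from the slides):

    import numpy as np

    def softmax(y):
        # shift by the max score for numerical stability before exponentiating
        e = np.exp(y - np.max(y))
        return e / e.sum()

    scores = np.array([5.2, 2.1, 0.3])   # made-up class scores Y
    print(softmax(scores))               # probabilities that sum to 1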
  11. GRADIENT DESCENT
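As a reminder of what gradient descent does before the real code below, here is a toy sketch minimizing f(w) = (w - 3)^2; the objective and learning rate are illustrative, not from the talk:

    # toy gradient descent: minimize f(w) = (w - 3)^2
    w = 0.0
    learning_rate = 0.1
    for step in range(100):
        gradient = 2 * (w - 3)        # df/dw
        w -= learning_rate * gradient
    print(w)                          # converges towards 3.0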
  12. CODE USING python/scikit-learn

    """
    based on http://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html
    """
    import numpy as np
    from sklearn import linear_model, metrics

    N = 50000
    X = np.array([x.flatten() for x in data['train_dataset'][:N]])
    Y = data['train_labels'][:N]

    solver = 'sag'
    C = 0.001

    # train
    logreg = linear_model.LogisticRegression(C=C, solver=solver)
    logreg.fit(X, Y)

    # test
    VX = np.array([x.flatten() for x in data['test_dataset']])
    predicted_labels = logreg.predict(VX)
    print "%.3f" % (metrics.accuracy_score(predicted_labels, data['test_labels']),)
  13. CODE WITH tensorflow

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # Input data placeholder
        tf_train_dataset = tf.placeholder(tf.float32,
                                          shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

        # Variables.
        weights = tf.Variable(
            tf.truncated_normal([image_size * image_size, num_labels]))
        biases = tf.Variable(tf.zeros([num_labels]))

        # Training computation.
        logits = tf.matmul(tf_train_dataset, weights) + biases
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

        # Optimizer.
        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  14. LINEAR METHODS ARE LIMITED TO LINEAR RELATIONSHIPS. One layer: X ✕ W1 + b1 = Y, P = s(Y). Two layers: X ✕ W1 + b1 → activation function (ReLU) → ✕ W2 + b2 = Y, P = s(Y)
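A quick numpy sketch of why the ReLU is needed: without it, two stacked linear layers collapse into a single linear layer (all matrices here are random toy data, with the 784/100/35 sizes assumed for illustration):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.rand(4, 784)                          # a toy batch of 4 inputs
    W1, b1 = rng.rand(784, 100), rng.rand(100)
    W2, b2 = rng.rand(100, 35), rng.rand(35)

    stacked = (X.dot(W1) + b1).dot(W2) + b2            # two linear layers, no activation
    collapsed = X.dot(W1.dot(W2)) + b1.dot(W2) + b2    # one equivalent linear layer
    print(np.allclose(stacked, collapsed))             # True: nothing was gained

    relu = lambda y: np.maximum(y, 0)
    nonlinear = relu(X.dot(W1) + b1).dot(W2) + b2      # the ReLU breaks the collapse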
  15. CODE WITH tensorflow

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # Input data placeholder
        tf_train_dataset = tf.placeholder(tf.float32,
                                          shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

        # Variables
        weights1 = tf.Variable(
            tf.truncated_normal([image_size * image_size, n_hidden_nodes]))
        biases1 = tf.Variable(tf.zeros([n_hidden_nodes]))
        weights2 = tf.Variable(tf.truncated_normal([n_hidden_nodes, num_labels]))
        biases2 = tf.Variable(tf.zeros([num_labels]))

        # Training model
        logits1 = tf.matmul(tf_train_dataset, weights1) + biases1
        relu_output = tf.nn.relu(logits1)
        logits2 = tf.matmul(relu_output, weights2) + biases2
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits2, tf_train_labels))

        # Optimizer
        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  16. NN VS CNN: Neural Network (ANY numeric input) vs. Convolutional Neural Network (IMAGE input)
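To peek inside the image-input side, here is a hedged sketch of a single convolutional layer in the same TF style as the code slides; every shape (28x28 input, 5x5 filters, 16 feature maps, batch of 16) is an illustrative assumption, not the deck's actual model:

    import tensorflow as tf

    batch_size = 16                       # illustrative value
    images = tf.placeholder(tf.float32, shape=(batch_size, 28, 28, 1))  # NHWC batch
    conv_weights = tf.Variable(
        tf.truncated_normal([5, 5, 1, 16], stddev=0.1))  # 5x5 filters, 16 feature maps
    conv_biases = tf.Variable(tf.zeros([16]))
    conv = tf.nn.conv2d(images, conv_weights,
                        strides=[1, 1, 1, 1], padding='SAME')  # slide filters over the image
    hidden = tf.nn.relu(conv + conv_biases)  # activations fed to the next layer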
  17. DEEP LEARNING: MANY LAYERS FOR HIGHER ACCURACY. Examples: AlexNet (2012), 8 layers; GoogLeNet (2014), 22 layers
  18. I'M BORED
  19. CHOOSING A MODEL – OPEN SOURCE OPTIONS • AlexNet (2012): 8 layers, 16.4% top-5 error rate on ImageNet • GoogLeNet (2014): 22 layers, 6.66% top-5 error rate on ImageNet • Google Inception v3 (2015): 48 layers, 3.46% top-5 error rate • Microsoft ResNet (2015): 152 layers, 3.57% top-5 error rate. A yearly competition runs on the ImageNet dataset (1M+ images across 1000 object classes), and the winning models are released as open source: no need to re-invent the wheel.
  20. FRAMEWORK – OPEN SOURCE OPTIONS • Caffe: developed by UC Berkeley; very efficient algorithms; implementations of GoogLeNet and ResNet; large community • TensorFlow: released in 2015 by Google; ready-to-use implementations of GoogLeNet and Inception v3; TensorBoard for visualizing training progress • Torch, Theano, Keras, ... Many Python frameworks are available, all with plenty of examples, good documentation and pre-implemented models. Choose a Python framework that fits your needs.
  21. IMPLEMENTING A CNN MODEL – TRAIN – PREDICT: select/develop a MODEL → TRAIN/TEST the model with known images → PREDICT on new images; a feedback loop turns predictions into additional training data
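A minimal sketch of the TRAIN step for the graph built on the earlier tensorflow slides, reusing graph, optimizer, loss and the two placeholders defined there; next_batch() and num_steps are hypothetical stand-ins for the real data pipeline:

    num_steps = 3001                                   # hypothetical
    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()            # TF 0.x-era initializer
        for step in range(num_steps):
            batch_data, batch_labels = next_batch(batch_size)  # hypothetical helper
            feed_dict = {tf_train_dataset: batch_data,
                         tf_train_labels: batch_labels}
            _, l = session.run([optimizer, loss], feed_dict=feed_dict)
            if step % 500 == 0:
                print('step %d, loss %.3f' % (step, l))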
  22. INFRASTRUCTURE – GPUS. The underlying CNN computations are mainly matrix multiplications → GPUs (Graphics Processing Units) are 30-50x faster than CPUs (e.g. 1 CPU: 2 sec vs. 1 GPU: 50 ms). Use GPU-based servers for faster training and predictions.
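A hedged sketch of pinning the kind of large matrix multiply CNNs spend their time in onto a GPU with TensorFlow; '/gpu:0' assumes one CUDA-visible GPU, and allow_soft_placement lets the op fall back to the CPU otherwise:

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        with tf.device('/gpu:0'):          # assumes one CUDA-visible GPU
            a = tf.random_normal([4096, 4096])
            b = tf.random_normal([4096, 4096])
            c = tf.matmul(a, b)            # the core CNN workload

    config = tf.ConfigProto(allow_soft_placement=True)  # fall back to CPU if needed
    with tf.Session(graph=graph, config=config) as session:
        session.run(c)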
  23. THANK YOU – WE ARE RECRUITING! www.stylesage.co/careers javier@stylesage.co
  24. THANK YOU!
