Using Deep Learning for Recommendation

Scala Matsuri 2017
How to build a recommendation system using deep learning

About me
• Eduardo Gonzalez
• twitter: @wm_eddie
• github: wmeddie
• Skymind Deep Learning Engineer
• Previously Japan Business Systems since 2009
• Japanese System Integrator (SIer)
• Social Systems Design Center (R&D)
• Focused on Machine Learning and NLP
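I am Gonzalez from Skymind. Recently, in an R&D department called the Social Systems Design Center, I have often used machine learning for natural language processing.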

Goal
First, the goal of the sample presented in this talk.

With shopping cart item recommendation
Convenient
Recommending the next item to buy, a common feature on e-commerce sites, is very convenient.

Without shopping cart item recommendation
Unhelpful when buying sets of things
Without a feature like this, buying things that go together as a set becomes very inconvenient.

Feature Definition
• Given the current shopping cart, predict the next item the
user will buy
• These are not necessarily similar items
• Use the top N suggestions on EC site
Feature definition: from the current state of the cart, show on the page the top N items the user is likely to buy next.

How to do this
• Plenty of ML algorithms can be used for this problem
• FPGrowth
• PrefixSpan
• Alternating Least Squares (With counts for ratings)
• K-means clustering
• But can we use Deep Learning?
There are several ways to implement this with conventional machine learning, such as pattern mining and k-means.
But can we use deep learning?

Should you choose deep learning?
• Keep an eye out for published papers that are highly cited
• Pay attention to non-deep learning baselines
• Your simplest algorithm may be behind you
• Invest time if payoff is significant enough
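• For recommendation, a few % can mean a lot of money
Deep learning is a hot topic, but a simple machine learning algorithm is often enough. If a difference of a few % translates directly into profit, deep learning will be useful.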
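A non-deep-learning baseline makes the comparison concrete. The sketch below is not from the talk: it is a hypothetical first-order baseline that counts which item most often follows each item in past invoices and suggests the top N followers of the last item in the cart. The function names and toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict

def fit_next_item_counts(sequences):
    """Count how often each item is immediately followed by each other item."""
    followers = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            followers[current][nxt] += 1
    return followers

def recommend(followers, cart, n=3):
    """Return up to n items that most often followed the last item in the cart."""
    if not cart:
        return []
    ranked = followers[cart[-1]].most_common()
    # Skip items that are already in the cart.
    return [item for item, _ in ranked if item not in cart][:n]

# Toy invoices (hypothetical data, not the Online Retail dataset).
invoices = [
    ["teapot", "teacup", "saucer"],
    ["teapot", "teacup", "spoon"],
    ["teapot", "saucer"],
]
model = fit_next_item_counts(invoices)
print(recommend(model, ["teapot"], n=2))
```

Any deep model should beat a baseline of this kind before it is worth the extra training and maintenance cost.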

When to choose deep learning
Popular conference, but check!
Industry name: sometimes very important
Look for papers that use deep learning techniques on similar problems. This is a good sign that it will work with your data (but not a guarantee).
Look for papers at well-known conferences that solve a problem like yours and report good results. The approach may apply to your data too.

When to choose deep learning
Regular ML techniques inadequate
Significant gain possible
Deep learning is especially worth trying when its results are better than those of conventional methods.

Let’s Deep Learn this! How hard can it be?
Let’s use deep learning for this problem! It should be a piece of cake.

(Tensorflow)
What do all these words mean?
truncated_normal?
softmax?
Who’s Adam?
What do these words mean? I have no idea. And who is Adam...?

Quick Intro to Neural Networks
Just enough to use DL4J
First, an introduction to neural networks, just enough to be able to use DL4J.

What is a Neural Network
An artificial neuron can be any plain old object that has a method forward(input:
Array[Double]): Double that multiplies the input parameter by an internal set of
parameters called weights, adds a bias and runs the result through a nonlinear
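activation function (like sigmoid or ReLU). You want the weights to be small
random numbers.
An artificial neuron is just an object that implements a forward method. forward multiplies the inputs by the weights, adds the bias, and applies the activation function to the result.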
Diagram: input a ✖️ weight a and input b ✖️ weight b are summed (Σ) together with the bias, then passed through the activation (𝜎).
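The description above can be sketched as a plain object. This is a minimal illustration, not DL4J code: the class name, the seed parameter, and the weight range of ±0.1 are assumptions chosen for the example, and sigmoid is used as the activation.

```python
import math
import random

class Neuron:
    """A single artificial neuron: weighted sum, plus bias, through a sigmoid."""

    def __init__(self, n_inputs, seed=0):
        rng = random.Random(seed)
        # Small random starting weights (the range is an illustrative choice).
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
        self.bias = 0.0

    def forward(self, inputs):
        # Multiply each input by its weight, sum, and add the bias.
        z = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        # Sigmoid activation squashes the result into (0, 1).
        return 1.0 / (1.0 + math.exp(-z))

neuron = Neuron(n_inputs=2)
print(neuron.forward([1.0, 2.0]))
```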

Neural Net Training
• Using backpropagation you train the network by updating the weights using a
small learning rate so that they are only a little better next time. It’s best to
use an advanced updater like RMSProp or Nesterov momentum to learn faster
We compute the error and update each neuron’s weights a little. How much to update is controlled by the learning rate. Advanced algorithms shorten training time further.
Amazing Image by:
Alec Radford.
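A single weight update can be sketched for one sigmoid neuron. This is a simplified illustration with plain gradient descent and squared error, not RMSProp or Nesterov and not DL4J’s implementation; the function names and the toy values are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(weights, bias, inputs, target, learning_rate):
    """One gradient-descent step for a single sigmoid neuron with squared error."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    output = sigmoid(z)
    error = output - target
    # Chain rule: d(loss)/dz for loss = 0.5 * error^2 with a sigmoid activation.
    delta = error * output * (1.0 - output)
    # Move each weight a small step against its gradient.
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias, 0.5 * error * error

weights, bias = [0.05, -0.03], 0.0
for step in range(200):
    weights, bias, loss = train_step(weights, bias, [1.0, 0.5], 1.0, learning_rate=0.5)
    if step in (0, 199):
        print(step, loss)
```

The loss printed at the last step is lower than at the first: each update makes the neuron only a little better.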

Neural Network Vocabulary
• LSTM: A Recurrent Neural Network that can learn patterns and time
series like stock data
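• Batch Normalization: Re-centering and re-scaling the activations in the
middle of the network to keep gradients from exploding
• Softmax: A way to produce a probability distribution
We will use the terms above. An LSTM is a neural network that can handle patterns and time-series data. Batch norm is a mechanism that normalizes intermediate outputs. Softmax produces a probability distribution.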
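Two of these terms are small enough to sketch directly. The code below is an illustration only: the batch normalization shown omits the learned scale and shift parameters that real implementations add, and the function names are assumptions.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum avoids overflow without changing the result.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def batch_normalize(values, eps=1e-5):
    """Re-center to mean 0 and re-scale to (roughly) unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]

print(softmax([1.0, 2.0, 3.0]))
print(batch_normalize([1.0, 2.0, 3.0, 4.0]))
```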

Deep Learning Tools
Elegant weapons for our civilized age.
An introduction to deep learning tools.

DL4J
• A Deep Learning framework written in Java
• Matrix backend written in C++ with both CPU and GPU support.
• Can be run inside Spark so you can keep your data where it is.
• Deploy models as-is into production app servers or Android apps.
• Lower maintenance cost than Python
• Commercial backing
• High-Level API with tested implementations of effective layers, activation and loss
functions
• Japanese documentation
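DL4J is a deep learning framework written in Java. It can be used from Scala and it also supports Spark. Companies that already use Java can put it to work in their existing production environment.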

Other Great Alternatives
• Tensorflow
• Very low level API, but very flexible. Perfect for research!
• Training in Python. Inference with JavaCPP possible
• Very popular. Lots of samples.
• Keras
• High level API (Similar to DL4J) for both Tensorflow and Theano!
• DL4J supports importing Keras models!
• MXNet
• Good alternative to Tensorflow with a Scala/Python/R API!
• Feels like using a wrapper library.
• Chainer
• Nicely object oriented framework with Japanese documentation
There are excellent libraries besides DL4J. Keras in particular has an API similar to DL4J’s, and models built with Keras can be imported into DL4J.

Tensorflow Code Sample
A Tensorflow code sample.
For people who are comfortable with the math, it offers the most freedom. It is also popular, so there are many samples.

MXNet Code Sample
An MXNet sample (Scala). It is similar to Tensorflow, but because it targets many languages it provides a Map-based API, so you do not get the benefits of an IDE.

DL4J Code Sample
A DL4J code sample. Most of the API consists of methods rather than string keys, so you get the benefits of an IDE.

Keras Code Sample
A Keras code sample, the most beautiful of them all.

ScalNet* Code Sample
A code sample for ScalNet, a Keras-like Scala API based on DL4J.
*Contributors welcome

Data Set: Online Retail
• Sample dataset from UCI’s dataset repository
• Not perfect but has invoice data with sequences of products
• Let’s try to make a recommender that suggests the most likely next
product instead of only similar products.
This time we use the Online Retail dataset from UCI. It is not perfect for our purpose, but it should be usable. It is invoice data from a business-to-business e-commerce site.

DataVec: DL4J’s raw-data-to-vector library
• This data is “categorical”. Need to change it into numbers.
• DataVec has everything we need to convert it into an easy-to-train
vector.
• InputSplit + RecordReader → DataSetIterator → next() → DataSet
• Multiple RecordReaders
• CSV/libsvm/matlab/json
• Has hooks for pre-processors and normalization
• MinMaxScaler
• Tokenizers
The DL4J ecosystem also includes DataVec, a data ETL library.
With DataVec, data can easily be converted into a form suitable for deep learning.
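To show what “changing categorical data into numbers” means, here is a hand-rolled one-hot encoding. This is only a conceptual sketch in plain Python, not DataVec’s API; the function names and the toy item names are assumptions.

```python
def build_vocabulary(items):
    """Map each distinct category to an integer index, in first-seen order."""
    vocab = {}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab)
    return vocab

def one_hot(item, vocab):
    """Return a vector of zeros with a single 1.0 at the item's index."""
    vector = [0.0] * len(vocab)
    vector[vocab[item]] = 1.0
    return vector

vocab = build_vocabulary(["teapot", "teacup", "teapot", "spoon"])
print(vocab)
print(one_hot("teacup", vocab))
```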

Step 0: Transform OnlineRetail to Sequences
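Each invoice in the original data is converted into its own file. This format works well with distributed file systems such as HDFS. Receipts that are too short or too long are excluded.
Items.csv Countries.csv Input_N.csv Label_N.csv
Use only examples with enough data:
• Items bought > 100 times
• Sequence Length > 2
• Sequence Length < 50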
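The filtering rules above might be sketched as follows. The slide does not say whether rare items are dropped from a sequence or whether the whole sequence is discarded; this sketch assumes rare items are dropped first and the length check is applied afterwards. The function name and parameters are hypothetical.

```python
from collections import Counter

def filter_sequences(sequences, min_item_count=100, min_len=2, max_len=50):
    """Keep frequent items only, then keep sequences with min_len < length < max_len."""
    counts = Counter(item for seq in sequences for item in seq)
    kept = []
    for seq in sequences:
        # Assumption: items bought too rarely are removed from the sequence.
        frequent = [item for item in seq if counts[item] > min_item_count]
        if min_len < len(frequent) < max_len:
            kept.append(frequent)
    return kept

# Tiny thresholds so the toy data shows the effect.
toy = [["a", "b", "c"], ["a", "b", "c", "d"], ["a", "b"], ["a", "b", "c"]]
print(filter_sequences(toy, min_item_count=1))
```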

Step 1: Read data as vectors
Because the RNN has multiple inputs this time, we configure a RecordReaderMultiDataSetIterator.
With just a single line, the data can be converted into the one-hot format the neural network needs.
…
Convert to 3D matrices for RNNs
Convert to OneHot with one line
Alignment mode automatically masks data
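What the 3D conversion and masking amount to can be sketched by hand. This is a conceptual illustration only: the [batch][time][vocabulary] layout is a choice made for readability here, and DL4J’s own tensor layout and alignment options may differ. The function name is hypothetical.

```python
def pad_and_mask(sequences, vocab_size):
    """One-hot encode index sequences, pad them to equal length, and build a mask.

    Returns (features, mask): features is indexed [batch][time][vocab_size],
    mask is indexed [batch][time] with 1.0 for real steps and 0.0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    features, mask = [], []
    for seq in sequences:
        steps = []
        for index in seq:
            vector = [0.0] * vocab_size
            vector[index] = 1.0
            steps.append(vector)
        padding = max_len - len(seq)
        # Padded time steps are all zeros and are switched off by the mask.
        steps.extend([0.0] * vocab_size for _ in range(padding))
        features.append(steps)
        mask.append([1.0] * len(seq) + [0.0] * padding)
    return features, mask

features, mask = pad_and_mask([[0, 2], [1]], vocab_size=3)
print(features)
print(mask)
```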

Step 2: Build network
The shape of the neural network is built like this. The GRU used in the paper is not currently included in DL4J, so an LSTM is used instead.
Paper used GRUs but currently not available in DL4J
After the NN Intro hopefully more words make sense

Step 3: Train
Monitor performance with IterationListeners
Pass the network configuration to a ComputationGraph and call init; training can then begin.
Use an IterationListener to watch how training is going.

Step 4: Evaluate and Goto 2
It never works on the first try, so change the configuration a little at a time and repeat the experiment.
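The results that follow are reported as top 20 accuracy. The talk does not spell out the exact definition, so the sketch below assumes the usual one: a prediction counts as correct when the true next item is among the N highest-scoring items. The function name and toy scores are assumptions.

```python
def top_n_accuracy(scores_per_example, true_labels, n=20):
    """Fraction of examples whose true label is among the n highest scores."""
    hits = 0
    for scores, label in zip(scores_per_example, true_labels):
        # Item indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label in ranked[:n]:
            hits += 1
    return hits / len(true_labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 2, 0]
print(top_n_accuracy(scores, labels, n=1))
print(top_n_accuracy(scores, labels, n=3))
```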

1 LSTM Layer + softmax
• Top 20 accuracy: 0.029!
• 1434 classified by model as
1502: 11 times
1502: 7 times
1502: 7 times
• T_T
The very first experiment reached a top 20 accuracy of about 2.9%, but the prediction was always the same. T_T

Problem: Unbalanced dataset. Goto 1
• Neural networks will almost
always find the most popular
output, and always predict that.
• Use balanced data for every call
to fit()
• Especially deep networks and
LSTMs.
• Use Batch Normalization or Skip
connections to preserve
gradients.
Neural networks are extremely good at finding the most popular output. The data passed to fit needs to be as balanced as possible. Back to step 1...
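One common way to balance data is to oversample the rare labels. The talk does not say which balancing method was used, so the sketch below is only one option under that assumption; the function name and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def balance_by_oversampling(examples, seed=0):
    """examples: list of (features, label) pairs.

    Returns a shuffled list in which every label appears equally often,
    by repeating randomly chosen examples of the rarer labels.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up rare labels with random repeats until they reach the target.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

data = [("cart1", "popular")] * 5 + [("cart2", "rare")]
balanced = balance_by_oversampling(data)
print(len(balanced))
```

Oversampling repeats rare examples, so it can encourage memorization; downsampling the popular labels is the alternative trade-off.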

1 LSTM Layer + Batch norm everything
• Learns slightly better:
memorized 3 values this time
• Let’s add a fully connected layer
so that the LSTM can make
better use of its inner context.
With the batch normalization technique, the network managed to memorize the top 3 products. Even a little progress is progress.

1 LSTM Layer + Batch norm + Dense Layer + skip
• Skip connections allow network
embedding layers to learn better
representations
• Notice how every addition
halves the speed the network
trains (185 → 100 → 55)
Using two LSTM layers and the skip connection technique improved the results considerably. However, the added complexity slowed training to roughly a quarter of the original speed.
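A skip connection can be sketched in a few lines. The talk does not show whether its skip connection adds or concatenates, so the additive form below is an assumption, and the layer here is a plain dense layer with ReLU rather than DL4J code.

```python
def dense(inputs, weights, biases):
    """Fully connected layer with ReLU: one output per row of weights."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def dense_with_skip(inputs, weights, biases):
    """Add the layer input back onto the layer output (sizes must match)."""
    outputs = dense(inputs, weights, biases)
    return [o + x for o, x in zip(outputs, inputs)]

weights = [[0.5, -0.5], [0.25, 0.25]]
biases = [0.0, 0.0]
print(dense([1.0, 2.0], weights, biases))
print(dense_with_skip([1.0, 2.0], weights, biases))
```

Even when the layer itself outputs zeros, the input still passes through unchanged, which gives gradients a direct path to earlier layers.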

Learning rate too high
• If your learning rate is too high
or network too wide/deep you
might start seeing this kind of
graph, showing that the
network is diverging.
When the network is too large or the learning rate is too high, training can backfire like this. The other graphs also show that it is not stable.
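Divergence from a learning rate that is too high can be reproduced on a toy problem. The sketch below minimizes f(x) = x^2 with plain gradient descent; it is an illustration, not the talk’s network, and the two learning rates are chosen only to show the effect.

```python
def gradient_descent(learning_rate, start=1.0, steps=20):
    """Minimize f(x) = x^2 and return the loss after each step."""
    x = start
    losses = []
    for _ in range(steps):
        gradient = 2.0 * x  # derivative of x^2
        x -= learning_rate * gradient
        losses.append(x * x)
    return losses

stable = gradient_descent(learning_rate=0.1)
diverging = gradient_descent(learning_rate=1.1)
print(stable[0], stable[-1])
print(diverging[0], diverging[-1])
```

With a rate of 0.1 every step shrinks x, so the loss falls; with 1.1 every step overshoots the minimum by more than it corrected, so the loss grows without bound.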

Best so far
• Used all the best tricks
• Very deep with two big LSTM layers
• Batch Norm
• Gradient Clipping
• Learning Rate Decay
• 0.045 Top 20 Accuracy
• To get better you might have to
implement custom loss
functions.
After several attempts, this was the best result. With a setting that lets the learning rate change during training, top 20 accuracy reached 4.5%.
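Two of the tricks listed are simple to sketch. The talk does not say which clipping variant or decay schedule was used, so clipping by L2 norm and exponential decay below are assumptions, and the function names are hypothetical rather than DL4J’s.

```python
import math

def clip_by_norm(gradient, max_norm):
    """Scale the gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]

def decayed_learning_rate(initial_rate, decay, step):
    """Exponential decay: multiply the rate by `decay` once per step."""
    return initial_rate * (decay ** step)

print(clip_by_norm([3.0, 4.0], max_norm=1.0))
print([decayed_learning_rate(0.1, 0.5, step) for step in range(3)])
```

Clipping keeps a single bad batch from throwing the weights far off, while decay allows large steps early and small, careful steps later.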

Save and use model with ModelSerializer
Save model after training using writeModel()
Use model in app with restoreComputationGraph()
In Production
Be careful with threading and RAM in multi-thread contexts!
Good use case for Java EE Stateless beans or Akka Reactive Streams!
A trained model can be saved, and loading the file lets you use the trained model. It is not thread-safe, so in production you need something like EJBs or Akka Reactive Streams.

Takeaways
• Deep Learning is hard, but DL4J makes the best techniques just a
method call away, and those techniques are necessary for the best results
• Making network topologies is easy but data is important
• Shuffle, balance and normalize/scale your data.
• One-hot your input or use embedding layers.
• Start simple, and tweak
• Save every model and save parameters and document results.
• Keep up with the latest techniques
Summary: DL4J makes it easy to use the latest techniques. Normalizing and balancing your data is very important. Starting from a simple configuration is best.

Thank You
Source: https://github.com/wmeddie/lstm-recommender
Get Started with Scala
sbt new wmeddie/dl4j-scala.g8
Ask questions on Gitter:
https://gitter.im/deeplearning4j/deeplearning4j
https://gitter.im/deeplearning4j/deeplearning4j/deeplearning4j-jp (Japanese)
Thank you for your attention.
Please bring your questions to the chat room.
