Azure Machine Learning services, June 2019 edition

Azure Machine Learning service

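Data preparation / Model building and training / Deployment / Inference
Researchers around the world publish their work as papers, and a great deal of proof-of-concept code is released as well.
Make use of the latest techniques
What data should be prepared?
Can data that only your company holds be used for competitive advantage?
Where in the overall business flow should the model be used?

Shared
Speed of change
Competitive domain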

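1. Identify the problem
2. Acquire and process the data
3. Design the model
4. Build the model
a. Initialize
b. Fetch a mini-batch from the dataset
c. Compute the loss (difference)
d. Optimize: minimize the loss (difference)
e. Update the weights
y = Wx + b
loss = |desired – actual outcome|δ
5. Test and evaluate the model
6. Deploy and run inference
a. Collect logs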
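The loop in steps a to e can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a scalar model y = w*x + b, squared error as the loss, and made-up values for the learning rate, batch size, and step count. None of these choices come from the slides.

```python
import random


def train_linear(data, steps=2000, batch_size=4, lr=0.1, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                                # a. initialize
    for _ in range(steps):
        batch = rng.sample(data, batch_size)       # b. fetch a mini-batch
        grad_w = grad_b = 0.0
        for x, y in batch:
            error = (w * x + b) - y                # c. difference behind the loss
            grad_w += 2 * error * x / batch_size   # d. gradient of the mean squared error
            grad_b += 2 * error / batch_size
        w -= lr * grad_w                           # e. update the weights
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 3x + 1 (hypothetical, noise-free).
data = [(k / 10, 3.0 * (k / 10) + 1.0) for k in range(20)]
w, b = train_linear(data)
print(round(w, 3), round(b, 3))
```

A deep learning framework performs the same steps with automatic differentiation; the hand-written gradients here only make the loop visible.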

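Accuracy
Item: Programming / Machine learning
Approach: deductive / inductive, which means a black box remains
Functional guarantee (≈ accuracy), Function Test: possible / depends on the training data, and never goes beyond statistics
Performance guarantee, Performance Test: possible / possible
Validation, Validation Test: possible / you cannot know until you try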
https://www.slideshare.net/hironojumpei/ai-129527593

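[Figure: accuracy (vertical axis) over time (horizontal axis), with two retraining points marked ▲]
Continuous change
• Unlearned data patterns creep in
Discontinuous change
• Qualitative change in the data (collection environment, content, etc.)
• Growth in training data volume
• Improvement in training data quality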

Azure Cloud Services
Compute (Container) / Storage
Python SDK
Data processing
Model training
Model management
Model deployment

…
Prepare Experiment Deploy
Orchestrate

Automated
Machine Learning UI
Visual Interface Machine Learning Notebooks

https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-track-experiments
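Sharing and operational management of important assets such as metrics, data, and models
Experiment
Metrics
Datasets
Models
Workspace
Version control
Deployment to on-premises
Parameter values
Visualization of model accuracy
Management of data definitions
Snapshots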

My Computer
Azure Notebooks
Services: Python SDK
Workspace
AmlCompute
(GPU)
Training script

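• Select and start VMs of various specs
• Automatic scale-out and scale-down
• Job management and schedule management
Training code
train train train
Job and schedule management
• Libraries and data are prepared automatically
...
Machine Learning Compute
• Low-priority option: available at an 80% discount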

Item          NC v1             NC v2             NC v3             ND v1             ND v2 *
Purpose       Compute + ML      Compute + ML      Compute + ML      ML                ML
GPU           NVIDIA K80        NVIDIA P100       NVIDIA V100       NVIDIA P40        NVIDIA V100
Mem / GPU     12 GB             16 GB             16 GB             24 GB             16 GB
Sizes         1, 2 or 4 GPU     1, 2 or 4 GPU     1, 2 or 4 GPU     1, 2 or 4 GPU     8 GPU
Interconnect  PCIe (dual root)  PCIe (dual root)  PCIe (dual root)  PCIe (dual root)  NVLink
2nd Network   FDR InfiniBand    FDR InfiniBand    FDR InfiniBand    FDR InfiniBand    -
VM CPU        Haswell           Broadwell         Broadwell         Broadwell         Skylake
VM RAM        56-224 GB         112-448 GB        112-448 GB        112-448 GB        672 GB
Local SSD     ~380-1500 GB      ~700-3000 GB      ~700-3000 GB      ~700-3000 GB      ~1300 GB
Storage       Std storage       Prem storage      Prem storage      Prem storage      Prem storage
http://aka.ms/AzureNDv2GPUs

NVIDIA RAPIDS on Azure ML
Effortless integration across the whole pipeline
Minimal code changes
Easier to improve model accuracy
Iterate, train, and deploy faster
Reduce training time
Improve productivity
Open Source
Customizable, extensible, interoperable. Built on Apache Arrow. Supported by NVIDIA
https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/RAPIDS
https://blogs.technet.microsoft.com/mssvrpmj/2019/03/25/azure-machine-learning-service-now-supports-nvidia-s-rapids/
Comparison of training gradient-boosted decision trees with XGBoost on CPU versus on GPU (Azure NC24s_v3)

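[Code screenshot: only keywords and literals survived extraction]
Script argument: '--data-folder' (type str, dest 'data_folder', help 'data folder mounting point')
Files loaded: 'train-images.gz' (False, scaled by 255.0), 'test-images.gz' (False, scaled by 255.0), 'train-labels.gz' (True), 'test-labels.gz' (True)
#1. Dataset: './data/mnist' to 'mnist' (True, True) on the Data Store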

[Code screenshot: only keywords and literals survived extraction]
Script arguments: '--batch-size' (type int, dest 'batch_size', help 'mini batch size for training'), '--epoch' (type int, dest 'epoch', help 'epoch size for training')
Script parameters: '--data-folder' 'mnist', '--batch-size' 50, '--epoch' 20, '--first-layer-neurons' 300, '--second-layer-neurons' 100, '--learning-rate' 0.001, '--activation', '--optimizer', '--loss', '--dropout' 0.2, '--gpu'
Packages: 'keras', 'matplotlib'
Entry script: 'train.py'
Other values shown: True, 1800
#2. Script Folder: './keras-mnist' (True), containing './train.py' and './utils.py'
Docker Image
Data Store
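To make the relationship between the script parameters and train.py concrete, here is a minimal sketch using only the standard library. The flag names, types, and help strings that appear on the slide are reused; everything else (the helper `to_argv`, the default values, the missing `dest` names) is an assumption for illustration. The flags whose values did not survive extraction ('--activation', '--optimizer', '--loss', '--gpu') are deliberately left out rather than guessed.

```python
import argparse


def build_parser():
    """Argument parser for a hypothetical train.py, mirroring the flags on the slide."""
    parser = argparse.ArgumentParser(description="training script arguments (sketch)")
    parser.add_argument('--data-folder', type=str, dest='data_folder',
                        help='data folder mounting point')
    parser.add_argument('--batch-size', type=int, dest='batch_size', default=50,
                        help='mini batch size for training')
    parser.add_argument('--epoch', type=int, dest='epoch', default=20,
                        help='epoch size for training')
    # The types below are assumptions inferred from the values shown on the slide.
    parser.add_argument('--first-layer-neurons', type=int, default=300)
    parser.add_argument('--second-layer-neurons', type=int, default=100)
    parser.add_argument('--learning-rate', type=float, default=0.001)
    parser.add_argument('--dropout', type=float, default=0.2)
    return parser


def to_argv(params):
    """Flatten a {flag: value} dict into an argv-style list (hypothetical helper)."""
    argv = []
    for flag, value in params.items():
        argv += [flag, str(value)]
    return argv


script_params = {
    '--data-folder': 'mnist',
    '--batch-size': 50,
    '--epoch': 20,
    '--first-layer-neurons': 300,
    '--second-layer-neurons': 100,
    '--learning-rate': 0.001,
    '--dropout': 0.2,
}

args = build_parser().parse_args(to_argv(script_params))
print(args.data_folder, args.batch_size, args.epoch)
```

In a real submission the service passes these parameters to the entry script on the compute target; the dictionary-to-argv step here only imitates that hand-off locally.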

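[Code screenshot: only keywords, comments, and literals survived extraction]
# start an Azure ML run
class LogRunMetrics
# callback at the end of every epoch
def on_epoch_end
# log a value repeated which creates a list
Logged metrics: 'Loss' from 'loss', 'Accuracy' from 'acc'
Other value shown: 2
Experiment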
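The callback pattern on this slide can be sketched without any Azure or Keras dependency. The assumptions are stated plainly: `run` is any object with a `log(name, value)` method (the Azure ML `Run` object provides one), `FakeRun` is a hypothetical stand-in so the example runs locally, and in a real training script the class would subclass the Keras `Callback` base class instead of being a plain class.

```python
class LogRunMetrics:
    """Keras-style callback that forwards per-epoch metrics to a run object."""

    def __init__(self, run):
        self.run = run

    # callback at the end of every epoch
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # logging the same metric name repeatedly builds up a list of values
        self.run.log('Loss', logs['loss'])
        self.run.log('Accuracy', logs['acc'])


class FakeRun:
    """Hypothetical local stand-in that records metrics the way a run history would."""

    def __init__(self):
        self.metrics = {}

    def log(self, name, value):
        self.metrics.setdefault(name, []).append(value)


run = FakeRun()
callback = LogRunMetrics(run)
# Made-up per-epoch results, only to exercise the callback.
for epoch, (loss, acc) in enumerate([(0.9, 0.70), (0.5, 0.85)]):
    callback.on_epoch_end(epoch, {'loss': loss, 'acc': acc})
print(run.metrics)
```

Injecting the run object keeps the callback testable outside a submitted experiment.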

What is a fair price for this car?

Mileage
Condition
Car brand
Year of make
Regulations
…
Parameter 1
Parameter 2
Parameter 3
Parameter 4
…
Gradient Boosted
Nearest Neighbors
SVM
Bayesian Regression
LGBM
…
Mileage Gradient Boosted Criterion
Loss
Min Samples Split
Min Samples Leaf
Others Model
Which algorithm? Which parameters? Which features?
Car brand
Year of make
Trial and error

Criterion
Loss
Min Samples Split
Min Samples Leaf
Others
N Neighbors
Weights
Metric
P
Others
Mileage
Condition
Car brand
Year of make
Regulations
…
Gradient Boosted
Nearest Neighbors
SVM
Bayesian Regression
LGBM
…
Nearest Neighbors
Model
Iterate
Gradient Boosted
Mileage
Car brand
Year of make
Car brand
Year of make
Condition

Mileage
Condition
Car brand
Year of make
Regulations
…
Gradient Boosted
Nearest Neighbors
SVM
Bayesian Regression
LGBM
…
Gradient Boosted
SVM
Bayesian Regression
LGBM
Nearest Neighbors
Iterate
Regulations
Condition
Mileage
Car brand
Year of make
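To get a feel for why this trial and error is tedious by hand, the following sketch counts the candidate combinations of features, algorithm, and parameters. The feature and hyperparameter names come from the slides; the candidate values in each grid are hypothetical, and only two of the listed algorithms are included.

```python
from itertools import combinations, product

features = ['Mileage', 'Condition', 'Car brand', 'Year of make', 'Regulations']

# Hyperparameter names are from the slides; the candidate values are made up.
param_grids = {
    'Gradient Boosted': {
        'criterion': ['friedman_mse', 'squared_error'],
        'loss': ['squared_error', 'absolute_error'],
        'min_samples_split': [2, 4, 8],
        'min_samples_leaf': [1, 2, 4],
    },
    'Nearest Neighbors': {
        'n_neighbors': [3, 5, 7],
        'weights': ['uniform', 'distance'],
        'metric': ['minkowski'],
        'p': [1, 2],
    },
}


def feature_subsets(names):
    """Yield every non-empty subset of the feature names."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def configurations(grid):
    """Yield every combination of values in one hyperparameter grid."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def candidates():
    """Yield every (feature subset, algorithm, parameters) triple."""
    for subset in feature_subsets(features):
        for algorithm, grid in param_grids.items():
            for params in configurations(grid):
                yield subset, algorithm, params


subset_count = sum(1 for _ in feature_subsets(features))
total = sum(1 for _ in candidates())
print(subset_count, total)
```

Even this small grid is far more than anyone would try manually, which is the gap automated machine learning aims to close.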

Dataset
Goal setting
Training consistency
Input
Output
Parallelized training
Compute resource management
Selection of the best model
Optimized model

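Data Preprocessing
Feature Selection
Algorithm Selection
Hyperparameter Tuning
Model Recommendation
Interpretability & Explaining
Data cleaning
Feature selection
Combined with parallel job execution
Within the configured range, what to select and what to exclude from the options
Accuracy and execution speed are also taken into account
Which features influenced the model?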

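Preprocessing step: Description
Drop high-cardinality or no-variance features: These are dropped from the training and validation sets. This includes features with no values at all, features with the same value in every row, and features with extremely high cardinality (hashes, IDs, GUIDs, and so on).
Impute missing values: Numeric features are imputed with the mean of the column. Categorical features are imputed with the most frequent value.
Generate additional features: For DateTime features: year, month, day, day of week, day of year, quarter, week of year, hour, minute, second. For text features: term frequency based on unigrams, bigrams, and character trigrams.
Transform and encode: Numeric features with few unique values are converted to categorical features. Low-cardinality categorical features are one-hot encoded; high-cardinality ones are one-hot-hash encoded.
Word embeddings: A text featurizer that uses a pretrained model to convert vectors of text tokens into sentence vectors. The embedding vectors of the words in a document are aggregated to produce a document feature vector.
Target encoding: For categorical features, each category is mapped to the average target value for regression problems, and to the class probability of each class for classification problems. Frequency-based weighting and k-fold cross-validation are applied to reduce overfitting of the mapping and noise caused by sparse data categories.
Text target encoding: For text input, a stacked linear model with bag-of-words is used to generate the probability of each class.
Weight of Evidence (WoE): WoE is calculated as a measure of the correlation between categorical columns and the target column. It is computed as the log of the ratio of in-class to out-of-class probabilities. This step outputs one numeric feature column per class and removes the need to explicitly impute missing values and treat outliers.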
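A few of the steps in the table are simple enough to sketch directly. The functions below illustrate mean and most-frequent imputation, one-hot encoding, and Weight of Evidence in plain Python. They are not the service's implementation; in particular, the additive smoothing in `weight_of_evidence` is an assumption added here to avoid taking the log of zero.

```python
import math
from collections import Counter


def impute_mean(values):
    """Replace missing numeric values (None) with the column mean."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def impute_most_frequent(values):
    """Replace missing categorical values (None) with the most frequent value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def one_hot(values):
    """One-hot encode a low-cardinality categorical column."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def weight_of_evidence(categories, targets, positive, smoothing=0.5):
    """WoE per category: log(P(category | in class) / P(category | out of class))."""
    in_counts = Counter(c for c, t in zip(categories, targets) if t == positive)
    out_counts = Counter(c for c, t in zip(categories, targets) if t != positive)
    levels = sorted(set(categories))
    n_in = sum(in_counts.values()) + smoothing * len(levels)
    n_out = sum(out_counts.values()) + smoothing * len(levels)
    return {
        c: math.log(((in_counts[c] + smoothing) / n_in)
                    / ((out_counts[c] + smoothing) / n_out))
        for c in levels
    }


filled_numeric = impute_mean([1.0, None, 3.0])
filled_category = impute_most_frequent(['a', None, 'a', 'b'])
categories, encoded = one_hot(['b', 'a', 'b'])
woe = weight_of_evidence(['x', 'x', 'y', 'y', 'x'], [1, 1, 0, 0, 0], positive=1)
print(filled_numeric, filled_category, categories, encoded)
```

A positive WoE means the category is more common inside the class than outside it, and a negative value means the opposite.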

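Classification                          Regression                         Time series forecasting
Logistic Regression                     Elastic Net                        Elastic Net
Light GBM                               Light GBM                          Light GBM
Gradient Boosting                       Gradient Boosting                  Gradient Boosting
Decision Tree                           Decision Tree                      Decision Tree
K Nearest Neighbors                     K Nearest Neighbors                K Nearest Neighbors
Linear SVC                              LARS Lasso                         LARS Lasso
C-Support Vector Classification (SVC)   Stochastic Gradient Descent (SGD)  Stochastic Gradient Descent (SGD)
Random Forest                           Random Forest                      Random Forest
Extremely Randomized Trees              Extremely Randomized Trees         Extremely Randomized Trees
Xgboost                                 Xgboost                            Xgboost
DNN Classifier                          DNN Regressor                      DNN Regressor
DNN Linear Classifier                   Linear Regressor                   Linear Regressor
Naive Bayes
Stochastic Gradient Descent (SGD)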

Speed-up achieved by running in parallel in a distributed environment

Job #1: C_1
Job #2: C_2
Job #p: C_p
C_i, C_j, C_k
C_1 = {learning rate=0.02, #layers=3, …}
(A) Creating new jobs
• Choosing values
• Random, adaptive, etc.
(B) Managing running jobs
• Execution time
• Compute resources used?
???
HyperDrive
…
Recommended configurations, accuracy
Data exploration
Variable importance
Explanation summary for each prediction
Black-box models cannot be used in work such as root-cause analysis or credit management...
https://docs.microsoft.com/en-US/azure/machine-learning/service/machine-learning-interpretability-explainability
Model interpretability with Azure Machine Learning service

http://papers.nips.cc/paper/7595-probabilistic-matrix-factorization-for-automated-machine-learning

Two options: Web service and IoT Module

Web service
Azure Container Instances (ACI): for testing; fast startup
Azure Kubernetes Service (AKS): for production; automatic scale-out
IoT Module
Azure IoT Edge: runs on edge devices; integrates with Azure IoT Hub
Azure IoT Hub
..
PC-based
Software
FPGA
Azure Sphere
IoT
software
MCU
Azure IoT Edge
Azure Data Box Edge
Azure IoT Hub
Azure IoT Device SDK
Azure Sphere
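Platform that supports the AI lifecycle
Data input and output with devices
Run cloud services on edge devices
FPGAs provided for real-time inference
Multiple devices, multiple languages, multiple operating systems
iOS, Android, Windows, Linux
Security for MCU devices
Connect through Azure or Azure IoT Edge
Data transmission
AI model deployment
AI model deployment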

and Project Brainwave
End-to-end の Data Science Platform
Data preparation, model training, and inference using cloud and edge
Model management, telemetry, A/B testing, etc.
Enterprise grade: security and compliance
Code in Python
Build models with Python and TensorFlow and deploy them to FPGAs
Serverless Architecture
Inferencing through a gRPC API

CPU vs. GPU(V100) vs. FPGA(Brainwave)

Model
Management
Service
Azure ML orchestrator
Python and TensorFlow
Featurize images and train classifier
Classifier
(TF/LGBM)
Preprocessing
(TensorFlow, C++
API)
Control Plane
Service
Brain Wave Runtime
FPGA
CPU

Frameworks Azure
Create Deploy
Services
Devices
Azure Machine Learning services
Ubuntu VM
Windows Server 2019 VM
Azure Custom Vision Service
ONNX Model
Windows devices
Other devices (iOS, etc.)

Extensible
Extensible architecture to
plug-in optimizers and
hardware accelerators
Flexible
Supports full ONNX-ML
spec (v1.2-1.5)
C#, C, and Python APIs
Cross Platform
Works on
-Mac, Windows, Linux
-x86, x64, ARM
Also built-in to Windows
10 natively (WinML)
github.com/microsoft/onnxruntime

Application #1 Application #2
WinML RT API
WinML Win32 API
WinML Runtime
Model Inference Engine
DirectML API
CPU
Direct3D
GPU
Input
Surface
Output
Surface

Train Model / Validate Model / Deploy Model / Package Model / Monitor Model
Retrain Model

Input data / Predictions / Model in production
Input data & predictions
Telemetry data collection

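The characteristics of the input data have changed
Accuracy has started to drop
Current model
Distribution at development time / Latest data distribution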
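One common way to quantify the gap between the development-time distribution and the latest data is the population stability index (PSI). The sketch below is an illustration only: the slides do not say which drift measure the service uses, and the equal-width binning, the small floor `eps`, and the 0.2 alert threshold (a widely quoted rule of thumb) are all assumptions.

```python
import math


def bin_fractions(values, low, width, bins, eps):
    """Fraction of values in each equal-width bin, floored at eps to avoid log(0)."""
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return [max(count / len(values), eps) for count in counts]


def psi(baseline, latest, bins=5, eps=1e-6):
    """Population stability index between a baseline sample and a latest sample."""
    low = min(min(baseline), min(latest))
    high = max(max(baseline), max(latest))
    if high == low:
        return 0.0                      # all values identical: no drift to measure
    width = (high - low) / bins
    p = bin_fractions(baseline, low, width, bins, eps)
    q = bin_fractions(latest, low, width, bins, eps)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Made-up feature values: the latest data is shifted upward by 5.
development = [k / 10 for k in range(100)]
latest = [v + 5 for v in development]

no_drift = psi(development, development)
drift = psi(development, latest)
needs_retraining = drift > 0.2          # assumed threshold, not from the slides
print(no_drift, needs_retraining)
```

A check like this, run on collected telemetry, is one possible trigger for the retraining step in the lifecycle.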

Model reproducibility / Model retraining / Model deployment / Model validation
Train model / Validate model / Deploy model / Monitor model / Retrain model
Build app / Collaborate / Test app / Release app / Monitor app
App developer using Azure DevOps
Data scientist using
Azure Machine Learning extension for Azure DevOps
Data (Model)
Code
Make operations management efficient with automated pipelines!

Orchestration Services
Monitoring
Real-Time
Azure Kubernetes Service
ML Data Drift
Experimentation Monitoring
Batch
Azure ML Compute
Inference Monitoring
Compute
Azure DevOps ML
Extension
Storage
Model Packaging
Model Validation
Run History
Model Deployment
Asset Management
Environments
Code
Datasets
ML Audit Trail
Training Services
Edge
Azure IoT Hub

Azure
Machine Learning
Cognitive Service Cognitive Service
Customize

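Step-by-Step Learning / Achievements / Smooth learning environment
• Free
• Available in Japanese
• Browser only, including the hands-on environment
• Downloadable sample code
• Guidance by product/service, technical level, job role, and so on
• Videos, tutorials, hands-on labs
• Encourages skill development
• Customized per user profile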
www.microsoft.com/learn


More from Daiyu Hatakeyama
