Azure Machine Learning – Other Topics
Microsoft Taiwan
Technical Evangelist
吳宏彬
8/25/2016
What is R?
• Open source
• The “lingua franca” of analytics, computing and modeling
• Global community: millions of users
• Ecosystem: 7000+ algorithms, test data & evaluations
• Scalability: can be scaled to big data, big analytics
Polls of data miners and analytics professionals on their software
choices since 2007
Source: http://blog.revolutionanalytics.com/2013/10/r-usage-skyrocketing-rexer-poll.html
• R is developed and contributed to by the open source community
• CRAN – the Comprehensive R Archive Network
  • The package repository of R
  • 7500+ packages, covering all aspects of statistical analysis, machine learning, natural language processing …
  • Still growing exponentially
  • Free!
Source: http://r4stats.com/2014/04/07/r-continues-its-rapid-growth/
1. Seasonal ARIMA
2. Non-seasonal ARIMA
3. Seasonal ETS
4. Non-seasonal ETS
5. Average of seasonal ETS and seasonal ARIMA
• Mean Error (ME) – the average forecasting error on the test dataset (an error is the difference between the predicted value and the actual value)
• Root Mean Squared Error (RMSE) – the square root of the average of the squared errors of predictions made on the test dataset
• Mean Absolute Error (MAE) – the average of the absolute errors
• Mean Percentage Error (MPE) – the average of the percentage errors
• Mean Absolute Percentage Error (MAPE) – the average of the absolute percentage errors
• Mean Absolute Scaled Error (MASE) – the MAE divided by the in-sample MAE of a naive forecast
• Symmetric Mean Absolute Percentage Error (sMAPE) – a percentage error that divides each absolute error by the average of the absolute actual and predicted values (several variants exist)
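The accuracy measures above take only a few lines to compute. This is a minimal Python sketch, not the code behind the slides. It assumes error = actual - forecast (the convention used by R's forecast package), non-zero actual values for the percentage measures, the in-sample one-step naive forecast as the MASE scale (seasonal data would normally use the seasonal naive forecast instead), and one of several sMAPE variants. The function name `forecast_accuracy` is hypothetical.

```python
import math


def forecast_accuracy(actual, forecast, train):
    """Test-set accuracy measures for one forecast.

    Assumptions (not from the slides): error = actual - forecast, actual values
    are non-zero (for MPE/MAPE), and the MASE scale is the in-sample MAE of the
    one-step naive forecast on the training series.
    """
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    pct_errors = [100.0 * e / a for e, a in zip(errors, actual)]
    mae = sum(abs(e) for e in errors) / n
    # Naive forecast on the training data: predict each value with the previous one.
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": mae,
        "MPE": sum(pct_errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
        "MASE": mae / naive_mae,
        # One common sMAPE variant: |error| over the mean of |actual| and |forecast|.
        "sMAPE": sum(200.0 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)) / n,
    }


if __name__ == "__main__":
    metrics = forecast_accuracy(actual=[12, 14], forecast=[11, 15], train=[10, 12, 11, 13])
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

A MASE below 1 means the model beat the naive forecast, on average, by that factor.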
| | R | Microsoft R Open | Microsoft R Server |
|---|---|---|---|
| Data size | In-memory | In-memory | In-memory or disk-based |
| Speed of analysis | Single-threaded | Multi-threaded | Multi-threaded, parallel processing on 1:N servers |
| Support | Community | Community | Community + commercial |
| Analytic breadth & depth | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions |
| License | Open source | Open source | Commercial license; supported release with indemnity |
• Supports standard Python library types such as pandas data frames and NumPy arrays.
• Python code is executed on Anaconda 2.1, which comes with close to 200 of the most common Python packages (such as NumPy, SciPy and scikit-learn).
• Images generated with Matplotlib can be returned as output.
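To show how these pieces fit together: the classic Azure ML Studio "Execute Python Script" module calls a function named `azureml_main` with up to two pandas DataFrames and expects a sequence of DataFrames back. The sketch below follows that convention. The column names (`price`, `units`) and the derived features are hypothetical, so check the module documentation for the exact contract of your Studio version.

```python
import numpy as np
import pandas as pd


# Sketch of the entry point expected by the classic Azure ML Studio
# "Execute Python Script" module: up to two pandas DataFrames come in,
# and a sequence of DataFrames goes out. Column names are hypothetical.
def azureml_main(dataframe1=None, dataframe2=None):
    df = dataframe1.copy()
    # NumPy functions operate directly on pandas columns.
    df["log_price"] = np.log(df["price"])
    df["price_per_unit"] = df["price"] / df["units"]
    # The module expects a tuple (or list) of DataFrames as the result.
    return (df,)


if __name__ == "__main__":
    # Local smoke test outside Azure ML.
    sample = pd.DataFrame({"price": [10.0, 9.0], "units": [2, 3]})
    (result,) = azureml_main(sample)
    print(result)
```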
KNN
What is Spark?
Data is growing faster than processing speeds; the only solution is to parallelize data processing on large clusters.
Example: HDInsight
Spark is a fast, expressive cluster computing system compatible with Apache Hadoop.
• Works with any Hadoop-supported storage system (HDFS, S3, Avro, …)
Improves efficiency through:
• In-memory computing primitives
• General computation graphs
Improves usability through:
• Rich APIs in Java, Scala, Python
• Interactive shell
Spark was initially started by Matei Zaharia at UC Berkeley AMPLab
in 2009, was open sourced in 2010 and donated to Apache in 2013
Up to 100× faster than Hadoop MapReduce
Often 2-10× less code
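To make "in-memory computing primitives" and "general computation graphs" concrete, here is a toy, single-process Python model of the idea. It is not PySpark: `ToyDataset` and its methods are hypothetical. They are chosen to mirror Spark's split between lazy transformations, which only record a step in the lineage graph, and actions, which run the recorded steps over every partition.

```python
from functools import reduce


class ToyDataset:
    """Toy model of a Spark-style distributed dataset (NOT the PySpark API).

    Data is split into partitions; transformations are only recorded
    (the "computation graph"), and work happens when an action runs.
    """

    def __init__(self, partitions, lineage=()):
        self.partitions = partitions    # list of lists, one per simulated node
        self.lineage = tuple(lineage)   # recorded (kind, function) steps

    def map(self, fn):
        return ToyDataset(self.partitions, self.lineage + (("map", fn),))

    def filter(self, pred):
        return ToyDataset(self.partitions, self.lineage + (("filter", pred),))

    def _compute(self, partition):
        # Apply the recorded steps to one partition, in order.
        for kind, fn in self.lineage:
            if kind == "map":
                partition = [fn(x) for x in partition]
            else:
                partition = [x for x in partition if fn(x)]
        return partition

    def collect(self):
        # Action: triggers computation on every partition.
        return [x for p in self.partitions for x in self._compute(p)]

    def reduce(self, fn):
        # Action: reduce each partition locally, then combine the partial results.
        partials = [reduce(fn, part) for part in map(self._compute, self.partitions) if part]
        return reduce(fn, partials)


if __name__ == "__main__":
    data = ToyDataset([[1, 2, 3], [4, 5, 6]])
    squares_of_evens = data.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    print(squares_of_evens.collect())
    print(squares_of_evens.reduce(lambda a, b: a + b))
```

In real Spark the partitions live on different executors, and the scheduler ships the recorded functions to them. The toy only shows why nothing runs until `collect` or `reduce` is called.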
Spark for Azure HDInsight
Figure: clients (decision makers) connect to a Spark cluster of Spark nodes running on top of a shared storage layer.
Programming Spark
• Using the Spark shell to run interactive queries
• Using the Spark shell to run Spark SQL queries
• Using a standalone Scala program
• Spark Notebooks:
  • Zeppelin – for Scala users
  • Jupyter – for Python users
Figure: speech recognition error rates, with the 2015 system and the human error rate (4%) marked.
Speech recognition could reach human parity in the next 3 years.
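The slide does not say which metric the 4% figure uses. Speech recognition accuracy is commonly reported as word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the system output, divided by the number of reference words. Assuming WER, a minimal Python sketch (the function name is illustrative):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j]: minimum edits turning the first i reference words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```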
Microsoft won first place in every ImageNet 2015 competition using deep learning
ImageNet classification top-5 error (%):
| Competition | Entry | Top-5 error (%) |
|---|---|---|
| ILSVRC 2010 | NEC America | 28.2 |
| ILSVRC 2011 | Xerox | 25.8 |
| ILSVRC 2012 | AlexNet | 16.4 |
| ILSVRC 2013 | Clarifai | 11.7 |
| ILSVRC 2014 | VGG | 7.3 |
| ILSVRC 2014 | GoogLeNet | 6.7 |
| ILSVRC 2015 | MSRA ResNet | 3.5 |
Microsoft took first place in all five of its 2015 entries: ImageNet classification, ImageNet localization, ImageNet detection, COCO detection, and COCO segmentation.
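Top-5 error, the ImageNet metric reported above, counts a prediction as correct when the true class is among the model's five highest-scoring classes. A minimal Python sketch (the function name and toy scores are illustrative):

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for class_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]] * 3
    print(top_k_error(scores, labels=[0, 5, 1], k=5))
    print(top_k_error(scores, labels=[0, 5, 1], k=1))
```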
At the Heart of CNTK: Computational Networks
• A generalization of machine learning models that can be described as a series of computational steps.
  • E.g., DNN, CNN, RNN, LSTM, DSSM, Seq2Seq, log-linear model
• Representation:
  • A list of computational nodes denoted as n = {node name : operation name}
  • The parent-children relationship describing the operands: {n : c_1, …, c_{K_n}}
  • K_n is the number of children of node n. For leaf nodes K_n = 0.
  • The order of the children matters: e.g., XY is different from YX.
  • Given the inputs (operands), the value of the node can be computed.
• Can flexibly describe deep learning models.
  • Adopted by many other popular tools as well
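The node/children representation above can be sketched in a few lines of Python. This is an illustrative toy, not CNTK's implementation or API. The operation names `Times`, `Plus` and `Sigmoid` are borrowed only as labels, matrices are plain nested lists, and evaluation is a memoized post-order traversal, so each node is computed from its ordered children.

```python
import math


def times(a, b):
    # Matrix product of nested lists: rows of a times columns of b.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def plus(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sigmoid(a):
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in a]


OPERATIONS = {"Times": times, "Plus": plus, "Sigmoid": sigmoid}


class Node:
    """n = {node name : operation name}, plus an ordered list of children (operands)."""

    def __init__(self, name, op, children=(), value=None):
        self.name = name
        self.op = op
        self.children = list(children)  # K_n = len(children); leaves have K_n = 0
        self.value = value              # only used by leaf nodes (inputs/parameters)


def evaluate(node, cache=None):
    """Compute a node's value from its children's values (post-order, memoized)."""
    cache = {} if cache is None else cache
    if node.name not in cache:
        if not node.children:
            cache[node.name] = node.value
        else:
            operands = [evaluate(child, cache) for child in node.children]  # order preserved
            cache[node.name] = OPERATIONS[node.op](*operands)
    return cache[node.name]


if __name__ == "__main__":
    X = Node("X", "Input", value=[[1, 2], [3, 4]])
    Y = Node("Y", "Input", value=[[0, 1], [1, 0]])
    print(evaluate(Node("XY", "Times", [X, Y])))
    print(evaluate(Node("YX", "Times", [Y, X])))
```

Because children are kept as an ordered list, swapping the operands of `Times` changes the result, which is the XY ≠ YX point made in the list.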
“CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.”
Figure: speed comparison of CNTK, Theano, TensorFlow, Torch 7 and Caffe in samples/second (higher is better) on 1 GPU, 1 x 4 GPUs and 2 x 4 GPUs (8 GPUs), as of December 2015. Annotations: "Theano only supports 1 GPU"; "Achieved with 1-bit gradient quantization algorithm".
* TensorFlow added distributed compute support in April 2016.
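The "1-bit gradient quantization" annotation refers to compressing each gradient value to a single bit before exchanging it between workers, while carrying the quantization error forward to the next minibatch (error feedback) so that it is not lost. The Python below is a simplified sketch of that idea under our own assumptions: the gradient is a flat list, and the two reconstruction values are the means of the non-negative and the negative entries. CNTK's actual 1-bit SGD implementation differs in its details.

```python
def one_bit_quantize(gradient, residual):
    """Simplified 1-bit gradient quantization with error feedback (a sketch).

    Each element is sent as a single sign bit; the receiver reconstructs it as
    the mean of the non-negative (or negative) values. The quantization error
    is kept locally and added to the next gradient so it is not lost.
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    positives = [v for v in corrected if v >= 0]
    negatives = [v for v in corrected if v < 0]
    pos_value = sum(positives) / len(positives) if positives else 0.0
    neg_value = sum(negatives) / len(negatives) if negatives else 0.0
    bits = [v >= 0 for v in corrected]                          # what is transmitted
    decoded = [pos_value if bit else neg_value for bit in bits]  # receiver's view
    new_residual = [v - d for v, d in zip(corrected, decoded)]   # error feedback
    return bits, (pos_value, neg_value), decoded, new_residual


if __name__ == "__main__":
    residual = [0.0] * 4
    for step_gradient in ([0.5, -0.25, 1.5, -0.75], [0.1, 0.1, -0.2, 0.3]):
        bits, scales, decoded, residual = one_bit_quantize(step_gradient, residual)
        print(bits, scales, decoded)
```

Each element then costs one bit instead of a 32-bit float, plus the two reconstruction values; the residual returned by one call is passed into the next.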
• Microsoft researcher Slawek Smyl won CIF 2016 using an LSTM neural network
• Powered by CNTK
CIF Competition 2016 – Final Results
• Contestant 1 – Slawek Smyl (LSTM-based
NN on deseasonalized data)
• Contestant 2 – Slawek Smyl (weighted
average of my 3 methods)
• Contestant 3 – prof. Sven Crone (Multilayer
Perceptron with a thorough feature search)
• Contestant 4 - Mikhail Artyukhov (previous
competition winner, ensemble models)
• Contestant 5 - Joerg Wichard, Bayer
Healthcare AG (Adaptive Forecasting
Strategy with Hybrid Ensemble Models)
• Contestant 6 – Slawek Smyl (LSTM-based
NN)
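"Deseasonalized data" means the seasonal pattern is removed before modelling and added back to the forecasts afterwards. The slides do not describe Smyl's preprocessing, so the Python below is only a naive multiplicative illustration. It assumes a trend-free series with a known period, and the function names are hypothetical.

```python
def seasonal_indices(series, period):
    """Multiplicative seasonal index for each position in the cycle:
    mean of the values at that position divided by the overall mean.
    Assumes a series without strong trend (a deliberately naive method)."""
    overall_mean = sum(series) / len(series)
    indices = []
    for position in range(period):
        values = series[position::period]
        indices.append((sum(values) / len(values)) / overall_mean)
    return indices


def deseasonalize(series, period):
    indices = seasonal_indices(series, period)
    adjusted = [x / indices[t % period] for t, x in enumerate(series)]
    return adjusted, indices


def reseasonalize(forecasts, indices, start):
    """Put seasonality back into forecasts made on the adjusted scale;
    `start` is the time index of the first forecast."""
    period = len(indices)
    return [f * indices[(start + h) % period] for h, f in enumerate(forecasts)]


if __name__ == "__main__":
    sales = [10, 20, 10, 20, 10, 20]
    adjusted, idx = deseasonalize(sales, period=2)
    print(adjusted, idx)
    # A model (e.g. an LSTM) would forecast on the adjusted scale; here we assume 15.0.
    print(reseasonalize([15.0, 15.0], idx, start=len(sales)))
```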
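CNTK Demo
CNTK Architecture
Figure: CNTK architecture. Components: CN Builder (builds the computational network from a CN description or a lambda); ILearner (SGD, AdaGrad, etc.; uses the CN and gets data); IDataReader (task-specific reader; loads features & labels); IExecutionEngine (CPU/GPU; evaluates the network and computes gradients).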
(1) Kai Chen and Qiang Huo, “Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering”, in International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, Shanghai, China.
• CNTK is a powerful tool that supports CPU/GPU and runs under Windows/Linux
• CNTK is extensible thanks to its low-coupling modular design: adding new readers and new computation nodes is easy with the new reader design
• The network definition language, macros, and model editing language (as well as Python and C++ bindings in the future) make network design and modification easy
• Compared to other tools, CNTK has a great balance between efficiency, performance, and flexibility
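The low-coupling design described above separates the reader, the learner, the execution engine and the network, so any one of them can be swapped without touching the others. The Python below is a hypothetical sketch of that decoupling, not CNTK's interfaces. It uses a list-backed reader, a linear "network" with one weight and one bias, an engine that evaluates the network and computes mean-squared-error gradients, and a plain SGD learner that only talks to the other parts through their methods.

```python
class ListReader:
    """Task-specific reader (data-reader role): serves (features, label) minibatches."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size

    def minibatches(self):
        for start in range(0, len(self.samples), self.batch_size):
            yield self.samples[start:start + self.batch_size]


class LinearNetwork:
    """Stand-in for a built computational network: prediction = w * x + b."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0


class CpuEngine:
    """Execution-engine role: evaluates the network and computes gradients."""

    def evaluate(self, net, x):
        return net.w * x + net.b

    def gradient(self, net, batch):
        # Gradient of the mean squared error over the minibatch.
        grad_w = grad_b = 0.0
        for x, y in batch:
            error = self.evaluate(net, x) - y
            grad_w += 2.0 * error * x / len(batch)
            grad_b += 2.0 * error / len(batch)
        return grad_w, grad_b


class SGDLearner:
    """Learner role: pulls data from the reader, asks the engine for gradients, updates parameters."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def train(self, net, reader, engine, epochs):
        for _ in range(epochs):
            for batch in reader.minibatches():
                grad_w, grad_b = engine.gradient(net, batch)
                net.w -= self.learning_rate * grad_w
                net.b -= self.learning_rate * grad_b
        return net


if __name__ == "__main__":
    data = [(x / 2, 2 * (x / 2) + 1) for x in range(5)]  # points on y = 2x + 1
    net = SGDLearner(0.1).train(LinearNetwork(), ListReader(data, 2), CpuEngine(), epochs=500)
    print(round(net.w, 3), round(net.b, 3))
```

Replacing `ListReader` with a file-backed reader, or `SGDLearner` with an AdaGrad learner, would not require changing the other classes.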
microsoft.com/cognitive
| | Mahout | Spark ML | Azure ML | R Server |
|---|---|---|---|---|
| Shared service | No | No | Yes | No |
| Deployment model | PaaS | PaaS | PaaS | IaaS |
| Extensibility | High | High | Medium | High |
| Deployment complexity | Medium | High | Low | Medium |
| Cost | High | High | Low | High |
| Programming languages | Java/Scala | Scala/Java/Python | Python/R | R |
| Algorithms | Limited (growing) | MLlib/scikit | Many (scikit/CRAN) | Many (CRAN) |
| Scalability | High | High | Medium | Medium |
xgboost, Vowpal Wabbit, Rattle, CNTK
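• On demand in the cloud
• All kinds of data
• Fast deployment of services
• Data sharing and collaboration
• Open
• Supports the complete data analysis workflow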
https://gallery.cortanaintelligence.com/
The only vendor offering a complete solution, from data ingestion through generating actions to data presentation
ConnectR
• High-speed & direct connectors
Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed-format text data files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC
ScaleR
• Ready-to-use high-performance big data, big analytics
• Fully parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Range of predictive functions
• User tools for distributing customized R algorithms across nodes
DistributedR
• Distributed computing framework
• Delivers cross-platform portability
Available on:
• Windows Servers
• Red Hat and SuSE Linux Servers
• Teradata Database
• Cloudera Hadoop
• Hortonworks Hadoop
• MapR Hadoop
R+CRAN
• Open source R interpreter
• R 3.2.2
• A huge, freely available range of R algorithms
• Algorithms callable by RevoR
• 100% compatible with existing R scripts, functions and packages
RevoR
• Performance-enhanced R interpreter
• Based on open source R
• Adds a high-performance math library to speed up linear algebra functions
R Open, Microsoft R Server, DeployR, DevelopR
Data Step
• Data import – delimited, fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, merge, split
• Aggregate by category (means, sums)
Descriptive Statistics
• Min / Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
Sampling
• Subsample (observations & variables)
• Random Sampling
Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models
Cluster Analysis
• K-Means
Classification
• Decision Trees
• Decision Forests
• Gradient Boosted Decision Trees
• Naïve Bayes
Simulation
• Simulation (e.g. Monte Carlo)
• Parallel Random Number Generation
Variable Selection
• Stepwise Regression
Combination
• rxDataStep
• rxExec
• PEMA-R API Custom Algorithms
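ScaleR's parallel external-memory algorithms (the PEMA in "PEMA-R API") follow a chunk-and-combine pattern: each chunk of data is summarised independently, possibly on a different node, and the partial results are then merged. The Python below sketches that pattern for the mean and variance using the standard pairwise combination formulas. It is an illustration only, not the RevoScaleR or PEMA-R API, and the class name `Moments` is hypothetical.

```python
from functools import reduce


class Moments:
    """Partial result for a chunk of data: count, mean and M2 (sum of squared deviations)."""

    def __init__(self, count=0, mean=0.0, m2=0.0):
        self.count, self.mean, self.m2 = count, mean, m2

    @classmethod
    def from_chunk(cls, chunk):
        # Summarise one chunk that fits in memory.
        count = len(chunk)
        mean = sum(chunk) / count
        return cls(count, mean, sum((x - mean) ** 2 for x in chunk))

    def combine(self, other):
        # Merge two partial results (pairwise update formula of Chan et al.).
        count = self.count + other.count
        if count == 0:
            return Moments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / count
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / count
        return Moments(count, mean, m2)

    def variance(self):
        return self.m2 / (self.count - 1)  # sample variance


if __name__ == "__main__":
    chunks = [[2, 4, 4], [4, 5], [5, 7, 9]]  # e.g. blocks read one at a time from disk
    total = reduce(Moments.combine, (Moments.from_chunk(c) for c in chunks), Moments())
    print(total.count, total.mean, total.variance())
```

Because `combine` is associative, the chunks can be processed in any grouping (per core, per node) and merged at the end without holding the full data set in memory.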
Additional Resources
• CNTK:
  • https://github.com/Microsoft/CNTK
  • Contains all the source code and example setups
  • Reading the source code may help you better understand how CNTK works
  • New features are added constantly
• How to contact:
  • CNTK team: ask a question on CNTK GitHub!
  • Alexey:
    • Email: alexey.kamenev@microsoft.com
    • LinkedIn: https://www.linkedin.com/in/alexeykamenev

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 

Azure Machine Learning: Python, R, Spark, and CNTK Deep Learning

  • 1. Azure Machine Learning – Other Topics. Microsoft Taiwan, Technical Evangelist 吳宏彬, 8/25/2016
  • 4. What is R?
    - Open source: the "lingua franca" of analytics, computing and modeling
    - Global community: millions of users
    - Ecosystem: 7000+ algorithms, test data & evaluations
    - Scalability: can be scaled to big data and big analytics
  • 5. Polls of data miners and analytics professionals on their software choices since 2007 Source: http://blog.revolutionanalytics.com/2013/10/r-usage-skyrocketing-rexer-poll.html
  • 6. R is developed and contributed to by the open source community
    - CRAN (the Comprehensive R Archive Network) is R's package repository
    - 7500+ packages, covering all aspects of statistical analysis, machine learning, natural language processing …
    - Still growing exponentially
    - Free!
    Source: http://r4stats.com/2014/04/07/r-continues-its-rapid-growth/
  • 12.
    1. Seasonal ARIMA
    2. Non-seasonal ARIMA
    3. Seasonal ETS
    4. Non-seasonal ETS
    5. Average of seasonal ETS and seasonal ARIMA
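Model 5 on slide 12 is a forecast combination: the average of two models' forecasts. As a minimal stdlib Python sketch, simple exponential smoothing produces the point forecast of the simplest non-seasonal ETS model, ETS(A,N,N), and the combination is an element-wise mean. This rests on simplifying assumptions: real ETS fitting (for example `ets()` in R's forecast package) estimates the smoothing parameter and the initial level from the data, whereas here `alpha` is supplied by hand and the level starts at the first observation. The helper names `ses_forecast` and `average_forecasts` are hypothetical.

```python
from typing import List


def ses_forecast(series: List[float], alpha: float, horizon: int) -> List[float]:
    """Simple exponential smoothing, i.e. the point forecast of ETS(A,N,N).

    Assumption: the level is initialized with the first observation (a simple
    heuristic; real ETS fitting estimates it). Every future step gets the final level.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        # New level = weighted mix of the latest observation and the previous level.
        level = alpha * y + (1.0 - alpha) * level
    return [level] * horizon


def average_forecasts(f1: List[float], f2: List[float]) -> List[float]:
    """Equal-weight combination of two forecasts (model 5 on slide 12)."""
    return [(a + b) / 2.0 for a, b in zip(f1, f2)]


if __name__ == "__main__":
    ses = ses_forecast([10.0, 12.0, 14.0], alpha=0.5, horizon=2)
    print(ses)                                   # [12.5, 12.5]
    print(average_forecasts(ses, [13.5, 14.5]))  # [13.0, 13.5]
```

In practice the second forecast would come from a seasonal ARIMA model; any two equal-length forecast lists can be combined this way.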
  • 13.
    - Mean Error (ME): the average forecast error on the test dataset, where an error is the difference between the predicted value and the actual value
    - Root Mean Squared Error (RMSE): the square root of the average of the squared errors on the test dataset
    - Mean Absolute Error (MAE): the average of the absolute errors
    - Mean Percentage Error (MPE): the average of the percentage errors
    - Mean Absolute Percentage Error (MAPE): the average of the absolute percentage errors
    - Mean Absolute Scaled Error (MASE): the MAE divided by the in-sample MAE of a (possibly seasonal) naive forecast
    - Symmetric Mean Absolute Percentage Error (sMAPE): the average of the absolute errors, each expressed as a percentage of the mean of the absolute actual and predicted values
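The measures on slide 13 can each be computed in a line or two. This is a minimal stdlib Python sketch, not the code behind the slide, and it makes three assumptions: errors are defined as actual minus predicted (the convention common in forecasting texts; flipping it only changes the sign of ME and MPE), MASE is scaled by the in-sample MAE of a non-seasonal one-step naive forecast, and sMAPE uses one common variant of its definition. The function name `forecast_accuracy` is hypothetical.

```python
import math
from typing import Dict, Sequence


def forecast_accuracy(actual: Sequence[float], predicted: Sequence[float],
                      train: Sequence[float]) -> Dict[str, float]:
    """Compute the slide-13 accuracy measures on a test set.

    Errors use e_t = actual_t - predicted_t. `train` is the in-sample
    history and is used only to scale MASE (non-seasonal naive forecast).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal length")
    if len(train) < 2:
        raise ValueError("MASE needs at least two training observations")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    # Percentage errors are undefined when an actual value is 0.
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    mae = sum(abs(e) for e in errors) / n
    # In-sample MAE of the one-step naive forecast y_hat_t = y_{t-1}.
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": mae,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(q) for q in pct) / n,
        "MASE": mae / naive_mae,
        # One common sMAPE variant: mean of 200 * |A - F| / (|A| + |F|).
        "sMAPE": sum(200.0 * abs(a - p) / (abs(a) + abs(p))
                     for a, p in zip(actual, predicted)) / n,
    }
```

A MASE below 1 means the model beat the naive forecast on average; scale-free measures such as MASE and sMAPE are what make comparisons across the five models of slide 12 meaningful when series have different units.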
  • 15. Comparison of R, Microsoft R Open and Microsoft R Server
    Columns: R | Microsoft R Open | Microsoft R Server
    - Data size: in-memory | in-memory | in-memory or disk-based
    - Speed of analysis: single-threaded | multi-threaded | multi-threaded, parallel processing on 1:N servers
    - Support: community | community | community + commercial
    - Analytic breadth & depth: 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions
    - License: open source | open source | commercial license; supported release with indemnity
  • 17.
    - Supports standard Python data types such as pandas DataFrames and NumPy arrays
    - Python code runs on Anaconda 2.1, which comes with close to 200 of the most common Python packages (such as NumPy, SciPy and scikit-learn)
    - Images generated with Matplotlib can be returned as output
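Slide 17 describes running Python inside Azure ML. As we understand the classic Azure ML Studio "Execute Python Script" module, user code defines an entry function named `azureml_main` that receives up to two pandas DataFrames and returns a sequence of DataFrames; treat that contract as an assumption and confirm it against the module reference. The `value` column and the z-score step below are hypothetical, chosen only to show the shape of such a script.

```python
import pandas as pd


# Entry point in the style of the "Execute Python Script" module (assumed contract:
# up to two pandas DataFrames in, a sequence of DataFrames out).
# The "value" column and the z-score transformation are hypothetical.
def azureml_main(dataframe1=None, dataframe2=None):
    df = dataframe1.copy()
    mean = df["value"].mean()
    std = df["value"].std(ddof=0)  # population standard deviation
    df["value_z"] = (df["value"] - mean) / std
    # Return a one-element tuple: the module expects a sequence of DataFrames.
    return df,


if __name__ == "__main__":
    # Local smoke test outside Azure ML.
    (out,) = azureml_main(pd.DataFrame({"value": [1.0, 2.0, 3.0]}))
    print(out)
```

Keeping the transformation inside a plain function like this also makes it easy to test locally before pasting it into the module.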
  • 19. KNN
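Slide 19 only names KNN (k-nearest neighbors). A minimal stdlib Python sketch of the idea: a query point receives the majority label among the k training points closest to it in Euclidean distance. The helper `knn_predict` is hypothetical; with the scikit-learn package that ships in Azure ML's Anaconda environment, one would more likely use `sklearn.neighbors.KNeighborsClassifier`.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # (distance, index) pairs sorted nearest first; the index breaks distance ties.
    neighbors = sorted(zip((math.dist(x, query) for x in train_X), range(len(train_y))))
    votes = Counter(train_y[i] for _, i in neighbors[:k])
    # On equal vote counts, most_common keeps first-seen order, so the tied label
    # that appears first in nearest-first order wins.
    return votes.most_common(1)[0][0]
```

For example, with three points labeled "a" near the origin and three labeled "b" near (5, 5), `knn_predict(X, y, (0.2, 0.2))` returns "a". Choosing an odd k for two classes avoids ties entirely.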
  • 22. Data is growing faster than processing speeds. The only solution is to parallelize data processing on large clusters. Example: HDInsight
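The parallelization on slide 22 follows a partition, map, reduce pattern, which Spark on HDInsight applies across the machines of a cluster. The sketch below shows the same pattern on one machine with Python's standard library. The thread pool merely stands in for cluster nodes (CPython threads will not speed up this CPU-bound work), and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_words(partition: List[str]) -> Counter:
    """Map step: count the words in one partition (one worker's share of the data)."""
    counts: Counter = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines: List[str], num_partitions: int = 4) -> Counter:
    """Partition the input, count each partition concurrently, then merge."""
    # Round-robin partitioning: partition i gets lines i, i + n, i + 2n, ...
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    # Threads stand in for cluster nodes; on a real cluster each node runs the
    # map step on the partition it stores locally.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Reduce step: merge the per-partition counts into one result.
    total: Counter = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total
```

Because each partition is counted independently and the merge only adds counts, adding more workers (or machines) changes the speed but not the result.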
  • 23. What is Spark?
    - A fast, expressive cluster computing system compatible with Apache Hadoop
    - Works with any Hadoop-supported storage system (HDFS, S3, Avro, …)
    - Improves efficiency through in-memory computing primitives and general computation graphs
    - Improves usability through rich APIs in Java, Scala and Python, and an interactive shell
    - Up to 100× faster than Hadoop MapReduce; often 2-10× less code
    - Spark was started by Matei Zaharia at the UC Berkeley AMPLab in 2009, open sourced in 2010 and donated to Apache in 2013
  • 24. Spark for Azure HDInsight (diagram: clients and decision makers use a Spark cluster made of Spark nodes on top of a storage layer)
  • 25.
    - Spark notebooks
    - The Spark shell, to run interactive queries
    - The Spark shell, to run Spark SQL queries
    - A standalone Scala program
  • 26. Spark notebooks: Zeppelin for Scala users; Jupyter for Python users
  • 30. Speech recognition could reach human parity in the next 3 years. (Chart of speech recognition error rates; labels: 2015 system, human error rate 4%.)
  • 34. Microsoft won first place in all ImageNet 2015 competition categories using deep learning
    ImageNet classification top-5 error (%):
    - ILSVRC 2010, NEC America: 28.2
    - ILSVRC 2011, Xerox: 25.8
    - ILSVRC 2012, AlexNet: 16.4
    - ILSVRC 2013, Clarifai: 11.7
    - ILSVRC 2014, VGG: 7.3
    - ILSVRC 2014, GoogLeNet: 6.7
    - ILSVRC 2015, MSRA ResNet: 3.5
    All 5 of Microsoft's entries took first place this year: ImageNet classification, ImageNet localization, ImageNet detection, COCO detection and COCO segmentation.
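The chart on slide 34 reports top-5 error: the share of test images whose correct label is not among a model's five highest-scoring classes. A minimal Python sketch of that metric, assuming one score per class for each example and integer class labels (the function name is hypothetical). It returns a fraction; multiply by 100 for the percentages on the slide.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Fraction of examples whose true class is not among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal length")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)
```

With ImageNet's 1000 classes, a 3.5% top-5 error means the right answer was missing from the five best guesses for 3.5% of test images.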
  • 35. CNTK at the heart: computational networks
    - A generalization of machine learning models that can be described as a series of computational steps, e.g., DNN, CNN, RNN, LSTM, DSSM, Seq2Seq, log-linear models
    - Representation:
      - A list of computational nodes, denoted n = {node name : operation name}
      - The parent-children relationship describing the operands: {n : c_1, …, c_{K_n}}, where K_n is the number of children of node n; for leaf nodes K_n = 0
      - The order of the children matters: e.g., XY is different from YX
      - Given the inputs (operands), the value of the node can be computed
    - Can flexibly describe deep learning models
    - Adopted by many other popular tools as well
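The representation on slide 35 (nodes with an operation name and an ordered list of children, leaves with K_n = 0) can be sketched in a few lines of Python. This illustrates the idea and is not CNTK's code: matrices are nested lists, and the operation names `Times`, `Plus` and `ReLU` are chosen for readability rather than taken from CNTK's node library. Evaluation walks the graph from the leaves up and caches each node's value so shared sub-graphs are computed once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Matrix = List[List[float]]


def times(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b (order matters: times(X, Y) != times(Y, X) in general)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def plus(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(a: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in a]


# Hypothetical operation table: operation name -> function of the children's values.
OPS = {"Times": times, "Plus": plus, "ReLU": relu}


@dataclass
class Node:
    name: str  # node name (must be unique in the network; used as the cache key)
    op: str    # operation name; ignored for leaf nodes
    children: List["Node"] = field(default_factory=list)  # ordered operands c_1..c_Kn


def evaluate(node: Node, inputs: Dict[str, Matrix],
             cache: Optional[Dict[str, Matrix]] = None) -> Matrix:
    """Compute a node's value from its children's values (post-order traversal)."""
    if cache is None:
        cache = {}
    if node.name in cache:  # shared sub-graphs are evaluated only once
        return cache[node.name]
    if not node.children:   # leaf node: K_n = 0, value comes from the inputs
        value = inputs[node.name]
    else:
        value = OPS[node.op](*(evaluate(c, inputs, cache) for c in node.children))
    cache[node.name] = value
    return value
```

Building `Node("XY", "Times", [X, Y])` and `Node("YX", "Times", [Y, X])` from the same two leaves gives different values, which is why the order of the children is part of the representation. A one-layer network ReLU(W x + b) is just three nested nodes.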
  • 37. “CNTK is production-ready: state-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.”
    (Chart: speed comparison in samples/second, higher is better, as of December 2015, for CNTK, Theano, TensorFlow, Torch 7 and Caffe on 1 GPU, 1 × 4 GPUs and 2 × 4 GPUs (8 GPUs). Chart annotations: "Theano only supports 1 GPU"; "Achieved with 1-bit gradient quantization algorithm".)
    * TensorFlow added distributed compute support in April 2016.
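Slide 37 credits a 1-bit gradient quantization algorithm for CNTK's multi-GPU result. The core idea: each gradient element is transmitted as a single bit, and the quantization error is kept locally and added to the next minibatch's gradient (error feedback), so no gradient information is permanently lost. The sketch below is a simplified illustration under our own assumptions (threshold at zero, each bit decoded as the mean of the values on its side, one flat vector at a time); it is not CNTK's implementation, and the function names are hypothetical.

```python
from typing import List, Tuple


def one_bit_quantize(grad: List[float], residual: List[float]
                     ) -> Tuple[List[bool], float, float, List[float]]:
    """Quantize a gradient to 1 bit per element with error feedback.

    Returns (bits, pos_value, neg_value, new_residual); only the bits and the
    two reconstruction values would be sent to other workers.
    """
    g = [x + r for x, r in zip(grad, residual)]  # add back the previous error
    bits = [x >= 0.0 for x in g]
    pos = [x for x, b in zip(g, bits) if b]
    neg = [x for x, b in zip(g, bits) if not b]
    # Reconstruction values: the mean of the elements on each side of zero.
    pos_value = sum(pos) / len(pos) if pos else 0.0
    neg_value = sum(neg) / len(neg) if neg else 0.0
    decoded = dequantize(bits, pos_value, neg_value)
    new_residual = [x - d for x, d in zip(g, decoded)]  # kept locally
    return bits, pos_value, neg_value, new_residual


def dequantize(bits: List[bool], pos_value: float, neg_value: float) -> List[float]:
    """Receiver side: map each bit back to its reconstruction value."""
    return [pos_value if b else neg_value for b in bits]
```

Calling it a second time with a zero gradient and the returned residual transmits exactly the error left over from the first call, which is what makes such aggressive compression workable.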
  • 38. Microsoft researcher Slawek Smyl won CIF 2016 using an LSTM neural network, powered by CNTK
  • 39. CIF Competition 2016: final results
    - Contestant 1: Slawek Smyl (LSTM-based NN on deseasonalized data)
    - Contestant 2: Slawek Smyl (weighted average of his 3 methods)
    - Contestant 3: Prof. Sven Crone (multilayer perceptron with a thorough feature search)
    - Contestant 4: Mikhail Artyukhov (previous competition winner, ensemble models)
    - Contestant 5: Joerg Wichard, Bayer Healthcare AG (Adaptive Forecasting Strategy with Hybrid Ensemble Models)
    - Contestant 6: Slawek Smyl (LSTM-based NN)
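  • 41. CNTK architecture (diagram): a CN Builder builds the computational network (CN) from a CN description or a lambda; a task-specific reader implementing IDataReader loads features & labels; a learner implementing ILearner (SGD, AdaGrad, etc.) gets data from the reader and uses the CN; an IExecutionEngine running on CPU/GPU evaluates the network and computes gradients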
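The diagram on slide 41 separates a reader (IDataReader), a learner (ILearner) and an execution engine (IExecutionEngine). A toy Python sketch of that division of labor follows, with a linear model y = w * x + b standing in for the computational network. The class names are hypothetical and only mirror the roles in the diagram; they are not CNTK interfaces.

```python
from typing import Iterator, List, Tuple

Params = Tuple[float, float]  # (w, b) of the model y = w * x + b


class ListReader:
    """Reader role (cf. IDataReader): yields (features, labels) minibatches."""

    def __init__(self, xs: List[float], ys: List[float], batch_size: int):
        self.xs, self.ys, self.batch_size = xs, ys, batch_size

    def minibatches(self) -> Iterator[Tuple[List[float], List[float]]]:
        for i in range(0, len(self.xs), self.batch_size):
            yield self.xs[i:i + self.batch_size], self.ys[i:i + self.batch_size]


class CpuEngine:
    """Execution-engine role (cf. IExecutionEngine): evaluate and compute gradients."""

    def evaluate(self, params: Params, xs: List[float]) -> List[float]:
        w, b = params
        return [w * x + b for x in xs]

    def gradient(self, params: Params, xs: List[float], ys: List[float]) -> Params:
        # Gradient of the mean squared error with respect to (w, b).
        preds = self.evaluate(params, xs)
        m = len(xs)
        dw = 2.0 / m * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        db = 2.0 / m * sum(p - y for p, y in zip(preds, ys))
        return dw, db


class SgdLearner:
    """Learner role (cf. ILearner): plain SGD parameter update."""

    def __init__(self, learning_rate: float):
        self.learning_rate = learning_rate

    def update(self, params: Params, grads: Params) -> Params:
        return tuple(p - self.learning_rate * g for p, g in zip(params, grads))


def train(reader: ListReader, engine: CpuEngine, learner: SgdLearner,
          params: Params, epochs: int) -> Params:
    for _ in range(epochs):
        for xs, ys in reader.minibatches():          # reader: get data
            grads = engine.gradient(params, xs, ys)  # engine: compute gradient
            params = learner.update(params, grads)   # learner: update the model
    return params
```

For example, training on `xs = [i / 10 for i in range(10)]` with `ys = [2 * x + 1 for x in xs]`, minibatches of 5, a learning rate of 0.5 and 500 epochs drives (w, b) to approximately (2, 1). Replacing `SgdLearner` with an AdaGrad learner, or `CpuEngine` with a GPU engine, would not touch the other components, which is the point of the separation.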
  • 42. (1) Kai Chen and Qiang Huo, “Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering”, in International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, Shanghai, China.
  • 43.
    - CNTK is a powerful tool that supports CPU/GPU and runs on Windows and Linux
    - CNTK is extensible thanks to its loosely coupled, modular design: the new reader design makes adding new readers and new computation nodes easy
    - The network definition language, macros and model editing language (as well as Python and C++ bindings in the future) make network design and modification easy
    - Compared to other tools, CNTK strikes a great balance between efficiency, performance and flexibility
  • 45. Comparison of Mahout, Spark ML, Azure ML and R Server
    Columns: Mahout | Spark ML | Azure ML | R Server
    - Shared service: No | No | Yes | No
    - Deployment model: PaaS | PaaS | PaaS | IaaS
    - Extensibility: High | High | Medium | High
    - Deployment complexity: Medium | High | Low | Medium
    - Cost: High | High | Low | High
    - Programming languages: Java/Scala | Scala/Java/Python | Python/R | R
    - Algorithms: Limited (growing) | MLlib/scikit | Many (scikit/CRAN) | Many (CRAN)
    - Scalability: High | High | Medium | Medium
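  • 51. On-demand in the cloud; all kinds of data; fast service deployment; data sharing and collaboration; open; supports the complete data analysis workflow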
  • 55.
    - ConnectR: high-speed & direct connectors, available for high-performance XDF; SAS, SPSS, delimited & fixed-format text data files; Hadoop HDFS (text & XDF); Teradata Database & Aster; EDWs and ADWs; ODBC
    - ScaleR: ready-to-use high-performance big data big analytics; fully parallelized analytics; data prep & data distillation; descriptive statistics & statistical tests; a range of predictive functions; user tools for distributing customized R algorithms across nodes
    - DistributedR: distributed computing framework that delivers cross-platform portability; available on Windows Servers, Red Hat and SuSE Linux Servers, Teradata Database, Cloudera Hadoop, Hortonworks Hadoop and MapR Hadoop
    - R+CRAN: open source R interpreter (R 3.2.2); freely available, huge range of R algorithms; algorithms callable by RevoR; 100% compatible with existing R scripts, functions and packages
    - RevoR: performance-enhanced R interpreter based on open source R; adds a high-performance math library to speed up linear algebra functions
    (Diagram labels: R Open, Microsoft R Server, DeployR, DevelopR)
  • 56.
    - Data step: data import (delimited, fixed, SAS, SPSS, ODBC); variable creation & transformation; recode variables; factor variables; missing value handling; sort, merge, split; aggregate by category (means, sums)
    - Descriptive statistics: min/max, mean, median (approx.), quantiles (approx.), standard deviation, variance, correlation, covariance, sum of squares (cross product matrix for set variables), pairwise cross tabs, risk ratio & odds ratio, cross-tabulation of data (standard tables & long form), marginal summaries of cross tabulations
    - Statistical tests: chi-square test, Kendall rank correlation, Fisher's exact test, Student's t-test
    - Sampling: subsample (observations & variables), random sampling
    - Predictive models: sum of squares (cross product matrix for set variables); multiple linear regression; generalized linear models (GLM) with exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions (cauchit, identity, log, logit, probit) and user-defined distributions & link functions; covariance & correlation matrices; logistic regression; classification & regression trees; predictions/scoring for models; residuals for all models
    - Cluster analysis: K-means
    - Classification: decision trees, decision forests, gradient boosted decision trees, naïve Bayes
    - Variable selection: stepwise regression
    - Simulation: simulation (e.g., Monte Carlo), parallel random number generation
    - Combination (custom algorithms): rxDataStep, rxExec, PEMA-R API
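Slide 56 lists the PEMA-R API for writing custom ScaleR algorithms. PEMA stands for parallel external memory algorithm: data is processed chunk by chunk, each chunk yields a small intermediate result, and the intermediate results are combined, so the full dataset never has to fit in memory and chunks can be processed on different cores or nodes. A minimal Python sketch of that pattern for the mean and sample variance follows; it uses the pairwise update of Chan et al. and is not the RevoScaleR API (the function names are hypothetical).

```python
from functools import reduce
from typing import Iterable, Sequence, Tuple

Stats = Tuple[int, float, float]  # (count, mean, M2 = sum of squared deviations)


def chunk_stats(chunk: Sequence[float]) -> Stats:
    """Per-chunk pass: only this chunk needs to be in memory (chunk must be non-empty)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def combine(a: Stats, b: Stats) -> Stats:
    """Merge two partial results (pairwise update of Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def chunked_mean_var(chunks: Iterable[Sequence[float]]) -> Tuple[float, float]:
    """Mean and sample variance over all chunks, combining per-chunk results."""
    n, mean, m2 = reduce(combine, (chunk_stats(c) for c in chunks))
    return mean, m2 / (n - 1)
```

Because `combine` only needs the three numbers from each side, the per-chunk passes can run in parallel and their results can be merged in any grouping (up to floating-point rounding).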
  • 57. Additional Resources
    - CNTK: https://github.com/Microsoft/CNTK
      - Contains all the source code and example setups
      - You may understand better how CNTK works by reading the source code
      - New features are added constantly
    - How to contact:
      - CNTK team: ask a question on CNTK GitHub!
      - Alexey: email alexey.kamenev@microsoft.com; LinkedIn: https://www.linkedin.com/in/alexeykamenev