SlideShare a Scribd company logo
1 of 52
| © Copyright 2015 Hitachi Consulting1
Operational Machine Learning
Using Microsoft Technologies for Applied Data Science
Khalid M. Salama, Ph.D.
Business Insights & Analytics
Hitachi Consulting UK
We Make it Happen. Better.
| © Copyright 2015 Hitachi Consulting2
Outline
 Introduction to Data Science
 From Experimental Data Science to Operational Machine Learning
 MS Technologies for Data Science & Advanced Analytics
 Demos & Screenshots
 Concluding Remarks
| © Copyright 2015 Hitachi Consulting3
Introduction to Data Science and
Machine Learning
| © Copyright 2015 Hitachi Consulting4
Data Science and Machine Learning
What?
Data
Science
Machine
Learning
Statistics
Artificial
Intelligence
Databases
Other
Technologies
“Data mining, an interdisciplinary subfield of
computer science, is the computational
process of automatic discovering interesting
and useful patterns in large data sets”
Other Related Technologies:
 Visualization
 Big Data
 High Performance Computing
 Cloud Computing
 Others..
| © Copyright 2015 Hitachi Consulting5
Data Science and Machine Learning
Why?
Vision Analytics
Recommendation
engines
Advertising
analysis
Weather
forecasting for
business planning
Social network
analysis
Legal
discovery and
document
archiving
Pricing analysis
Fraud
detection
Churn
analysis
Predictive
Maintenance
Location-based
tracking and
services
Personalized
Insurance
The objective of data
science is to provide you
with actionable insights to
support decision making….
| © Copyright 2015 Hitachi Consulting6
Data Science and Machine Learning
How?
Classification Learning
Build a model that can predict the target class
of an input case
Cluster Analysis
Discover natural groupings within the data
points
Association Rule Discovery
Extract frequent patterns present
in the data
Regression Modeling
Build a model that can estimate the response
value given an input case
Time Series Analysis
Analysis of temporal data to forecast
future values
Probabilistic Modeling
Compute the probability of an event to occur
given a set of conditions
Similarity Analysis
Identify similar cases to a given input case
based on the input features
Collaborative Filtering
Filtering of information using techniques
involving collaboration viewpoints
IF .. AND .. AND ..
THEN A
ELSE IF .. AND ..
THEN C
ELSE IF .. AND ..
THEN B
..
..
ELSE C
| © Copyright 2015 Hitachi Consulting7
From Experimental Data Science to Operational
Machine Learning
| © Copyright 2015 Hitachi Consulting8
Exploratory Data
Analysis
Data Science Activities
Experimentation vs. Operationalization
Collect Data
Blend
Visualize
Prepare
ML Experiment
Algorithm Selection
Parameter Tuning
Training & Testing
Model
Learning
Dataset
Report of Visuals &
Findings
Decision!
Data Analysis &
Experimentation
 Interactive
 Easy to perform
 Rich Visualizations
| © Copyright 2015 Hitachi Consulting9
Online Apps
Automated ML Pipeline
Data Science Activities
Experimentation vs. Operationalization
Model
Data Ingestion Data Processing Model Training Scoring
Deploy
Web APIs
Predict
Train
Export
Batch
Real-time
Operational ML Pipelines
 Pipelined (ETL Integration)
 Scalable
 Apps Integration
| © Copyright 2015 Hitachi Consulting10
Microsoft Advanced Analytics
Technologies
| © Copyright 2015 Hitachi Consulting11
Microsoft Advanced Analytics
Cortana Intelligence Suite https://gallery.cortanaintelligence.com/
| © Copyright 2015 Hitachi Consulting12
Microsoft Advanced Analytics
Data Science, Machine Learning, & Intelligence
Data Mining – SQL Server
Analysis Services
Azure Machine
Learning
Spark ML – Azure
HDInsight
Microsoft R Server – SQL
Server R Services
Azure Cognitive
Services
Cognitive Features – Azure
Data Lake Analytics
Microsoft Bot Framework
| © Copyright 2015 Hitachi Consulting13
Microsoft Azure Machine
Learning
| © Copyright 2015 Hitachi Consulting14
Azure Machine Learning
MS Cloud-native Data Science
 Cloud-based Machine Learning Services
 Interactive Data Science Studio
 Rich built-in functionality
 Imports data from everywhere
 Easy to develop and productionize – Web Services
 Extensible via R and Python scripts
Azure Machine Learning
Build and deploy
models in the cloud
Import Data
Publish
Result
Input
Web Services
Batch Scoring
Retrain Model
Limitations
 Only Cloud-based (Data Regulations)
 Scalability – Maximum dataset size = 10GB
 Microsoft R Open is not supported, yet
 No Source Control
| © Copyright 2015 Hitachi Consulting15
Azure Machine Learning
Real-time Predictions
App Event Hub
Stream Analytics Power BI
Azure ML Web Service
Send data points Consume messages
Send
Input
Receive
Output
Send Results
(Input, Output)
| © Copyright 2015 Hitachi Consulting16
Azure Machine Learning
Built-in Features
| © Copyright 2015 Hitachi Consulting17
Azure Machine Learning
Algorithms Cheat Sheet
| © Copyright 2015 Hitachi Consulting18
Azure Machine Learning
ML Studio
| © Copyright 2015 Hitachi Consulting19
Azure Machine Learning
Web Service
| © Copyright 2015 Hitachi Consulting20
Azure Machine Learning
Stream Analytics Integration
| © Copyright 2015 Hitachi Consulting21
Azure Machine Learning
AzureML R Library
| © Copyright 2015 Hitachi Consulting22
Microsoft R Server
| © Copyright 2015 Hitachi Consulting23
Microsoft R Server
R in Microsoft World
Microsoft R Open (MRO)
 Based on latest Open Source R (3.2.2.) - Built, tested, and distributed by Microsoft
 More efficient and multi-threaded computation
 Enhanced by Intel Math Kernel Library (MKL) to speed up linear algebra functions
 Compatible with all R-related software
| © Copyright 2015 Hitachi Consulting24
Microsoft R Server
Comparison
CRAN MRO MRS
Data size In-memory In-memory In-memory & disk
Efficiency Single threaded Multi-threaded Multi-threaded, parallel
processing 1:N servers
Support Community Community Community + Commercial
Functionality 7500+ innovative analytic
packages
7500+ innovative analytic
packages
7500+ innovative packages +
commercial parallel high-speed
functions
Licence Open Source Open Source Commercial license.
| © Copyright 2015 Hitachi Consulting25
Microsoft R Server
Components & Compute Contexts
Microsoft R Server
CRAN&MSROpen
ScaleR
DistributedR
ConnectR
MicrosoftML-Package
Operationalization
(msrdeploy)
RStudio | RTVS
MS R Client
Scale & Deploy
DifferentComputeContexts
 Installed on Windows or Linux
 ScaleR - Optimized for parallel execution on
Big Data, to eliminate memory limitations.
 ConnectR – Provides access to local file
systems, hdfs, hive, sqlserver, Teradata, etc.
 DistributeR - Adaptable parallel execution
framework to enable running on different
(distributed) compute contexts.
 Operationalization (msrdeploy) – Deploy
the model as a Web API.
| © Copyright 2015 Hitachi Consulting26
Microsoft R Server
Microsoft R Server – ScaleR Example
Check Environment
Load XDF
Prepare Data – Process XDF
Build Predictive Model
Perform Prediction
| © Copyright 2015 Hitachi Consulting27
Microsoft R Server
Microsoft R Server – ScaleR Functionality
| © Copyright 2015 Hitachi Consulting28
SQL Server (in-database)
R Services
| © Copyright 2015 Hitachi Consulting29
SQL Server R Services
In-database Analytics
 R Services (in-database) – Keep your analytics close to the data
 T-SQL Script – Can be encapsulated in Stored Procedures
 Models are built, trained, saved as part of the ETL process (SSIS)
 Used for batch prediction (as part of the ETL process)
 Visual Studio SQL Database Project, Source Controlled, etc.
 Uses Microsoft ScaleR libraries
Limitations
 Not supported in Azure SQL DB/DW, yet
 Not suitable for Interactive Data Science
 Only R, no python, yet.
Process
Data
Train R Model
Serialize
Store Models
Maintain
Models
Process
Data
Load Model
Perform
Prediction
Store Results
ETL Using SSIS
Data Sources
Prediction Pipeline
Training Pipeline
EXECUTE sp_execute_external_script
| © Copyright 2015 Hitachi Consulting30
SQL Server R Services
T-SQL Script
PredictionModel Summary
Prediction Output
Build and Save Model
Configure
| © Copyright 2015 Hitachi Consulting31
Microsoft Analysis Services
Data Mining
| © Copyright 2015 Hitachi Consulting32
SQL Server Analysis Services
Data Mining
Limitations
 Limited Extensibility
 Limited Algorithms & Functionalities
 No Azure PaaS Service
Azure SQL DW/DB SQL Server
Analysis Services
Online Apps
Build Model
Result
Explore/ Interpret Model
DMX Query
Batch
Scoring
Retrain
Model
 Process data from many OLEDB and ODBC data sources
 Easy to build, interpret, deploy, and productionize
 SSIS Support – Tasks to Train & Predict
 Interactive Visuals for model interpretation
 Excel Integration – Data Mining Add-in
| © Copyright 2015 Hitachi Consulting33
SQL Server Analysis Services
Overview
Data Source View Mining Structure
Mining Algorithm Mining Model
 Decision Tress
 Naïve-Bayes
 Linear Regression
 Neural Networks
 Association Rules
 Clustering
 Sequence Clustering
 Time Series
| © Copyright 2015 Hitachi Consulting34
SQL Server Analysis Services
Visualizing Models
| © Copyright 2015 Hitachi Consulting35
SQL Server Analysis Services
Excel Data Mining Add-in
| © Copyright 2015 Hitachi Consulting36
Azure Cognitive Services
| © Copyright 2015 Hitachi Consulting37
Azure Cognitive Services
Ready-to-use Intelligence
| © Copyright 2015 Hitachi Consulting38
Azure Cognitive Services
Setup a Cognitive Services API
https://www.microsoft.com/cognitive-services/
| © Copyright 2015 Hitachi Consulting39
Cognitive Features in Azure Data
Lake Analytics
| © Copyright 2015 Hitachi Consulting40
Azure Data Lake Analytics
Cognitive Features
 Pre-built intelligence – Text & Image Analysis
 Integrated with your data processing pipelines (DLA)
 Used for batch recognition (not singleton real-time)
 Scheduled & Automated using Azure Data Factory
 R & Python Extensions!
 Scalable – Suitable for Big Data
Ingest Polybase
Input Output
Data Processing &
Patten Recognition
Source Data
(Text, Images, etc.)
Enterprise Data
Warehouse
Azure SQL DW
Data Lake
Analytics Jobs
Data Lake
Store
Azure Data Factory
Data Lake
Store
Limitations
 Limited Features
 Not suitable for real-time scoring
| © Copyright 2015 Hitachi Consulting41
Azure Data Lake Analytics
First-time Installation
| © Copyright 2015 Hitachi Consulting42
Azure Data Lake Analytics
U-SQL Script
| © Copyright 2015 Hitachi Consulting43
Azure Data Lake Analytics
Execution & Output
| © Copyright 2015 Hitachi Consulting44
Spark ML on HDInsight
| © Copyright 2015 Hitachi Consulting45
Spark ML on HDInsight
Scalable ML for Big Data
 Rich Spark ML Libraries
 Scalable, distributed, in-memory
 Extensible – Python, R, Java, Scala
 Suitable for Big Data - Batch Model Training and Scoring
 Spark Streaming for Real-time predictions
 Scheduled & Automated Using Azure Data Factory
Ingest
- Process Data
- Build Model
- Save Model
- Load Model
- Perform Predictions
- Save Results
Source Data
Save
Load
Polybase
Enterprise Data
Warehouse
Azure SQL DW
Azure Data Factory
HDInsight
Limitations
 Expensive to keep it up & running
 Slow to spin-up
| © Copyright 2015 Hitachi Consulting46
Spark ML on HDInsight
Spark ML Pipelines
Spark ML standardizes APIs for machine learning algorithms to make it easier to combine
multiple task into a single pipeline, or workflow.
 Transformers – used for data pre-processing. Input: DataFrame - Output:DataFrame
 Estimators – ML algorithm used to build a predictive model. Input: DataFrame - Output: Model.
 Parameters – Configurations for Transformers and Estimators
 Pipeline – Chains Transformers and Estimators
ML Pipeline
Dataset
(DataFrame)
Transformer A
(pre-processing)
Estimator
(ML Learning
Algorithm)
Model
Evaluation
Parameters
Transformer Z
(pre-processing)
…
| © Copyright 2015 Hitachi Consulting47
Spark ML on HDInsight
Spark ML Functionality
Text Feature Extraction
 TF-IDF (HashingTF and IDF)
 Word2Vec
 CountVectorizer
 Tokenizer
 StopWordsRemover
 n-gram
Feature Selection
 VectorSlicer
 RFormula
 ChiSqSelector
Dimensionality Reduction
 PCA
Features Vector Preparation
 VectorAssembler
 VectorIndexer
 StringIndexer
 IndexToString
Transformers
Feature Type Conversion
 Binarizer
 Discrete Cosine Transform (DCT)
 OneHotEncoder
 Bucketizer
 QuantileDiscretizer
Feature Scaling
 Normalizer
 StandardScaler
 MinMaxScaler
Feature Construction
 SQLTransformer
 ElementwiseProduct
 PolynomialExpansion
Estimators (supervised)
Classification
 Decision Trees – Ensembles
 Naïve-Bayes
 SVM
Regression
 Linear Regression
 SVM
Other (Unsupervised)
Clustering
Collaborative Filtering
Frequent Pattern Mining
| © Copyright 2015 Hitachi Consulting48
Spark ML on HDInsight
Spark ML - Example
| © Copyright 2015 Hitachi Consulting49
Spark ML on HDInsight
BigDL – Intel’s Distributed Deep Learning Library
https://azure.microsoft.com/en-us/blog/use-bigdl-on-hdinsight-spark-for-distributed-deep-learning/
| © Copyright 2015 Hitachi Consulting50
Concluding Remarks
Interactive Data
Science Studio
 Azure ML
Extensibility
 Spark on HDI
 Azure ML
 Microsoft R Server
Built-in
Features
 Azure ML
 Spark on HDI
Rich Model
Interpretability
 SSAS Data Mining
 Microsoft R Server
Scalability (Big
Data)
 Microsoft R Server
 Spark on HDI
ML Pipelining
 Spark on HDI
 Azure Data Lake
Analytics
 SQL Server R
Services
 Data Mining SSAS
Integration with
Operational Apps
 Azure ML
 Azure Cognitive
Services
 Microsoft R
Operationalization
Pre-built
Intelligence
 Azure Cognitive
Services
 Azure Data Lake
Analytics
| © Copyright 2015 Hitachi Consulting51
My Background
Applying Computational Intelligence in Data Mining
 Honorary Research Fellow, School of Computing , University of Kent.
 Ph.D. Computer Science, University of Kent, Canterbury, UK.
 28+ published journal and conference papers in the fields of AI and ML
https://www.researchgate.net/profile/Khalid_Salama https://www.linkedin.com/in/khalid-salama-24403144/
https://github.com/khalid-m-salama/sqlbits-2017
| © Copyright 2015 Hitachi Consulting52
Thanks!

More Related Content

What's hot

Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes John Archer
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningModusOptimum
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseDataWorks Summit
 
Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Eric Rice
 
Multi-tenant Hadoop - the challenge of maintaining high SLAS
Multi-tenant Hadoop - the challenge of maintaining high SLASMulti-tenant Hadoop - the challenge of maintaining high SLAS
Multi-tenant Hadoop - the challenge of maintaining high SLASDataWorks Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015DataWorks Summit
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Red Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft AzureRed Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft AzureJohn Archer
 
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...DataWorks Summit
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Kolja Manuel Rödel
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Data lake analytics for the admin
Data lake analytics for the adminData lake analytics for the admin
Data lake analytics for the adminTillmann Eitelberg
 
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...DataWorks Summit
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsInformatica
 
Hadoop Journey at Walgreens
Hadoop Journey at WalgreensHadoop Journey at Walgreens
Hadoop Journey at WalgreensDataWorks Summit
 
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataMicrosoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataHortonworks
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Holden Ackerman
 

What's hot (20)

Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine Learning
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
 
Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3
 
Multi-tenant Hadoop - the challenge of maintaining high SLAS
Multi-tenant Hadoop - the challenge of maintaining high SLASMulti-tenant Hadoop - the challenge of maintaining high SLAS
Multi-tenant Hadoop - the challenge of maintaining high SLAS
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Red Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft AzureRed Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft Azure
 
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data lake analytics for the admin
Data lake analytics for the adminData lake analytics for the admin
Data lake analytics for the admin
 
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
 
Hadoop Journey at Walgreens
Hadoop Journey at WalgreensHadoop Journey at Walgreens
Hadoop Journey at Walgreens
 
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataMicrosoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big Data
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 

Similar to Operational Machine Learning MS Technologies

Bhadale group of companies projects portfolio
Bhadale group of companies  projects portfolioBhadale group of companies  projects portfolio
Bhadale group of companies projects portfolioVijayananda Mohire
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopCCG
 
Start your first IoT and AR journey with Transition Technologies PSC
Start your first IoT and AR journey with Transition Technologies PSCStart your first IoT and AR journey with Transition Technologies PSC
Start your first IoT and AR journey with Transition Technologies PSCTransition Technologies PSC
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
SPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSSPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSNicolas Georgeault
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceKaran Sachdeva
 
Anzo smart data integration february 2015
Anzo smart data integration february 2015Anzo smart data integration february 2015
Anzo smart data integration february 2015John Rueter
 
DataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataDataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataSasha Lazarevic
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019GoDataDriven
 
Cloud Computing 2010 - IBM Italia - Mariano Ammirabile
Cloud Computing 2010 - IBM Italia - Mariano AmmirabileCloud Computing 2010 - IBM Italia - Mariano Ammirabile
Cloud Computing 2010 - IBM Italia - Mariano AmmirabileManuela Moroncini
 
Bhadale group of companies cloud service catalogue
Bhadale group of companies cloud service catalogueBhadale group of companies cloud service catalogue
Bhadale group of companies cloud service catalogueVijayananda Mohire
 
Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Pactera_US
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
Engineering_Campus_Presentation_2022 (1)-compressed.pptx
Engineering_Campus_Presentation_2022 (1)-compressed.pptxEngineering_Campus_Presentation_2022 (1)-compressed.pptx
Engineering_Campus_Presentation_2022 (1)-compressed.pptxManikaahuja4
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRBWilliam Poos
 
Advanced Analytics and Artificial Intelligence - Transforming Your Business T...
Advanced Analytics and Artificial Intelligence - Transforming Your Business T...Advanced Analytics and Artificial Intelligence - Transforming Your Business T...
Advanced Analytics and Artificial Intelligence - Transforming Your Business T...David J Rosenthal
 
Digital Transformation using Predictive Analytics and Big Data
Digital Transformation using Predictive Analytics and Big DataDigital Transformation using Predictive Analytics and Big Data
Digital Transformation using Predictive Analytics and Big DataAmazon Web Services
 
Digital_IOT_(Microsoft_Solution).pdf
Digital_IOT_(Microsoft_Solution).pdfDigital_IOT_(Microsoft_Solution).pdf
Digital_IOT_(Microsoft_Solution).pdfssuserd23711
 

Similar to Operational Machine Learning MS Technologies (20)

Bhadale group of companies projects portfolio
Bhadale group of companies  projects portfolioBhadale group of companies  projects portfolio
Bhadale group of companies projects portfolio
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual Workshop
 
Start your first IoT and AR journey with Transition Technologies PSC
Start your first IoT and AR journey with Transition Technologies PSCStart your first IoT and AR journey with Transition Technologies PSC
Start your first IoT and AR journey with Transition Technologies PSC
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
About CDAP
About CDAPAbout CDAP
About CDAP
 
SPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSSPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDS
 
Microsoft power platform
Microsoft power platformMicrosoft power platform
Microsoft power platform
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data Science
 
Anzo smart data integration february 2015
Anzo smart data integration february 2015Anzo smart data integration february 2015
Anzo smart data integration february 2015
 
DataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataDataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the Data
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 
Cloud Computing 2010 - IBM Italia - Mariano Ammirabile
Cloud Computing 2010 - IBM Italia - Mariano AmmirabileCloud Computing 2010 - IBM Italia - Mariano Ammirabile
Cloud Computing 2010 - IBM Italia - Mariano Ammirabile
 
Bhadale group of companies cloud service catalogue
Bhadale group of companies cloud service catalogueBhadale group of companies cloud service catalogue
Bhadale group of companies cloud service catalogue
 
Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Engineering_Campus_Presentation_2022 (1)-compressed.pptx
Engineering_Campus_Presentation_2022 (1)-compressed.pptxEngineering_Campus_Presentation_2022 (1)-compressed.pptx
Engineering_Campus_Presentation_2022 (1)-compressed.pptx
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 
Advanced Analytics and Artificial Intelligence - Transforming Your Business T...
Advanced Analytics and Artificial Intelligence - Transforming Your Business T...Advanced Analytics and Artificial Intelligence - Transforming Your Business T...
Advanced Analytics and Artificial Intelligence - Transforming Your Business T...
 
Digital Transformation using Predictive Analytics and Big Data
Digital Transformation using Predictive Analytics and Big DataDigital Transformation using Predictive Analytics and Big Data
Digital Transformation using Predictive Analytics and Big Data
 
Digital_IOT_(Microsoft_Solution).pdf
Digital_IOT_(Microsoft_Solution).pdfDigital_IOT_(Microsoft_Solution).pdf
Digital_IOT_(Microsoft_Solution).pdf
 

More from Khalid Salama

Microservices, DevOps, and Continuous Delivery
Microservices, DevOps, and Continuous DeliveryMicroservices, DevOps, and Continuous Delivery
Microservices, DevOps, and Continuous DeliveryKhalid Salama
 
Spark with HDInsight
Spark with HDInsightSpark with HDInsight
Spark with HDInsightKhalid Salama
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureKhalid Salama
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureKhalid Salama
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!Khalid Salama
 

More from Khalid Salama (6)

Microservices, DevOps, and Continuous Delivery
Microservices, DevOps, and Continuous DeliveryMicroservices, DevOps, and Continuous Delivery
Microservices, DevOps, and Continuous Delivery
 
Graph Analytics
Graph AnalyticsGraph Analytics
Graph Analytics
 
Spark with HDInsight
Spark with HDInsightSpark with HDInsight
Spark with HDInsight
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS Azure
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 

Recently uploaded

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 

Recently uploaded (20)

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 

Operational Machine Learning MS Technologies

  • 1. | © Copyright 2015 Hitachi Consulting1 Operational Machine Learning Using Microsoft Technologies for Applied Data Science Khalid M. Salama, Ph.D. Business Insights & Analytics Hitachi Consulting UK We Make it Happen. Better.
  • 2. | © Copyright 2015 Hitachi Consulting2 Outline  Introduction to Data Science  From Experimental Data Science to Operational Machine Learning  MS Technologies for Data Science & Advanced Analytics  Demos & Screenshots  Concluding Remarks
  • 3. | © Copyright 2015 Hitachi Consulting3 Introduction to Data Science and Machine Learning
  • 4. | © Copyright 2015 Hitachi Consulting4 Data Science and Machine Learning What? Data Science Machine Learning Statistics Artificial Intelligence Databases Other Technologies “Data mining, an interdisciplinary subfield of computer science, is the computational process of automatic discovering interesting and useful patterns in large data sets” Other Related Technologies:  Visualization  Big Data  High Performance Computing  Cloud Computing  Others..
  • 5. | © Copyright 2015 Hitachi Consulting5 Data Science and Machine Learning Why? Vision Analytics Recommendation engines Advertising analysis Weather forecasting for business planning Social network analysis Legal discovery and document archiving Pricing analysis Fraud detection Churn analysis Predictive Maintenance Location-based tracking and services Personalized Insurance The objective of data science is to provide you with actionable insights to support decision making….
  • 6. | © Copyright 2015 Hitachi Consulting6 Data Science and Machine Learning How? Classification Learning Build a model that can predict the target class of an input case Cluster Analysis Discover natural groupings within the data points Association Rule Discovery Extract frequent patterns present in the data Regression Modeling Build a model that can estimate the response value given an input case Time Series Analysis Analysis of temporal data to forecast future values Probabilistic Modeling Compute the probability of an event to occur given a set of conditions Similarity Analysis Identify similar cases to a given input case based on the input features Collaborative Filtering Filtering of information using techniques involving collaboration viewpoints IF .. AND .. AND .. THEN A ELSE IF .. AND .. THEN C ELSE IF .. AND .. THEN B .. .. ELSE C
  • 7. | © Copyright 2015 Hitachi Consulting7 From Experimental Data Science to Operational Machine Learning
  • 8. | © Copyright 2015 Hitachi Consulting8 Exploratory Data Analysis Data Science Activities Experimentation vs. Operationalization Collect Data Blend Visualize Prepare ML Experiment Algorithm Selection Parameter Tuning Training & Testing Model Learning Dataset Report of Visuals & Findings Decision! Data Analysis & Experimentation  Interactive  Easy to perform  Rich Visualizations
  • 9. | © Copyright 2015 Hitachi Consulting9 Online Apps Automated ML Pipeline Data Science Activities Experimentation vs. Operationalization Model Data Ingestion Data Processing Model Training Scoring Deploy Web APIs Predict Train Export Batch Real-time Operational ML Pipelines  Pipelined (ETL Integration)  Scalable  Apps Integration
  • 10. | © Copyright 2015 Hitachi Consulting10 Microsoft Advanced Analytics Technologies
  • 11. | © Copyright 2015 Hitachi Consulting11 Microsoft Advanced Analytics Cortana Intelligence Suite https://gallery.cortanaintelligence.com/
  • 12. | © Copyright 2015 Hitachi Consulting12 Microsoft Advanced Analytics Data Science, Machine Learning, & Intelligence Data Mining – SQL Server Analysis Services Azure Machine Learning Spark ML – Azure HDInsight Microsoft R Server – SQL Server R Services Azure Cognitive Services Cognitive Features – Azure Data Lake Analytics Microsoft Bot Framework
  • 13. | © Copyright 2015 Hitachi Consulting13 Microsoft Azure Machine Learning
  • 14. | © Copyright 2015 Hitachi Consulting14 Azure Machine Learning MS Cloud-native Data Science  Cloud-based Machine Learning Services  Interactive Data Science Studio  Rich built-in functionality  Imports data from everywhere  Easy to develop and productionize – Web Services  Extensible via R and Python scripts Azure Machine Learning Build and deploy models in the cloud Import Data Publish Result Input Web Services Batch Scoring Retrain Model Limitations  Only Cloud-based (Data Regulations)  Scalability – Maximum dataset size = 10GB  Microsoft R Open is not supported, yet  No Source Control
  • 15. | © Copyright 2015 Hitachi Consulting15 Azure Machine Learning Real-time Predictions App Event Hub Stream Analytics Power BI Azure ML Web Service Send data points Consume messages Send Input Receive Output Send Results (Input, Output)
  • 16. | © Copyright 2015 Hitachi Consulting16 Azure Machine Learning Built-in Features
  • 17. | © Copyright 2015 Hitachi Consulting17 Azure Machine Learning Algorithms Cheat Sheet
  • 18. | © Copyright 2015 Hitachi Consulting18 Azure Machine Learning ML Studio
  • 19. | © Copyright 2015 Hitachi Consulting19 Azure Machine Learning Web Service
  • 20. | © Copyright 2015 Hitachi Consulting20 Azure Machine Learning Stream Analytics Integration
  • 21. | © Copyright 2015 Hitachi Consulting21 Azure Machine Learning AzureML R Library
  • 22. | © Copyright 2015 Hitachi Consulting22 Microsoft R Server
  • 23. | © Copyright 2015 Hitachi Consulting23 Microsoft R Server R in Microsoft World Microsoft R Open (MRO)  Based on latest Open Source R (3.2.2.) - Built, tested, and distributed by Microsoft  More efficient and multi-threaded computation  Enhanced by Intel Math Kernel Library (MKL) to speed up linear algebra functions  Compatible with all R-related software
  • 24. | © Copyright 2015 Hitachi Consulting24 Microsoft R Server Comparison CRAN MRO MRS Data size In-memory In-memory In-memory & disk Efficiency Single threaded Multi-threaded Multi-threaded, parallel processing 1:N servers Support Community Community Community + Commercial Functionality 7500+ innovative analytic packages 7500+ innovative analytic packages 7500+ innovative packages + commercial parallel high-speed functions Licence Open Source Open Source Commercial license.
  • 25. | © Copyright 2015 Hitachi Consulting25 Microsoft R Server Components & Compute Contexts Microsoft R Server CRAN&MSROpen ScaleR DistributedR ConnectR MicrosoftML-Package Operationalization (msrdeploy) RStudio | RTVS MS R Client Scale & Deploy DifferentComputeContexts  Installed on Windows or Linux  ScaleR - Optimized for parallel execution on Big Data, to eliminate memory limitations.  ConnectR – Provides access to local file systems, hdfs, hive, sqlserver, Teradata, etc.  DistributeR - Adaptable parallel execution framework to enable running on different (distributed) compute contexts.  Operationalization (msrdeploy) – Deploy the model as a Web API.
  • 26. | © Copyright 2015 Hitachi Consulting26 Microsoft R Server Microsoft R Server – ScaleR Example Check Environment Load XDF Prepare Data – Process XDF Build Predictive Model Perform Prediction
  • 27. | © Copyright 2015 Hitachi Consulting27 Microsoft R Server Microsoft R Server – ScaleR Functionality
  • 28. | © Copyright 2015 Hitachi Consulting28 SQL Server (in-database) R Services
  • 29. | © Copyright 2015 Hitachi Consulting29 SQL Server R Services In-database Analytics  R Services (in-database) – Keep your analytics close to the data  T-SQL Script – Can be encapsulated in Stored Procedures  Models are built, trained, saved as part of the ETL process (SSIS)  Used for batch prediction (as part of the ETL process)  Visual Studio SQL Database Project, Source Controlled, etc.  Uses Microsoft ScaleR libraries Limitations  Not supported in Azure SQL DB/DW, yet  Not suitable for Interactive Data Science  Only R, no python, yet. Process Data Train R Model Serialize Store Models Maintain Models Process Data Load Model Perform Prediction Store Results ETL Using SSIS Data Sources Prediction Pipeline Training Pipeline EXECUTE sp_execute_external_script
  • 30. | © Copyright 2015 Hitachi Consulting30 SQL Server R Services T-SQL Script PredictionModel Summary Prediction Output Build and Save Model Configure
  • 31. | © Copyright 2015 Hitachi Consulting31 Microsoft Analysis Services Data Mining
  • 32. | © Copyright 2015 Hitachi Consulting32 SQL Server Analysis Services Data Mining Limitations  Limited Extensibility  Limited Algorithms & Functionalities  No Azure PaaS Service Azure SQL DW/DB SQL Server Analysis Services Online Apps Build Model Result Explore/ Interpret Model DMX Query Batch Scoring Retrain Model  Process data from many OLEDB and ODBC data sources  Easy to build, interpret, deploy, and productionize  SSIS Support – Tasks to Train & Predict  Interactive Visuals for model interpretation  Excel Integration – Data Mining Add-in
  • 33. | © Copyright 2015 Hitachi Consulting33 SQL Server Analysis Services Overview Data Source View Mining Structure Mining Algorithm Mining Model  Decision Tress  Naïve-Bayes  Linear Regression  Neural Networks  Association Rules  Clustering  Sequence Clustering  Time Series
  • 34. | © Copyright 2015 Hitachi Consulting34 SQL Server Analysis Services Visualizing Models
  • 35. | © Copyright 2015 Hitachi Consulting35 SQL Server Analysis Services Excel Data Mining Add-in
  • 36. | © Copyright 2015 Hitachi Consulting36 Azure Cognitive Services
  • 37. | © Copyright 2015 Hitachi Consulting37 Azure Cognitive Services Ready-to-use Intelligence
  • 38. | © Copyright 2015 Hitachi Consulting38 Azure Cognitive Services Setup a Cognitive Services API https://www.microsoft.com/cognitive-services/
  • 39. | © Copyright 2015 Hitachi Consulting39 Cognitive Features in Azure Data Lake Analytics
  • 40. | © Copyright 2015 Hitachi Consulting40 Azure Data Lake Analytics Cognitive Features  Pre-built intelligence – Text & Image Analysis  Integrated with your data processing pipelines (DLA)  Used for batch recognition (not singleton real-time)  Scheduled & Automated using Azure Data Factory  R & Python Extensions!  Scalable – Suitable for Big Data Ingest Polybase Input Output Data Processing & Patten Recognition Source Data (Text, Images, etc.) Enterprise Data Warehouse Azure SQL DW Data Lake Analytics Jobs Data Lake Store Azure Data Factory Data Lake Store Limitations  Limited Features  Not suitable for real-time scoring
  • 41. | © Copyright 2015 Hitachi Consulting41 Azure Data Lake Analytics First-time Installation
  • 42. | © Copyright 2015 Hitachi Consulting42 Azure Data Lake Analytics U-SQL Script
  • 43. | © Copyright 2015 Hitachi Consulting43 Azure Data Lake Analytics Execution & Output
  • 44. | © Copyright 2015 Hitachi Consulting44 Spark ML on HDInsight
  • 45. | © Copyright 2015 Hitachi Consulting45 Spark ML on HDInsight Scalable ML for Big Data  Rich Spark ML Libraries  Scalable, distributed, in-memory  Extensible – Python, R, Java, Scala  Suitable for Big Data - Batch Model Training and Scoring  Spark Streaming for Real-time predictions  Scheduled & Automated Using Azure Data Factory Ingest - Process Data - Build Model - Save Model - Load Model - Perform Predictions - Save Results Source Data Save Load Polybase Enterprise Data Warehouse Azure SQL DW Azure Data Factory HDInsight Limitations  Expensive to keep it up & running  Slow to spin-up
  • 46. | © Copyright 2015 Hitachi Consulting46 Spark ML on HDInsight Spark ML Pipelines Spark ML standardizes APIs for machine learning algorithms to make it easier to combine multiple task into a single pipeline, or workflow.  Transformers – used for data pre-processing. Input: DataFrame - Output:DataFrame  Estimators – ML algorithm used to build a predictive model. Input: DataFrame - Output: Model.  Parameters – Configurations for Transformers and Estimators  Pipeline – Chains Transformers and Estimators ML Pipeline Dataset (DataFrame) Transformer A (pre-processing) Estimator (ML Learning Algorithm) Model Evaluation Parameters Transformer Z (pre-processing) …
  • 47. | © Copyright 2015 Hitachi Consulting47 Spark ML on HDInsight Spark ML Functionality Text Feature Extraction  TF-IDF (HashingTF and IDF)  Word2Vec  CountVectorizer  Tokenizer  StopWordsRemover  n-gram Feature Selection  VectorSlicer  RFormula  ChiSqSelector Dimensionality Reduction  PCA Features Vector Preparation  VectorAssembler  VectorIndexer  StringIndexer  IndexToString Transformers Feature Type Conversion  Binarizer  Discrete Cosine Transform (DCT)  OneHotEncoder  Bucketizer  QuantileDiscretizer Feature Scaling  Normalizer  StandardScaler  MinMaxScaler Feature Construction  SQLTransformer  ElementwiseProduct  PolynomialExpansion Estimators (supervised) Classification  Decision Trees – Ensembles  Naïve-Bayes  SVM Regression  Linear Regression  SVM Other (Unsupervised) Clustering Collaborative Filtering Frequent Pattern Mining
  • 48. | © Copyright 2015 Hitachi Consulting48 Spark ML on HDInsight Spark ML - Example
  • 49. | © Copyright 2015 Hitachi Consulting49 Spark ML on HDInsight BigDL – Intel’s Distributed Deep Learning Library https://azure.microsoft.com/en-us/blog/use-bigdl-on-hdinsight-spark-for-distributed-deep-learning/
  • 50. | © Copyright 2015 Hitachi Consulting50 Concluding Remarks Interactive Data Science Studio  Azure ML Extensibility  Spark on HDI  Azure ML  Microsoft R Server Built-in Features  Azure ML  Spark on HDI Rich Model Interpretability  SSAS Data Mining  Microsoft R Server Scalability (Big Data)  Microsoft R Server  Spark on HDI ML Pipelining  Spark on HDI  Azure Data Lake Analytics  SQL Server R Services  Data Mining SSAS Integration with Operational Apps  Azure ML  Azure Cognitive Services  Microsoft R Operationalization Pre-built Intelligence  Azure Cognitive Services  Azure Data Lake Analytics
  • 51. | © Copyright 2015 Hitachi Consulting51 My Background Applying Computational Intelligence in Data Mining  Honorary Research Fellow, School of Computing , University of Kent.  Ph.D. Computer Science, University of Kent, Canterbury, UK.  28+ published journal and conference papers in the fields of AI and ML https://www.researchgate.net/profile/Khalid_Salama https://www.linkedin.com/in/khalid-salama-24403144/ https://github.com/khalid-m-salama/sqlbits-2017
  • 52. | © Copyright 2015 Hitachi Consulting52 Thanks!

Editor's Notes

  1. Hello everyone and welcome to the last day of Sqlbits… My name is Khalid Salama. I work at Hitachi Consulting, in this Business Insights & Analytics practice, focusing on designing and delivering Data & Analytics Solutions I n this session, I would like to explore with you the various Microsoft technologies that can help to operationalize your Machine Learning pipelines and enable scalable data science. Well, it’s more of an engineering session than a data science one to be fair, however, I think it is an important topic to discuss because, data science is perceived as experimental, isolated activity… While in many contemporary applications, specially with the rise of digital transformation and IoT, your data science products need to be incorporated with your operational systems, and you ML pipelines need to be an integral part of your ETL process. So, we will try to touch on various the Microsoft options to perform both experimental data science and operational ML.
  2. So without over due, we have a lot of ground to cover… I’ll start with a very quick intro to data science, I assume everybody here has “a” background on data science Then, I give some insights on the difference between exploratory data science and Operational ML After that, we are going to delve into the MS technologies for Advanced Analytics and show several demos…. And finally, I will conclude with some general remarks.
  3. So let’s get started: Data Science and Machine Learning
  4. So Data Science, also has been known in the academia as data mining, is the process of discovering interesting & useful patterns hidden in your data… It’s an interdisciplinary field of computing, where concepts and techniques for different areas are employed, such as statistics, artificial intelligence, machine learning, databases, and others, like, visualization, big data, cloud computing, et cetera…
  5. The objective of data science is to provide you with actionable insights, that is, valuable findings that support decision making, for example, whether to invest in this new product line, or whether to perform a certain critical medical operation…. These findings may form a reliable model that is able, to a certain degree of confidence, to predict, estimate, or forecast a certain value or a future event, which can allow your business to better optimize its responsive actions… for instance, to perform stock optimization or service-package personalization There is an enormous number of data science & ML applications in many business domains, of which some I had the opportunity to work on, including Customer Propensity modelling for campaigning and churn analysis, automatic risk detection in security operations, Social media & Customer Feedback analysis, demand sensing, and many others….
  6. The principal data mining tasks are classification, regression, clustering, Association Rule Discovery (or frequent pattern mining), time series analysis, probabilistic modelling, similarity analysis, and collaborative filtering Each focuses on tackling a particular analytics challenge Some, of course, fall under the category of supervised learning, others are considered unsupervised learning techniques.
  7. Now let’s take a look onto the activities of any a science process, to try to discriminate between experimental data science and operational machine learning
  8. It starts with an exploratory data analysis phase… After being presented with an analytics problem, you start with collecting the relevant data and importing it to your environment… Then you blend this data by performing some generic data engineering tasks, such as merging, joining, aggerating, and so on…. After that, you apply some machine learning-specific data preparation tasks, also know as features engineering, including features construction, extraction, selection, and feature tuning, like scaling, handling missing values & outliers, and so on. The output of this phase is a learning dataset, that will be used in your ML experimentation phase. In this phase, you perform iterative steps training & testing to select the algorithm & parameters that produce the model that best captures the hidden patterns in your data… The final output of this whole experimentation phase is a report of findings, along with comprehensive visuals. That can be in the form of a markdown file, using jupyter notebooks, that tills the end-to-end data analysis story and support reproducibility. These results may lead to a specific decision or recommendation. In some scenarios, these results are the ultimate output of the data science activity
  9. However, in many other scenarios, where you need repeated and real-time intelligence, such as targeted advertising and recommender systems, you need to productionize the models produced from the previous data science process, and integrate them with your operational systems to perform online predictions and recommendations In which case, the whole ML pipeline, including data ingestion, processing, model training and/or scoring, needs to be a repeatable, automated process The process should produce a model that exposes Web API to be integrated with your operational apps and consumed real-time
  10. Now let’s switch gears now and talk about technology…
  11. You have probably seen this diagram more that 100 times during the last couple of days This is the Cortana Intelligence Suite, which provides you with a plethora of services to build end-to-end, batch and real-time, data analytics platform… You can visit the Cortana Intelligence Gallery online to find many templates for Analytics Solutions However, we are going to focus on only the machine learning and intelligence parts of it…
  12. Here we have the different Microsoft Technologies for data science & machine learning, which are: Azure Machine Learning Microsoft R Server & SQL Server (in-database) R Services Analysis Services Data Mining Spark Machine Learning on Azure HDInsight Cognitive Features in Azure Data Lake Analytics Azure Cognitive Services, & Microsoft Bot framework Does anyone know any other Microsoft tool or technology for Data Mining or Machine Learning? So I can declare this as an inclusive list  Alright, we will try to touch on each of these ones - except the Bot framework - discuss features and limitations, and show a simple demo for each.
  13. So Let’s start with Azure Machine Learning… Azure ML has around for while now, and gained a lot of popularity in both experimental data science and operationalizing ML models…
  14. It is a cloud-based PaaS service Provides an interactive (drag and drop) Data Science studio to perform you experiments With a lot of built-in functionality for data processing and modelling You can import data from different sources However, the most interesting feature in Azure ML is that you can easily productionize your ML model as Web Services, so that you can integrate them in your ETL for training and batch scoring Or your operational applications for real-time predictions You can also extend your ML experiments using python and R scripts Let’s quickly switch to the Azure portal and have a look at it
  15. Microsoft R Server Probably the most important analytics product for Microsoft at the moment…. If you are an R developer, you will probably know that open-source R has scalability limitations, because it is single-threaded and in-memory only… You needed to use commercial R libraries to make your program multi-threaded, process your data partly in-memory and partly on-desk, so that you can handle data sizes bigger than your workstation’s memory, and run your R app on a cluster for distributed computing and scaling your data processing…
  16. Well, Microsoft has acquired a company that builds such libraries, called Revolution Analytics, and included their open-source libraries in MRO, and their commercial ones in MRS Besides, MSR Open has enhanced Math Kernel Library, for more efficient mathematical computations and it is compatible with all R-related software
  17. So in this comparison, you can see that Microsoft R Server processes the data in-memory as well as on-desk (using external data frames or xds, which we will see shortly), It is multi-threaded, and supports distributed computing to scale for big data processing
  18. Let’s take a closer look to the main components of MS R Server ScaleR – The core libraries in MS R, optimized for parallel execution and uses external data frames to overcome the memory limitation ConnecR – provides access to various data sources including distributed file systems and relational databases DistributeR – allows you R application to run in different execution context, including distributed one So you can write you application once, and with a few lines of code, you can configure your application to run on different execution context in order to scale it MS R Server Operationalization - allows you to deploy your R models, on a configured R Server, as Web APIs (similar to what we have seen in Azure ML) using msrdeploy libraries Let’s have a look on a sample MS R code
  19. SQL Server R Services…
  20. Supporting the execution of R scripts within SQL Server has opened interesting opportunities to bring your analytics close your data, and integrate your Machine Learning process with your SQL-based ETL. Now you can encapsulate you ML task, as R scripts, into a T-SQL stored procedure, to perform model training or batch scoring Then you can call this proc from an SSIS package As for a training pipeline, you use data in SQL Server Tables to train and R model, then you serialize this Model and stored and maintain it in a table Similarly for a prediction pipeline, you load the R model that is stored in the tables, perform prediction using data in a SQL Table, and save back the results This is all using sp_execute_external_script Another advantage is that your ML scripts are managed and source controlled as part of your SQL Database project in Visual Studio In terms of limitations, R Services are not supported in Azure SQL DB/DW, yet It is not suitable for Interactive Data Science, and it’s a bit hard to debug. It also supports R only, which is not ideal for Python Data Scientists. However, I think we will see SQL Server in-database Python Services in the future… Let’s have a look on how this works
  21. Well, I can’t talk about Microsoft & Machine Learning without mentioning my old friend Analysis Services Data Mining…. I’ve personally delivered a couple of interesting Data Mining Solutions using this technology It has been around for more than 10 years, since SQL Server 2005, yet no one now is talking about it…
  22. Although SQL Server Analysis Services has limited Data Mining features, as well as very limited extensibility (that is, you need to write you own ML algorithms and integrate them in using only C++) I think it has a number of useful features that makes it a good candidate for delivering and productionizing an ML Solutions It can process data from various OLEDB and ODBC data sources, that includes Azure SQL DB and DW, It is very easy to build and deploy your data mining models in an Analysis Services database and use the model for batch prediction using DMX Seamless integration with SSIS, and the latter includes special tasks to perform train and predict queries However, the most interesting thing about Analysis Services is that provides very useful interactive console to explore and interpret the constructed models It has an Excel Add-in to allow non-technical users to do Data Mining
  23. In Analysis Services Data Mining, you build your mining structures based on the tables in the data source view. Then you run an algorithm on the data in the mining structure to produce a mining model The algorithms available are:… With a wide range of parameter configurations
  24. If you want your system to perform generic intelligence tasks, such as text analysis or image recognition, you should probably consider Azure Cognitive Services, before you build your own models.. You have a number of REST APIs that perform various analytics, including language, speech, vision, and search, that you can directly be consumed from you operational App… Very good documentation is supplied for each API
  25. You can create a cognitive service on you Azure Portal by specifying the API type and Service level (which affects how much you will pay), and look at the documentation to learn how consume a specific service Let’s have a look on demo that uses the emotion API
  26. While Azure Cognitive Services APIs are used in your operational apps for real-time predictions, Data Lake Analytics provide cognitive features to be integrated in your batch big data processing pipeline This includes image and text analytics So you can benefit from the scalability of the U-SQL jobs to process large amount of data, as well as the orchestration of azure data factory to streamline your process In addition, Data Lake Analytics supports Python extension in the U-SQL Script, let’s have a look So basically you can write some python function to process your data as part of the U-SQL Job Now let’s see a Text Analytics demo using Data Lake Analytics Cognitive Features
  27. If you are a spark developer, you will probably know that spark has rich ML libraries to perform various data processing and model learning tasks It is scalable, and extensible (you can implement your own algorithms using PySpark, SparkR, Java, or Scala)… Typically suitable for model training and batch scoring… And as it is available on HDInsight, you Spark data processing and ML pipelines can be automated and scheduled using Azure Data Factory Here is the link for the Data Factory Template for this…
  28. Spark has standard APIs for ML to make it easier to combine multiple task, including pre-processing and modelling, onto a single workflow, or pipeline