Machine Learning with Microsoft Azure
#msdevcon
Dmitry Petukhov,
ML/DS Preacher, Coffee Addicted &&
Machine Intelligence Researcher @ OpenWay
R for Fun Prototyping
[Diagram: developer PC; labels: code, result, RAM, data]
IDE: RStudio and/or Visual Studio
Runtime: CRAN and/or Microsoft R Open
Criteria: Flexibility, Distributed, Scalable (horizontal, vertical), Fault-tolerant, Reliable, OSS-based, BigData-ready, LSML, Secure
R for full cycle development
CRISP-DM
[Diagram: data flow of the modeling cycle]
Data flow: training dataset and test dataset, with cross-validation
Feature selection: feature selection, feature scaling (normalization), dimension reduction
Training the ML algorithm
Model evaluation: evaluate model quality measures (ROC, RMSE, F-score, etc.)
Final model and final model evaluation
Share results, then revision
Source: http://0xCode.in/azure-ml-for-data-scientist
This work is licensed under a Creative Commons Attribution 4.0 International License
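The training/test split and cross-validation named in the diagram can be sketched in base R. This is a minimal illustration on made-up data: the `toy` data frame, the 80/20 split, and the choice of `k = 5` are assumptions, not values from the talk.

```r
# Hypothetical toy data (not from the talk): y depends linearly on x
set.seed(42)
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- 2 * toy$x + rnorm(n, sd = 0.5)

# Hold out 20% of the rows as the test dataset
test.idx  <- sample(n, size = 0.2 * n)
toy.train <- toy[-test.idx, ]
toy.test  <- toy[test.idx, ]

# Assign every training row to one of k folds at random
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(toy.train)))

# Cross-validation: fit on k-1 folds, compute RMSE on the held-out fold
cv.rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(y ~ x, data = toy.train[folds != i, ])
  pred <- predict(fit, newdata = toy.train[folds == i, ])
  sqrt(mean((toy.train$y[folds == i] - pred)^2))
})
mean(cv.rmse) # average RMSE across folds
```

The test dataset stays untouched until the final model evaluation; only the training rows take part in the folds.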
Step 1: read data
# 1. from the local file system
library(data.table)
trans.raw <- fread("data/transactions.csv") # named trans.raw to match the preprocessing step
# > Read 6849346 rows and 6 (of 6) columns from 0.299 GB file in 00:00:31
# 2. from the Web
mcc.raw <- fread("https://raw.githubusercontent.com/greggles/mcc-codes/master/mcc_codes.csv",
sep = ",", stringsAsFactors = FALSE, header = TRUE, colClasses = list(character = 2)) # named mcc.raw to match the SQL Server step
# > (curl download progress output: 14872 bytes received)
# 3. from Azure Blob Storage
library(AzureSMR)
sc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authKey = "{KEY}")
sc # print the context
azureGetBlob(sc,
storageAccount = "contestsdata",
container = "financial",
blob = "transactions.csv",
type = "text")
Step 1: read data
# 4. from MS SQL Server
library(RODBC) # provides database connectivity
connectionString <- "Driver={ODBC Driver 13 for SQL Server};Server=tcp:msdevcon.database.windows.net,1433;Database=TransDb;Uid=..."
trans.conn <- odbcDriverConnect(connectionString) # open RODBC connection
sqlSave(trans.conn, mcc.raw, "MCC2", addPK = TRUE) # save data to a table
mccFromDb <- sqlQuery(trans.conn, "SELECT * FROM MCC2 WHERE edited_description LIKE '%For Visa Only%'") # get data
head(mccFromDb)
#> rownames code edited_description combined_description
#> 1 978 9700 Automated Referral Service ( For Visa Only) Automated Referral Service ( For Visa Only)
#> 2 979 9701 Visa Credential Service ( For Visa Only) Visa Credential Service ( For Visa Only)
#> 3 980 9702 GCAS Emergency Services ( For Visa Only) GCAS Emergency Services ( For Visa Only)
#> 4 981 9950 Intra – Company Purchases ( For Visa Only) Intra – Company Purchases ( For Visa Only)
close(trans.conn)
# Other possible data sources: Excel, HDFS, Amazon S3, REST services
Step 2: preprocessing data
library(dplyr) # provides %>%, filter, mutate, select, rename, arrange
# extract the day number: { "0 10:23:26" "1 10:19:29" "1 10:20:56" } > { 0, 1, 1 }
getDay <- function(x) { strsplit(x, split = " ")[[1]][1] }
trans <- trans.raw %>%
# remove invalid rows: keep only rows whose amount is present AND non-zero
filter(
!is.na(amount) & amount != 0
) %>%
# transform data
mutate(
OperationType = factor(ifelse(amount > 0, "income", "withdraw")),
TransDay = as.numeric(sapply(tr_datetime, getDay)),
Amount = abs(amount)
) %>%
# remove redundant columns
select(
-c(tr_datetime, amount, term_id)
) %>%
# set column names
rename(
CustomerId = customer_id, MCC = mcc_code, TransType = tr_type
) %>%
# sort
arrange(
TransDay, Amount
)
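The day-extraction step can also be written as a single vectorized call, which avoids running `strsplit` once per row. A minimal sketch in base R; the helper name `get_days` is hypothetical, and the input values are the three sample strings from the comment in the preprocessing code.

```r
# Sample values in the "<day> <HH:MM:SS>" format shown in the slide comment
tr_datetime <- c("0 10:23:26", "1 10:19:29", "1 10:20:56")

# Hypothetical helper: drop everything from the first space onward,
# then convert the remaining day number to numeric (works on whole vectors)
get_days <- function(x) as.numeric(sub(" .*$", "", x))

days <- get_days(tr_datetime)
days
```

On a table with millions of rows, a vectorized `sub()` is usually noticeably faster than `sapply()` over a per-element function.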
Step 3: feature engineering
# calculate stats
library(dplyr)
customers.stats <- trans.x %>%
mutate(LogAmount = log(Amount)) %>%
group_by(CustomerId, OperationType, Gender) %>%
filter(n() > 30) %>%
summarize(
Min = min(LogAmount),
P1 = quantile(LogAmount, probs = c(.01)),
Q1 = quantile(LogAmount, probs = c(.25)),
Mean = mean(LogAmount),
Q3 = quantile(LogAmount, probs = c(.75)),
P99 = quantile(LogAmount, probs = c(.99)),
Max = max(LogAmount),
Total = sum(Amount),
Count = n(),
StandDev = sd(LogAmount)
) %>%
ungroup()
# shape from long to wide table form
library(reshape2)
x <- dcast(customers.stats, CustomerId + Gender ~ OperationType, value.var = "Mean", fun.aggregate = mean)
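For readers without reshape2, the same long-to-wide idea can be sketched with base R's `reshape()`. The tiny `stats.long` table below is made up for illustration; only the column names `CustomerId`, `OperationType`, and `Mean` follow the slide.

```r
# Hypothetical long-form statistics: one row per customer and operation type
stats.long <- data.frame(
  CustomerId    = c(1, 1, 2, 2),
  OperationType = c("income", "withdraw", "income", "withdraw"),
  Mean          = c(10.1, 8.2, 9.5, 7.9)
)

# Wide form: one row per customer, one Mean.* column per operation type
stats.wide <- reshape(stats.long,
                      idvar     = "CustomerId",
                      timevar   = "OperationType",
                      direction = "wide")
stats.wide
```

Unlike `dcast`, `reshape()` prefixes the new columns with the value column name (`Mean.income`, `Mean.withdraw`) and does not aggregate duplicates.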
Step 3: feature engineering
library(ggplot2)
ggplot(x, aes(x = income, y = withdraw)) +
geom_point(alpha = 0.25, colour = "darkblue") + facet_grid(. ~ Gender) +
xlab("Mean log income, rub") + ylab("Mean log withdrawal, rub")
Step 4: training ML-model
# train model (logistic regression)
model <- glm(formula = gender ~ ., family = binomial(link = "logit"), data = dt.train)
# score model
library(ROCR) # provides prediction() and performance()
p <- predict(model, newdata = dt.test, type = "response")
pr <- prediction(p, dt.test$gender)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
plot(prf) # ROC curve
# evaluate model
auc <- performance(pr, measure = "auc")
auc <- auc@y.values[[1]]
auc
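As a cross-check on the AUC value, the same quantity can be computed from ranks: AUC equals the share of (positive, negative) pairs in which the positive case gets the higher score. A minimal base R sketch; the function name `auc_rank` and the six toy scores are assumptions for illustration, not part of the talk.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formula.
# labels: 1 = positive class, 0 = negative class
auc_rank <- function(scores, labels) {
  r     <- rank(scores)          # average ranks, so ties count as half
  n.pos <- sum(labels == 1)
  n.neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Toy scores: one positive (0.35) is ranked below one negative (0.6)
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(1, 1, 1, 0, 0, 0)
auc_rank(scores, labels) # 8 of the 9 pairs are ordered correctly
```

A value of 0.5 corresponds to random ordering and 1 to perfect separation of the two classes.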
Challenges
Data science evolves rapidly
Data grows even faster
Data >> Memory (now and evermore)
We must scale better
Complex infrastructure
Zoo of frameworks
Maybe the cloud?
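The "Data >> Memory" point can be illustrated on a single machine: when a file does not fit in RAM, it can still be aggregated chunk by chunk. A minimal base R sketch, assuming a one-column CSV; the temporary file, the `amount` column, and the chunk size of 100 lines are made up for illustration.

```r
# Write a small demo file (a stand-in for a file too large for RAM)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(amount = 1:1000), path, row.names = FALSE)

con <- file(path, open = "r")
header <- readLines(con, n = 1) # consume the header line
total <- 0
rows  <- 0
repeat {
  lines <- readLines(con, n = 100) # read the next chunk of raw lines
  if (length(lines) == 0) break    # end of file
  chunk <- read.csv(text = lines, header = FALSE, col.names = "amount")
  total <- total + sum(chunk$amount) # keep only the running aggregate
  rows  <- rows + nrow(chunk)
}
close(con)
c(rows = rows, total = total)
```

This only works for statistics that can be accumulated incrementally; anything that needs the whole dataset at once is where distributed frameworks come in.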
Big Data + Cloud + Machine Learning
Slow, expensive, …
Apache Spark/Hadoop + Azure + R Server
Available as a PaaS service
ML for the bloody Enterprise
[Architecture diagram]
Components: Team; Version Control (pull code); Application Server (Task Manager); Distributed Execution Framework (tasks); Big Data Cluster with head node and worker nodes (tasks); DFS
Criteria: Flexibility, Distributed, Highly scalable, Fault-tolerant, Reliable, OSS-based, BigData-ready, LSML, Secure
R for the Enterprise
[Architecture diagram]
Components: Team (pull code); Microsoft R Server; Apache Spark / Hadoop (tasks); Azure HDInsight with head node and worker nodes (tasks); Azure Blob Storage (HDFS API)
Microsoft R
Microsoft R Open and Microsoft R Server #R
MicrosoftML #R
Microsoft R Server for Azure HDInsight #PaaS
R Server on Apache Spark
Data Science VM #R #IaaS
CNTK & GPU Instances #NN #GPU #OSS
Batch AI Training preview #PaaS #NN #GPU
Azure Machine Learning #PaaS
R scripts, modules and models #R
Jupyter Notebooks #R #SaaS
R-to-cloud: AzureSMR, AzureML #R #OSS
Cognitive Services #SaaS #NN
SQL Server R Services #R #PaaS
Power BI: execute R scripts #R #Viz
Visual Studio: R extensions for VS2015, built-in R support for VS2017
Microsoft Azure
© 2017, Dmitry Petukhov. CC BY-SA 4.0 license. Microsoft and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
Data Science must win!
Q&A
Now or later (use contacts below)
Ping me
Habr: @codezombie
All contacts: http://0xCode.in/author

More Related Content

What's hot

3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data framekrishna singh
 
PistonHead's use of MongoDB for Analytics
PistonHead's use of MongoDB for AnalyticsPistonHead's use of MongoDB for Analytics
PistonHead's use of MongoDB for AnalyticsAndrew Morgan
 
Data Visualizations with D3
Data Visualizations with D3Data Visualizations with D3
Data Visualizations with D3Doug Domeny
 
Advanced Data Visualization Examples with R-Part II
Advanced Data Visualization Examples with R-Part IIAdvanced Data Visualization Examples with R-Part II
Advanced Data Visualization Examples with R-Part IIDr. Volkan OBAN
 
NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices - Michael HacksteinNoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices - Michael Hacksteindistributed matters
 
Learn D3.js in 90 minutes
Learn D3.js in 90 minutesLearn D3.js in 90 minutes
Learn D3.js in 90 minutesJos Dirksen
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015NoSQLmatters
 
The rise of json in rdbms land jab17
The rise of json in rdbms land jab17The rise of json in rdbms land jab17
The rise of json in rdbms land jab17alikonweb
 
Clojure for Data Science
Clojure for Data ScienceClojure for Data Science
Clojure for Data Sciencehenrygarner
 
R-ggplot2 package Examples
R-ggplot2 package ExamplesR-ggplot2 package Examples
R-ggplot2 package ExamplesDr. Volkan OBAN
 
Manchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra IntegrationManchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra IntegrationChristopher Batey
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB
 
MongoDB Stich Overview
MongoDB Stich OverviewMongoDB Stich Overview
MongoDB Stich OverviewMongoDB
 
Polyglot Persistence & Multi Model-Databases at JMaghreb3.0
Polyglot Persistence & Multi Model-Databases at JMaghreb3.0Polyglot Persistence & Multi Model-Databases at JMaghreb3.0
Polyglot Persistence & Multi Model-Databases at JMaghreb3.0ArangoDB Database
 
Advanced Data Visualization in R- Somes Examples.
Advanced Data Visualization in R- Somes Examples.Advanced Data Visualization in R- Somes Examples.
Advanced Data Visualization in R- Somes Examples.Dr. Volkan OBAN
 
Reading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache SparkReading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache SparkChristopher Batey
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0Mydbops
 
Megadata With Python and Hadoop
Megadata With Python and HadoopMegadata With Python and Hadoop
Megadata With Python and Hadoopryancox
 

What's hot (20)

3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
PistonHead's use of MongoDB for Analytics
PistonHead's use of MongoDB for AnalyticsPistonHead's use of MongoDB for Analytics
PistonHead's use of MongoDB for Analytics
 
Data Visualizations with D3
Data Visualizations with D3Data Visualizations with D3
Data Visualizations with D3
 
Advanced Data Visualization Examples with R-Part II
Advanced Data Visualization Examples with R-Part IIAdvanced Data Visualization Examples with R-Part II
Advanced Data Visualization Examples with R-Part II
 
NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices - Michael HacksteinNoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices - Michael Hackstein
 
Learn D3.js in 90 minutes
Learn D3.js in 90 minutesLearn D3.js in 90 minutes
Learn D3.js in 90 minutes
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
 
The rise of json in rdbms land jab17
The rise of json in rdbms land jab17The rise of json in rdbms land jab17
The rise of json in rdbms land jab17
 
Clojure for Data Science
Clojure for Data ScienceClojure for Data Science
Clojure for Data Science
 
R-ggplot2 package Examples
R-ggplot2 package ExamplesR-ggplot2 package Examples
R-ggplot2 package Examples
 
Manchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra IntegrationManchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra Integration
 
Enter The Matrix
Enter The MatrixEnter The Matrix
Enter The Matrix
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
MongoDB Stich Overview
MongoDB Stich OverviewMongoDB Stich Overview
MongoDB Stich Overview
 
Polyglot Persistence & Multi Model-Databases at JMaghreb3.0
Polyglot Persistence & Multi Model-Databases at JMaghreb3.0Polyglot Persistence & Multi Model-Databases at JMaghreb3.0
Polyglot Persistence & Multi Model-Databases at JMaghreb3.0
 
Advanced Data Visualization in R- Somes Examples.
Advanced Data Visualization in R- Somes Examples.Advanced Data Visualization in R- Somes Examples.
Advanced Data Visualization in R- Somes Examples.
 
Reading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache SparkReading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache Spark
 
CLUSTERGRAM
CLUSTERGRAMCLUSTERGRAM
CLUSTERGRAM
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0
 
Megadata With Python and Hadoop
Megadata With Python and HadoopMegadata With Python and Hadoop
Megadata With Python and Hadoop
 

Viewers also liked

Schneider Electric Smart City Success Stories (Worldwide)
Schneider Electric Smart City  Success Stories (Worldwide)Schneider Electric Smart City  Success Stories (Worldwide)
Schneider Electric Smart City Success Stories (Worldwide)Schneider Electric India
 
Philip bane smart city
Philip bane smart cityPhilip bane smart city
Philip bane smart cityaztechcouncil
 
City as Platform Cooperative - Smart City Expo - Barcelona
City as Platform Cooperative -  Smart City Expo - Barcelona City as Platform Cooperative -  Smart City Expo - Barcelona
City as Platform Cooperative - Smart City Expo - Barcelona DigitalTown, Inc
 
Machine Intelligence for Fraud Prediction
Machine Intelligence for Fraud PredictionMachine Intelligence for Fraud Prediction
Machine Intelligence for Fraud PredictionDmitry Petukhov
 
Democratizing Artificial Intelligence
Democratizing Artificial IntelligenceDemocratizing Artificial Intelligence
Democratizing Artificial IntelligenceDmitry Petukhov
 
Monetizing the iot by Sandhiprakash Bhide generic-01-24-2017
Monetizing the iot by Sandhiprakash Bhide generic-01-24-2017Monetizing the iot by Sandhiprakash Bhide generic-01-24-2017
Monetizing the iot by Sandhiprakash Bhide generic-01-24-2017sandhibhide
 
Smart-city implementation reference model
Smart-city implementation reference modelSmart-city implementation reference model
Smart-city implementation reference modelAlexander SAMARIN
 
AI in IoT: Use Cases and Challenges
AI in IoT: Use Cases and ChallengesAI in IoT: Use Cases and Challenges
AI in IoT: Use Cases and ChallengesDmitry Petukhov
 
[Webinar Slides] Robotic Process Automation 101 What is it? What can it mean ...
[Webinar Slides] Robotic Process Automation 101 What is it? What can it mean ...[Webinar Slides] Robotic Process Automation 101 What is it? What can it mean ...
[Webinar Slides] Robotic Process Automation 101 What is it? What can it mean ...AIIM International
 
Microsoft Machine Learning Server. Architecture View
Microsoft Machine Learning Server. Architecture ViewMicrosoft Machine Learning Server. Architecture View
Microsoft Machine Learning Server. Architecture ViewDmitry Petukhov
 
Smart City and Smart Government : Strategy, Model, and Cases of Korea
Smart City and Smart Government : Strategy, Model, and Cases of KoreaSmart City and Smart Government : Strategy, Model, and Cases of Korea
Smart City and Smart Government : Strategy, Model, and Cases of KoreaJong-Sung Hwang
 
What is next for IoT and IIoT
What is next for IoT and IIoTWhat is next for IoT and IIoT
What is next for IoT and IIoTAhmed Banafa
 
AI & Robotic Process Automation (RPA) to Digitally Transform Your Environment
AI & Robotic Process Automation (RPA) to Digitally Transform Your EnvironmentAI & Robotic Process Automation (RPA) to Digitally Transform Your Environment
AI & Robotic Process Automation (RPA) to Digitally Transform Your EnvironmentCprime
 
Build your First IoT Application with IBM Watson IoT
Build your First IoT Application with IBM Watson IoTBuild your First IoT Application with IBM Watson IoT
Build your First IoT Application with IBM Watson IoTJanakiram MSV
 
Iot for smart city
Iot for smart cityIot for smart city
Iot for smart citysanalkumar k
 

Viewers also liked (20)

Schneider Electric Smart City Success Stories (Worldwide)
Schneider Electric Smart City  Success Stories (Worldwide)Schneider Electric Smart City  Success Stories (Worldwide)
Schneider Electric Smart City Success Stories (Worldwide)
 
Philip bane smart city
Philip bane smart cityPhilip bane smart city
Philip bane smart city
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
City as Platform Cooperative - Smart City Expo - Barcelona
City as Platform Cooperative -  Smart City Expo - Barcelona City as Platform Cooperative -  Smart City Expo - Barcelona
City as Platform Cooperative - Smart City Expo - Barcelona
 
Machine Intelligence for Fraud Prediction
Machine Intelligence for Fraud PredictionMachine Intelligence for Fraud Prediction
Machine Intelligence for Fraud Prediction
 
Democratizing Artificial Intelligence
Democratizing Artificial IntelligenceDemocratizing Artificial Intelligence
Democratizing Artificial Intelligence
 
Auxis Webinar: Diving into RPA
Auxis Webinar: Diving into RPAAuxis Webinar: Diving into RPA
Auxis Webinar: Diving into RPA
 
AI for Retail Banking
AI for Retail BankingAI for Retail Banking
AI for Retail Banking
 
Monetizing the iot by Sandhiprakash Bhide generic-01-24-2017
Monetizing the iot by Sandhiprakash Bhide generic-01-24-2017Monetizing the iot by Sandhiprakash Bhide generic-01-24-2017
Monetizing the iot by Sandhiprakash Bhide generic-01-24-2017
 
Smart-city implementation reference model
Smart-city implementation reference modelSmart-city implementation reference model
Smart-city implementation reference model
 
2016 Current State of IoT
2016 Current State of IoT 2016 Current State of IoT
2016 Current State of IoT
 
AI in IoT: Use Cases and Challenges
AI in IoT: Use Cases and ChallengesAI in IoT: Use Cases and Challenges
AI in IoT: Use Cases and Challenges
 
[Webinar Slides] Robotic Process Automation 101 What is it? What can it mean ...
[Webinar Slides] Robotic Process Automation 101 What is it? What can it mean ...[Webinar Slides] Robotic Process Automation 101 What is it? What can it mean ...
[Webinar Slides] Robotic Process Automation 101 What is it? What can it mean ...
 
CISCO SMART CITY
CISCO SMART CITYCISCO SMART CITY
CISCO SMART CITY
 
Microsoft Machine Learning Server. Architecture View
Microsoft Machine Learning Server. Architecture ViewMicrosoft Machine Learning Server. Architecture View
Microsoft Machine Learning Server. Architecture View
 
Smart City and Smart Government : Strategy, Model, and Cases of Korea
Smart City and Smart Government : Strategy, Model, and Cases of KoreaSmart City and Smart Government : Strategy, Model, and Cases of Korea
Smart City and Smart Government : Strategy, Model, and Cases of Korea
 
What is next for IoT and IIoT
What is next for IoT and IIoTWhat is next for IoT and IIoT
What is next for IoT and IIoT
 
AI & Robotic Process Automation (RPA) to Digitally Transform Your Environment
AI & Robotic Process Automation (RPA) to Digitally Transform Your EnvironmentAI & Robotic Process Automation (RPA) to Digitally Transform Your Environment
AI & Robotic Process Automation (RPA) to Digitally Transform Your Environment
 
Build your First IoT Application with IBM Watson IoT
Build your First IoT Application with IBM Watson IoTBuild your First IoT Application with IBM Watson IoT
Build your First IoT Application with IBM Watson IoT
 
Iot for smart city
Iot for smart cityIot for smart city
Iot for smart city
 

Similar to Machine Learning with Microsoft Azure

IaaS, PaaS, and DevOps for Data Scientist
IaaS, PaaS, and DevOps for Data ScientistIaaS, PaaS, and DevOps for Data Scientist
IaaS, PaaS, and DevOps for Data ScientistDmitry Petukhov
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
Data Mining for Developers
Data Mining for DevelopersData Mining for Developers
Data Mining for Developersllangit
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdfsash236
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADtab0ris_1
 
ITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The Win
ITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The WinITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The Win
ITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The WinITCamp
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsJohann de Boer
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Humoyun Ahmedov
 
Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Amazon Web Services
 
Akka with Scala
Akka with ScalaAkka with Scala
Akka with ScalaOto Brglez
 
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...randyguck
 
Streaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache CassandraStreaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache CassandraCédrick Lunven
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionChetan Khatri
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionChetan Khatri
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeWim Godden
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API'sNatalino Busa
 
MongoDB World 2019: Life In Stitch-es
MongoDB World 2019: Life In Stitch-esMongoDB World 2019: Life In Stitch-es
MongoDB World 2019: Life In Stitch-esMongoDB
 

Similar to Machine Learning with Microsoft Azure (20)

IaaS, PaaS, and DevOps for Data Scientist
IaaS, PaaS, and DevOps for Data ScientistIaaS, PaaS, and DevOps for Data Scientist
IaaS, PaaS, and DevOps for Data Scientist
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
Data Mining for Developers
Data Mining for DevelopersData Mining for Developers
Data Mining for Developers
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
ITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The Win
ITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The WinITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The Win
ITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The Win
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalytics
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications
 
Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Introduction to AWS Step Functions:
Introduction to AWS Step Functions:
 
Akka with Scala
Akka with ScalaAkka with Scala
Akka with Scala
 
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
 
Streaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache CassandraStreaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache Cassandra
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API's
 
MongoDB World 2019: Life In Stitch-es
MongoDB World 2019: Life In Stitch-esMongoDB World 2019: Life In Stitch-es
MongoDB World 2019: Life In Stitch-es
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 

More from Dmitry Petukhov

Intelligent Banking: AI cases in Retail and Commercial Banking
Intelligent Banking: AI cases in Retail and Commercial BankingIntelligent Banking: AI cases in Retail and Commercial Banking
Intelligent Banking: AI cases in Retail and Commercial BankingDmitry Petukhov
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningDmitry Petukhov
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningDmitry Petukhov
 
Machine Learning in Microsoft Azure
Machine Learning in Microsoft AzureMachine Learning in Microsoft Azure
Machine Learning in Microsoft AzureDmitry Petukhov
 

More from Dmitry Petukhov (8)

Introduction to Auto ML
Introduction to Auto MLIntroduction to Auto ML
Introduction to Auto ML
 
Intelligent Banking: AI cases in Retail and Commercial Banking
Intelligent Banking: AI cases in Retail and Commercial BankingIntelligent Banking: AI cases in Retail and Commercial Banking
Intelligent Banking: AI cases in Retail and Commercial Banking
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
R + Apache Spark
R + Apache SparkR + Apache Spark
R + Apache Spark
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Microsoft Azure + R
Microsoft Azure + RMicrosoft Azure + R
Microsoft Azure + R
 
Machine Learning in Microsoft Azure
Machine Learning in Microsoft AzureMachine Learning in Microsoft Azure
Machine Learning in Microsoft Azure
 

Recently uploaded

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
Call Girls In Dwarka 9654467111 Escorts Service

Machine Learning with Microsoft Azure

  • 1. Machine Learning with Microsoft Azure #msdevcon Dmitry Petukhov, ML/DS Preacher, Coffee Addicted && Machine Intelligence Researcher @ OpenWay
  • 2. R for Fun Prototyping developer PC code result RAM Data IDE RStudio or/and Visual Studio Runtime CRAN or/and Microsoft R Open Flexibility Distributed Scalable: horizontal, vertical Fault-tolerance Reliable OSS-based BigData-ready LSML Secure
  • 3. R for full cycle development CRISP-DM Model evaluation Evaluate measures of quality model (ROC, RMSE, F-Score, etc.) Feature Selection** Feature Selection Feature Scaling (Normalization) Dimension Reduction Final Model Training ML algorithm Share results Revision FinalModelEvaluation Data Flow Cross-validation Training Dataset Test Dataset Source: http://0xCode.in/azure-ml-for-data-scientist This work is licensed under a Creative Commons Attribution 4.0 International License
  • 4. Step 1: read data
      # 1. from local file system
      library(data.table)
      dt <- fread("data/transactions.csv")
      # > Read 6849346 rows and 6 (of 6) columns from 0.299 GB file in 00:00:31
      # 2. from Web
      dt <- fread("https://raw.githubusercontent.com/greggles/mcc-codes/master/mcc_codes.csv",
                  sep = ",", stringsAsFactors = FALSE, header = TRUE,
                  colClasses = list(character = 2))
      # >   % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
      # >                                  Dload  Upload   Total    Spent    Left  Speed
      # >   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
      # > 100 14872  100 14872    0     0  29744      0 --:--:-- --:--:-- --:--:-- 31710
      # 3. from Azure Blob Storage
      library(AzureSMR)
      sc <- createAzureContext(tenantID = "{TID}", clientID = "{CID}", authKey = "{KEY}")
      sc
      azureGetBlob(sc,
                   storageAccount = "contestsdata",
                   container = "financial",
                   blob = "transactions.csv",
                   type = "text")
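The `colClasses` argument pins a column's type instead of letting the reader guess it. One case where that matters is identifier-like columns such as merchant category codes. The base R sketch below uses a made-up two-row CSV (the codes and descriptions are illustrative, not taken from the real file) and `read.csv` rather than `fread`, to show how a guessed numeric type drops a leading zero while a character column keeps it:

```r
# Hypothetical inline CSV standing in for a downloaded MCC file.
csv_text <- "mcc,description\n0742,Veterinary Services\n5411,Grocery Stores"

# Default type guessing turns the code column into integers.
guessed <- read.csv(text = csv_text)

# Forcing the column to character preserves the code exactly as written.
typed <- read.csv(text = csv_text, colClasses = c(mcc = "character"))

print(guessed$mcc)  # integer codes, leading zero dropped
print(typed$mcc)    # character codes, leading zero kept
```

The same idea carries over to `fread`, where `colClasses` accepts a list keyed by type, as on the slide.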
  • 5. Step 1: read data
      # 4. from MS SQL Server
      library(RODBC) # provides database connectivity
      connectionString <- "Driver={ODBC Driver 13 for SQL Server};Server=tcp:msdevcon.database.windows.net,1433;Database=TransDb;Uid=..."
      trans.conn <- odbcDriverConnect(connectionString) # open RODBC connection
      sqlSave(trans.conn, mcc.raw, "MCC2", addPK = TRUE) # save data to table
      mccFromDb <- sqlQuery(trans.conn, "SELECT * FROM MCC2 WHERE edited_description LIKE '%For Visa Only%'") # get data
      head(mccFromDb)
      #>   rownames code edited_description combined_description
      #> 1 978 9700 Automated Referral Service ( For Visa Only) Automated Referral Service ( For Visa Only)
      #> 2 979 9701 Visa Credential Service ( For Visa Only) Visa Credential Service ( For Visa Only)
      #> 3 980 9702 GCAS Emergency Services ( For Visa Only) GCAS Emergency Services ( For Visa Only)
      #> 4 981 9950 Intra – Company Purchases ( For Visa Only) Intra – Company Purchases ( For Visa Only) Intra –
      close(trans.conn)
      # * Excel, HDFS, Amazon S3, REST-services as data sources
  • 6. Step 2: preprocessing data
      library(dplyr)
      # { "0 10:23:26" "1 10:19:29" "1 10:20:56" } > { 0, 1, 1 }
      getDay <- function(x) { strsplit(x, split = " ")[[1]][1] }
      trans <- trans.raw %>%
        # remove invalid rows (missing or zero amount)
        filter( !is.na(amount) & amount != 0 ) %>%
        # transform data
        mutate(
          OperationType = factor(ifelse(amount > 0, "income", "withdraw")),
          TransDay = as.numeric(sapply(tr_datetime, getDay)),
          Amount = abs(amount)
        ) %>%
        # remove redundant columns
        select( -c(tr_datetime, amount, term_id) ) %>%
        # set column names
        rename( CustomerId = customer_id, MCC = mcc_code, TransType = tr_type ) %>%
        # sort
        arrange( TransDay, Amount )
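The `getDay` helper takes the part of the timestamp before the first space, and `sapply` applies it one row at a time. As a sketch of the same transformation in base R only, the hypothetical `get_day_vec` below (not part of the original slides) processes the whole column in a single vectorized call:

```r
# Sample values from the slide comment: "<day index> <HH:MM:SS>".
tr_datetime <- c("0 10:23:26", "1 10:19:29", "1 10:20:56")

# Row-wise helper as on the slide: keep the token before the first space.
getDay <- function(x) { strsplit(x, split = " ")[[1]][1] }
days_rowwise <- as.numeric(sapply(tr_datetime, getDay))

# Hypothetical vectorized variant: drop everything from the first space on.
get_day_vec <- function(x) as.numeric(sub(" .*$", "", x))
days_vectorized <- get_day_vec(tr_datetime)

print(days_vectorized)
```

On a table with millions of rows, as in the transactions file read earlier, a vectorized call of this kind avoids one function invocation per row.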
  • 7. Step 3: feature engineering
      # calculate stats
      library(dplyr)
      customers.stats <- trans.x %>%
        mutate(LogAmount = log(Amount)) %>%
        group_by(CustomerId, OperationType, Gender) %>%
        filter(n() > 30) %>%
        summarize(
          Min = min(LogAmount),
          P1 = quantile(LogAmount, probs = c(.01)),
          Q1 = quantile(LogAmount, probs = c(.25)),
          Mean = mean(LogAmount),
          Q3 = quantile(LogAmount, probs = c(.75)),
          P99 = quantile(LogAmount, probs = c(.99)),
          Max = max(LogAmount),
          Total = sum(Amount),
          Count = n(),
          StandDev = sd(LogAmount)
        ) %>%
        ungroup()
      # shape from long to wide table form
      library(reshape2)
      x <- dcast(customers.stats, CustomerId + Gender ~ OperationType,
                 value.var = "Mean", fun.aggregate = mean)
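The `dcast` call turns one row per customer and operation type into one row per customer with an `income` and a `withdraw` column. A small base R sketch of that long-to-wide reshaping, using `tapply` in place of `reshape2::dcast` and a made-up three-row table (the IDs and values are illustrative only):

```r
# Hypothetical long table: one row per customer and operation type.
stats_long <- data.frame(
  CustomerId    = c(1, 1, 2),
  OperationType = c("income", "withdraw", "withdraw"),
  Mean          = c(9.5, 8.0, 7.2)
)

# Cross-tabulate Mean by customer (rows) and operation type (columns).
wide <- tapply(stats_long$Mean,
               list(stats_long$CustomerId, stats_long$OperationType),
               mean)

print(wide)
```

In this sketch a customer with no rows for one operation type ends up with a missing value in that column, which is worth checking before plotting or modelling.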
  • 8. Step 3: feature engineering
      library(ggplot2)
      ggplot(x, aes(x = income, y = withdraw)) +
        geom_point(alpha = 0.25, colour = "darkblue") +
        facet_grid(. ~ Gender) +
        xlab("Income, rub") +
        ylab("Withdraw, rub")
  • 9. Step 4: training ML-model
      library(ROCR) # provides prediction() and performance()
      # train model
      model <- glm(formula = gender ~ ., family = binomial(link = "logit"), data = dt.train)
      # score model
      p <- predict(model, newdata = dt.test, type = "response")
      pr <- prediction(p, dt.test$gender)
      prf <- performance(pr, measure = "tpr", x.measure = "fpr")
      plot(prf)
      # evaluate model
      auc <- performance(pr, measure = "auc")
      auc <- auc@y.values[[1]]
      auc
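The AUC computed in the evaluation step equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A sketch of that definition in base R, using the rank-sum (Mann-Whitney) identity on made-up scores; `auc_rank` is a hypothetical helper written for this illustration, not a ROCR function:

```r
# AUC from ranks: U statistic of the positive class divided by the
# number of positive/negative pairs.
auc_rank <- function(scores, labels) {
  r <- rank(scores)               # average ranks, so ties count as half
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  u <- sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2
  u / (n_pos * n_neg)
}

# Made-up predicted probabilities and true classes (1 = positive).
scores <- c(0.10, 0.40, 0.35, 0.80)
labels <- c(0, 0, 1, 1)
auc <- auc_rank(scores, labels)
print(auc)
```

Here three of the four positive/negative pairs are ordered correctly, so the sketch yields 0.75; a value of 0.5 would correspond to random ranking.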
  • 10. Challenges: Data science evolves rapidly. Data grows even faster. Data >> Memory (now and evermore). We must scale better. Complex infrastructure. Zoo of frameworks. Maybe cloud?
  • 11. #msdevcon Big Data + Cloud + Machine Learning: slow, expensive, …
  • 12. #msdevcon Apache Spark/Hadoop + Azure + R Server: available as a PaaS service
  • 13. Application Server (Task Manager) Flexibility Distributed Large scalable Fault-tolerance Reliable OSS-based BigData-ready LSML Secure Team Head Node Worker Node DFS ML for the bloody Enterprise Version Control Distributed Execution Framework Tasks Big Data Cluster Tasks Pull code
  • 14. Azure Blob Storage Microsoft R Server Team Head Node Worker Node HDFS API R for the Enterprise Apache Spark / Hadoop Tasks Azure HDInsight Tasks Pull code
  • 15. Microsoft R: Microsoft R Open and Microsoft R Server #R; MicrosoftML #R; Microsoft R Server for Azure HDInsight #PaaS (R Server on Apache Spark); Data Science VM #R #IaaS; CNTK & GPU Instances #NN #GPU #OSS; Batch AI Training (preview) #PaaS #NN #GPU; Azure Machine Learning #PaaS (R scripts, modules and models #R; Jupyter Notebooks #R #SaaS); R-to-cloud: AzureSMR, AzureML #R #OSS; Cognitive Services #SaaS #NN; SQL Server R Services #R #PaaS; Power BI #R #Viz (execute R scripts); Visual Studio (R extensions for VS2015, R in-box support for VS2017); Microsoft Azure
  • 16. © 2017, Dmitry Petukhov. CC BY-SA 4.0 license. Microsoft and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. Data Science must win!
  • 17. Q&A Now or later (use contacts below) Ping me Habr: @codezombie All contacts: http://0xCode.in/author

Editor's Notes

  1. (c) 2017, Dmitry Petukhov. CC BY-SA 4.0 license.
  2. Event: https://events.techdays.ru/Future-Technologies/2017-06/