Machine Learning, Key to Your Formulation Challenges
Marc Borowczak, PRC Consulting LLC (http://www.prcconsulting.net)
February 17, 2016
Formulation Challenges are Everywhere…
Step 1: Retrieve Existing Data
Step 2: Normalize Data
Step 3: Train and Test a Model
Step 4: Evaluate Model Performance
Step 5: Improving Model Performance with 5 Hidden Neurons
Step 6: Improving Model using Random Forest Algorithm
Step 7: Testing Further with Resampling
Step 8: Actual Display of a Random Forest Tree Solution
Conclusions
References
Formulation Challenges are Everywhere…
You develop pharmaceutical, cosmetic, food, industrial or civil-engineered products, and you are often confronted
with the challenge of blending and formulating ingredients to meet process or performance targets. Traditional
Research and Development does approach the problem through experimentation, but it generally involves designed
studies, time, and resource constraints; it can be slow, expensive, and often redundant, and its results are quickly
forgotten or become obsolete.
Consider the alternative that Machine Learning tools offer today. We will show that this approach is not only quick
and efficient, and ultimately the way the Front End of Innovation should proceed, but also particularly well suited
to formulation and classification.
Today, we will explain how Machine Learning can shed new light on this generic and very persistent formulation
challenge. The other important aspect, the classification and clustering often associated with these formulation
challenges, will be discussed in a forthcoming communication.
Step 1: Retrieve Existing Data
To illustrate the approach, we selected a formulation dataset hosted on the UCI Machine Learning Repository
(http://archive.ics.uci.edu/ml/datasets.html), to predict how compressive strength
(http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength) depends on the formulation ingredients. This is
the well-known composition-property relationship that scientists, engineers, and business professionals must
address daily, and any established R&D organization would certainly have similar, and sometimes hidden, knowledge in
its archives…
We will use R to quickly demonstrate the approach on this dataset (http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/Concrete_Data.xls),
and also to show how reproducibility of the analysis is enforced: the analysis tool and platform are documented,
all libraries are clearly listed, and the data is retrieved programmatically and date-stamped from the repository.
Sys.info()[1:5]
## sysname release version nodename machine
## "Windows" "7 x64" "build 9200" "STALLION" "x86-64"
sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] reshape_0.8.5 scales_0.3.0 devtools_1.8.0
## [4] rpart.plot_1.5.3 rpart_4.1-10 randomForest_4.6-10
## [7] neuralnet_1.32 MASS_7.3-44 caret_6.0-52
## [10] ggplot2_1.0.1 lattice_0.20-33 stringr_1.0.0
## [13] xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-7
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.0 git2r_0.11.0 formatR_1.2
## [4] nloptr_1.0.4 plyr_1.8.3 iterators_1.0.7
## [7] tools_3.2.2 digest_0.6.8 lme4_1.1-9
## [10] memoise_0.2.1 evaluate_0.7.2 gtable_0.1.2
## [13] nlme_3.1-121 mgcv_1.8-9 Matrix_1.2-2
## [16] foreach_1.4.2 curl_0.9.3 parallel_3.2.2
## [19] yaml_2.1.13 SparseM_1.7 brglm_0.5-9
## [22] proto_0.3-10 xml2_0.1.1 BradleyTerry2_1.0-6
## [25] knitr_1.11 rversions_1.0.2 MatrixModels_0.4-1
## [28] gtools_3.5.0 stats4_3.2.2 nnet_7.3-11
## [31] rmarkdown_0.9.2 minqa_1.2.4 reshape2_1.4.1
## [34] car_2.0-26 magrittr_1.5 codetools_0.2-14
## [37] htmltools_0.2.6 splines_3.2.2 pbkrtest_0.4-2
## [40] colorspace_1.2-6 quantreg_5.18 stringi_0.5-5
## [43] munsell_0.4.2
library(xlsx)
library(stringr)
library(caret)
library(neuralnet)
library(devtools)
library(rpart)
library(rpart.plot)
userdir <- getwd()
datadir <- "./data"
if (!file.exists("data")){dir.create("data")}
fileUrl <- "http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/Concrete_Data.xls?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/Concrete_Data.xls",method="curl")
dateDownloaded <- date()
concrete <- read.xlsx("./data/Concrete_Data.xls",sheetName="Sheet1")
str(concrete)
## 'data.frame': 1030 obs. of 9 variables:
##  $ Cement..component.1..kg.in.a.m.3.mixture.            : num 540 540 332 332 199 ...
##  $ Blast.Furnace.Slag..component.2..kg.in.a.m.3.mixture.: num 0 0 142 142 132 ...
##  $ Fly.Ash..component.3..kg.in.a.m.3.mixture.            : num 0 0 0 0 0 0 0 0 0 0 ...
##  $ Water...component.4..kg.in.a.m.3.mixture.             : num 162 162 228 228 192 228 228 228 228 228 ...
##  $ Superplasticizer..component.5..kg.in.a.m.3.mixture.   : num 2.5 2.5 0 0 0 0 0 0 0 0 ...
##  $ Coarse.Aggregate...component.6..kg.in.a.m.3.mixture.  : num 1040 1055 932 932 978 ...
##  $ Fine.Aggregate..component.7..kg.in.a.m.3.mixture.     : num 676 676 594 594 826 ...
##  $ Age..day.                                             : num 28 28 270 365 360 90 365 28 28 28 ...
##  $ Concrete.compressive.strength.MPa..megapascals..      : num 80 61.9 40.3 41.1 44.3 ...
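As a side note, and purely as an addition to the original script: on Windows, binary .xls files are safest downloaded with mode = "wb", and a quick sanity check of the dimensions guards against a truncated download. A minimal defensive sketch, assuming the fileUrl defined above:
# Hypothetical, more defensive variant of the retrieval step (not in the original);
# mode = "wb" avoids corrupting the binary .xls on Windows, and the download is
# skipped if the file is already present.
if (!file.exists("./data/Concrete_Data.xls")) {
  download.file(fileUrl, destfile = "./data/Concrete_Data.xls", mode = "wb")
}
# Confirm the size advertised by the repository: 1030 rows, 9 columns.
stopifnot(nrow(concrete) == 1030, ncol(concrete) == 9)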
Step 2: Normalize Data
The dataset information reveals 1030 observations of 9 variables: 8 inputs, of which 7 are ingredients and
1 is a process attribute (Age), and 1 output, the strength property. There are no missing values in this set. We'll
simply truncate the variable names and normalize the data, then display the normalized strength.
normalize <- function(x) {return((x - min(x)) / (max(x) - min(x)))}
names(concrete)<-gsub("."," ",names(concrete),fixed=TRUE)  # fixed=TRUE treats "." literally, not as a regex
names(concrete)<-word(names(concrete),1)
names(concrete)[9]<-"Strength"
concrete_norm <- as.data.frame(lapply(concrete, normalize))
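The summary of the normalized strength shown below was presumably produced by a call along these lines (the call itself is not visible in the original):
# Display the distribution of the normalized strength (output shown below).
summary(concrete_norm$Strength)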
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2663 0.4000 0.4172 0.5457 1.0000
These transformations are typical of a generic formulation problem, where the ingredients and process variables are
the independent (input) variables and the property is the dependent (output) variable.
Step 3: Train and Test a Model
The method we'll follow now is a standard approach in which we randomly split the data set into a training set and a test set.
The caret (https://cran.r-project.org/web/packages/caret/caret.pdf) package implements this task well. We'll use
75% of the data to train and the remainder to test the model. To make the analysis reproducible, we'll use the
set.seed() function.
set.seed(12121)
inTrain<-createDataPartition(y=concrete_norm$Strength,p=0.75,list=FALSE)
concrete_train <- concrete_norm[inTrain, ]
concrete_test <- concrete_norm[-inTrain, ]
concrete_model <- neuralnet(formula = Strength ~ Cement + Blast + Fly + Water + Superplasticizer + Coarse + Fine + Age, data = concrete_train)
The network topology is then easily visualized. See the excellent NeuralNetTool page
(https://beckmw.wordpress.com/tag/neural-network/) for details. Suffice it to say that even this simple first attempt
highlights the main dependencies: higher impacts are highlighted with thicker links in the typical neural net
representation. Here the I's represent inputs, O is the output, and H and B are Hidden and Bias nodes as defined in
the theory. Note that a single bias node is added for each input and hidden layer to accommodate input features equal
to 0.
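The network diagram itself is not reproduced here. As a minimal sketch of how such a figure can be generated (an assumption, not a statement about the original session), the neuralnet package's own plot method draws the fitted topology, while Beck's plot.nnet() function, sourced separately, gives the compact representation used later in this article:
# Default topology diagram from the neuralnet package:
plot(concrete_model)
# plot.nnet() is not part of neuralnet; it comes from Marcus Beck's NeuralNetTool
# code (see reference 3) and must be sourced or installed separately, e.g.:
# plot.nnet(concrete_model, cex.val = 0.75)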
Step 4: Evaluate Model Performance
We will now compute predictions, compare them to the actual strength, and examine the correlation between
predicted and actual strength values.
model_results <- compute(concrete_model, concrete_test[1:8])
predicted_strength <- model_results$net.result
cor(predicted_strength, concrete_test$Strength)[1,1]
## [1] 0.8336352308
The default neural net exhibits a correlation of 0.8336352. We can certainly try to improve it by including a few
hidden neurons…
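Because the model was trained on normalized data, its predictions lie on a 0-1 scale. A small helper, hypothetical and not part of the original analysis, converts them back to MPa by inverting the normalize() function from Step 2 using the original strength column:
# Inverse of the normalize() function defined in Step 2 (hypothetical addition).
denormalize <- function(x, ref) { x * (max(ref) - min(ref)) + min(ref) }
predicted_MPa <- denormalize(predicted_strength, concrete$Strength)
head(predicted_MPa)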
Step 5: Improving Model Performance with 5 Hidden Neurons
concrete_model2 <- neuralnet(formula = Strength ~ Cement + Blast + Fly + Water + Superplasticizer + Coarse + Fine + Age, data = concrete_train, hidden=5)
plot.nnet(concrete_model2,cex.val=0.75)
We observe that Age remains a key contributor, but the effects of Water, Superplasticizer, Cement, and Blast are also
visibly ranked. Four of the five hidden nodes contribute about evenly…
model_results2 <- compute(concrete_model2, concrete_test[1:8])
predicted_strength2 <- model_results2$net.result
cor(predicted_strength2, concrete_test$Strength)[1,1]
## [1] 0.9138705528
p <- plot(concrete_test$Strength,predicted_strength2)
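A small optional addition (not in the original) to the scatter plot produced above: drawing the ideal prediction line makes deviations from perfect agreement easier to see.
# Dashed y = x reference line: points on it are predicted exactly right.
abline(0, 1, lty = 2)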
Step 6: Improving Model using Random Forest Algorithm
We will now rely on the Random Forest algorithm to attempt a model improvement.
model_result3 <- train(Strength ~ ., data = concrete_train,method='rf',prox=TRUE)
model_result3
## Random Forest
##
## 774 samples
## 8 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 774, 774, 774, 774, 774, 774, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared RMSE SD Rsquared SD
## 2 0.07833903729 0.8774837294 0.004515965257 0.01607292866
## 5 0.07159729020 0.8869237640 0.004510365293 0.01395064039
## 8 0.07365915696 0.8777421774 0.005197103383 0.01691711063
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 5.
predicted_strength3 <- predict(model_result3,concrete_test)
cor(predicted_strength3, concrete_test$Strength)
## [1] 0.943384811
The default Random Forest algorithm helped improve our prediction and exhibits a correlation of 0.9433848.
Again, we can certainly try to improve further by changing the resampling scheme… The caret package offers multiple
methods to try out; we'll just try one to give an idea…
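For reference, a couple of the other resampling schemes caret exposes through trainControl() are sketched here; these objects are illustrative only and are not used in the original analysis.
# Repeated 10-fold cross-validation (3 repeats):
ctrl_repcv <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
# The 0.632 bootstrap estimator:
ctrl_boot632 <- trainControl(method = "boot632", number = 25)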
Step 7: Testing Further with Resampling
model_result4 <- train(Strength ~ ., method='rf', data = concrete_train, verbose=FALSE, trControl = trainControl(method="cv"))
model_result4
## Random Forest
##
## 774 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 696, 696, 695, 697, 698, 696, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared RMSE SD Rsquared SD
## 2 0.06910599 0.9068129 0.01101706 0.04013275
## 5 0.06227778 0.9121046 0.01232558 0.03945448
## 8 0.06287753 0.9089120 0.01208088 0.03835530
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 5.
predicted_strength4 <- predict(model_result4,concrete_test)
cor(predicted_strength4, concrete_test$Strength)
## [1] 0.9428245
We observe that the prediction is practically unchanged in this case, with a correlation of 0.9428245. Still not bad for
a quick analysis performed on existing data. Regardless of our property target, we have already identified key areas to
investigate more deeply… and can clearly see some key ingredients (Cement, Blast, Fly, Superplasticizer, Water…)
and the Age process variable as determining factors for strength performance. So naturally, one may want to
display this model.
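Before displaying the model, one way to back this observation up quantitatively, a small addition that assumes caret's generic varImp() extractor applies to the fitted random forest, is to rank the predictors:
# Relative importance of the eight inputs in the cross-validated random forest
# (based on the node-purity decrease reported by randomForest by default).
varImp(model_result4)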
Step 8: Actual Display of a Random Forest Tree Solution
It turns out that so-called black-box models are, well, meant to stay in their box! However, the rpart
(https://cran.r-project.org/web/packages/rpart/rpart.pdf) and rpart.plot
(https://cran.r-project.org/web/packages/rpart.plot/rpart.plot.pdf) packages make it easy to visualize even complex trees.
strength.tree <- rpart(Strength ~ ., data=concrete_train, control=rpart.control(minsplit=20, cp=0.002))
prp(strength.tree,compress=TRUE)
In the tree, the normalized strength is indicated in the oval leaves, ranked from low to high across the branches from
left to right.
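If the displayed tree looks overgrown, an optional follow-up, not part of the original analysis, is to inspect the complexity-parameter table and prune to a coarser tree. A minimal sketch using the rpart functions loaded above:
# Cross-validated error at each complexity-parameter value:
printcp(strength.tree)
# Hypothetical pruning to a simpler tree, e.g. at cp = 0.01, then redraw it:
strength.tree.pruned <- prune(strength.tree, cp = 0.01)
prp(strength.tree.pruned, compress = TRUE)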
Conclusions
We hope this typical example demonstrates that Machine Learning algorithms are well positioned to help
resolve formulation challenges, offering a fast, efficient and economical alternative to tedious experimentation. It
is easy to imagine how similar questions can be resolved in all types of R&D, in materials, cosmetics, food or
any scientific area.
Rubber formulations to minimize rolling resistance and emissions, modern composites to build renewable
energy sources, lightweight transportation vehicles and next-generation public transit, as well as innovative
UV-shield ointments and tasty snacks and drinks… all present similar challenges where only the nature of the
inputs and outputs varies. Therefore, the method can and should be applied broadly!
Next time, we'll review how to address another common challenge: classification and clustering. Until then, we
hope this approach has triggered interest.
Why not try to implement Machine Learning in your own scientific or technical area of expertise? Remember, PRC
Consulting, LLC (http://www.prcconsulting.net) is dedicated to boosting innovation through improved Analytics, one
customer at a time!
References
The following sources are referenced as they provided significant help and information to develop this Machine
Learning analysis applied to formulations:
1. UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html)
2. caret (https://cran.r-project.org/web/packages/caret/caret.pdf)
3. NeuralNetTool (https://beckmw.wordpress.com/tag/neural-network/)
4. rpart (https://cran.r-project.org/web/packages/rpart/rpart.pdf)
5. rpart.plot (https://cran.r-project.org/web/packages/rpart.plot/rpart.plot.pdf)
6. RStudio (https://www.rstudio.com)