Data mining with caret package
Kai Xiao and Vivian Zhang @Supstat Inc.
Outline
· Introduction of data mining and caret
· before model training
  - visualization
  - pre-processing
  - data splitting
· building model
  - model training and tuning
  - model performance
  - variable importance
· advanced topics
  - feature selection
  - parallel processing
· exercise
/
cross-industry standard process for data mining
/
Introduction of caret
The caret package (short for Classification And REgression Training) is a set of functions that
attempt to streamline the process for creating predictive models. The package contains tools for:
· data splitting
· pre-processing
· feature selection
· model tuning using resampling
· variable importance estimation
/
A very simple example
library(caret)
str(iris)
set.seed(1)
# preprocess: center and scale the four numeric predictors
process<-preProcess(iris[,-5],method=c('center','scale'))
dataScaled<-predict(process,iris[,-5])
# data splitting: 75% of the rows for training, stratified by Species
inTrain<-createDataPartition(iris$Species,p=0.75)[[1]]
length(inTrain)
trainData<-dataScaled[inTrain,]
trainClass<-iris[inTrain,5]
testData<-dataScaled[-inTrain,]
testClass<-iris[-inTrain,5]
/
A very simple example
# model tuning
set.seed(1)
fitControl<-trainControl(method="cv",
number=10)
tunedf<- data.frame(.cp=c(0.01,0.05,0.1,0.3,0.5))
treemodel<-train(x=trainData,
y=trainClass,
method='rpart',
trControl=fitControl,
tuneGrid=tunedf)
print(treemodel)
plot(treemodel)
# prediction and performance assessment
treePred<-predict(treemodel,testData)
confusionMatrix(treePred,testClass)
/
visualizations
The featurePlot function is a wrapper for different lattice plots to visualize the data.
· Scatterplot Matrix
featurePlot(x=iris[,1:4],
y=iris$Species,
plot="pairs",
## Add a key at the top
auto.key=list(columns=3))
· boxplot
featurePlot(x=iris[,1:4],
y=iris$Species,
plot="box",
## Add a key at the top
auto.key=list(columns=3))
/
pre-processing
Creating Dummy Variables
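when<-data.frame(time=c("afternoon","night","afternoon",
"morning","morning","morning",
"morning","afternoon","afternoon"))
when
# set the level order explicitly; assigning a character vector to levels()
# would relabel the values instead of reordering them
when$time<-factor(when$time,levels=c("morning","afternoon","night"))
mainEffects<-dummyVars(~time,data=when)
predict(mainEffects,when)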
/
pre-processing
Zero- and Near Zero-Variance Predictors
data<-data.frame(x1=rnorm(100),
x2=runif(100),
x3=rep(c(0,1),times=c(2,98)),
x4=rep(3,length=100))
nzv<-nearZeroVar(data,saveMetrics=TRUE)
nzv
nzv<-nearZeroVar(data)
dataFilted<-data[,-nzv]
head(dataFilted)
/
pre-processing
Identifying Correlated Predictors
set.seed(1)
x1<-rnorm(100)
x2<-x1+rnorm(100,0.1,0.1)
x3<-x1+rnorm(100,1,1)
data<-data.frame(x1,x2,x3)
corrmatrix<-cor(data)
highlyCor<-findCorrelation(corrmatrix,cutoff=0.75)
dataFilted<-data[,-highlyCor]
head(dataFilted)
/
pre-processing
Identifying Linear Dependencies
set.seed(1)
x1<-rnorm(100)
x2<-x1+rnorm(100,0.1,0.1)
x3<-x1+rnorm(100,1,1)
x4<-x2+x3
data<-data.frame(x1,x2,x3,x4)
comboInfo<-findLinearCombos(data)
dataFilted<-data[,-comboInfo$remove]
head(dataFilted)
/
pre-processing
Centering and Scaling
set.seed(1)
x1<-rnorm(100)
x2<-3+3*x1+rnorm(100)
x3<-2+2*x1+rnorm(100)
data<-data.frame(x1,x2,x3)
summary(data)
preProc<-preProcess(data,method=c("center","scale"))
dataProced<-predict(preProc,data)
summary(dataProced)
/
pre-processing
Imputation: bagImpute / knnImpute
data<-iris[,-5]
data[1,2]<-NA
data[2,1]<-NA
# knnImpute also centers and scales the data
impu<-preProcess(data,method='knnImpute')
dataProced<-predict(impu,data)
/
pre-processing
Transformation: BoxCox / PCA
data<-iris[,-5]
pcaProc<-preProcess(data,method='pca')
dataProced<-predict(pcaProc,data)
head(dataProced)
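The slide title also names BoxCox, but only the PCA call is shown. A minimal sketch of the Box-Cox counterpart, assuming the same iris predictors (all strictly positive, which Box-Cox requires). The object names bcProc and bcData are illustrative and not from the original deck.

```r
library(caret)

# the four numeric iris predictors; all values are strictly positive
data <- iris[, -5]

# estimate a Box-Cox transformation for each column where one can be estimated
bcProc <- preProcess(data, method = "BoxCox")

# apply the estimated transformations
bcData <- predict(bcProc, data)

bcProc
head(bcData)
```

Printing bcProc lists the transformation that was estimated for each column.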
/
data splitting
create balanced splits of the data
set.seed(1)
trainIndex<-createDataPartition(iris$Species,p=0.8,list=FALSE, times=1)
head(trainIndex)
irisTrain<-iris[trainIndex,]
irisTest<-iris[-trainIndex,]
summary(irisTest$Species)
· createResample can be used to make simple bootstrap samples
· createFolds can be used to generate balanced cross-validation groupings from a set of data
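The two helpers named above are not demonstrated on the slide. A short sketch on iris, assuming 10 folds and 5 bootstrap samples; the object names folds and boots are illustrative.

```r
library(caret)
set.seed(1)

# 10 cross-validation folds balanced on Species; each list element
# holds the row indices of one held-out fold
folds <- createFolds(iris$Species, k = 10, list = TRUE, returnTrain = FALSE)
sapply(folds, length)

# 5 bootstrap samples; each list element holds row indices drawn with replacement
boots <- createResample(iris$Species, times = 5, list = TRUE)
sapply(boots, length)
```

Every row appears in exactly one fold, while a bootstrap sample has as many rows as the data and may repeat some of them.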
/
Model Training and Parameter Tuning
The train function can be used to
· evaluate, using resampling, the effect of model tuning parameters on performance
· choose the "optimal" model across these parameters
· estimate model performance from a training set
/
Model Training and Parameter Tuning
prepare data
data(PimaIndiansDiabetes2,package='mlbench')
data<-PimaIndiansDiabetes2
library(caret)
# scale and center
preProcValues<-preProcess(data[,-9],method=c("center","scale"))
scaleddata<-predict(preProcValues,data[,-9])
# Yeo-Johnson transformation
preProcbox<-preProcess(scaleddata,method=c("YeoJohnson"))
boxdata<-predict(preProcbox,scaleddata)
/
Model Training and Parameter Tuning
prepare data
# bag impute
preProcimp<-preProcess(boxdata,method="bagImpute")
procdata<-predict(preProcimp,boxdata)
procdata$class<-data[,9]
# data splitting
inTrain<-createDataPartition(procdata$class,p=0.75)[[1]]
length(inTrain)
trainData<-procdata[inTrain,1:8]
trainClass<-procdata[inTrain,9]
testData<-procdata[-inTrain,1:8]
testClass<-procdata[-inTrain,9]
/
Model Training and Parameter Tuning
define sets of model parameter values to evaluate
tunedf<- data.frame(.cp=seq(0.001,0.2,length.out=10))
/
Model Training and Parameter Tuning
define the type of resampling method
· k-fold cross-validation (once or repeated)
· leave-one-out cross-validation
· bootstrap (simple estimation or the 632 rule)
fitControl<-trainControl(method="repeatedcv",
# 10-fold cross-validation
number=10,
# repeated 3 times
repeats=3)
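The resampling options listed on this slide correspond to different method strings in trainControl. A sketch of one control object per option; the object names and the choice of 25 bootstrap iterations are assumptions for illustration.

```r
library(caret)

# k-fold cross-validation, run once
cvOnce <- trainControl(method = "cv", number = 10)

# k-fold cross-validation, repeated 3 times
cvRepeated <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# leave-one-out cross-validation
loo <- trainControl(method = "LOOCV")

# simple bootstrap estimate
bootSimple <- trainControl(method = "boot", number = 25)

# bootstrap with the 632 rule
boot632 <- trainControl(method = "boot632", number = 25)
```

Any of these objects can be passed to train through its trControl argument.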
/
Model Training and Parameter Tuning
start training
treemodel<-train(x=trainData,
y=trainClass,
method='rpart',
trControl=fitControl,
tuneGrid=tunedf)
/
Model Training and Parameter Tuning
look at the final result
treemodel
plot(treemodel)
/
The trainControl Function
· method: the resampling method.
· number and repeats: number controls the number of folds in K-fold cross-validation or the
number of resampling iterations for bootstrapping and leave-group-out cross-validation;
repeats applies only to repeated K-fold cross-validation.
· verboseIter: a logical for printing a training log.
· returnData: a logical for saving the data into a slot called trainingData.
· classProbs: a logical value determining whether class probabilities should be computed for
held-out samples during resampling.
· summaryFunction: a function to compute alternate performance summaries.
· selectionFunction: a function to choose the optimal tuning parameters.
· returnResamp: a character string containing one of the following values: "all", "final" or "none".
This specifies how much of the resampled performance measures to save.
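A small sketch showing several of these arguments together, using iris and rpart rather than the diabetes data from the deck. The specific settings (5 folds, the "oneSE" selection rule, tuneLength = 5) are assumptions chosen only to make the effect of each argument visible.

```r
library(caret)

ctrl <- trainControl(method = "cv",
                     number = 5,
                     verboseIter = FALSE,        # no training log
                     returnData = FALSE,         # do not keep a copy of the training data
                     returnResamp = "all",       # keep resampled results for every tuning value
                     selectionFunction = "oneSE") # simplest model within one SE of the best

set.seed(1)
fit <- train(x = iris[, 1:4],
             y = iris$Species,
             method = "rpart",
             trControl = ctrl,
             tuneLength = 5)

fit$bestTune          # tuning value picked by the one-standard-error rule
head(fit$resample)    # per-fold results
is.null(fit$trainingData)
```

With returnData = FALSE the fitted object is smaller, at the cost of functions that need the stored training data.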
/
Alternate Performance Metrics
Performance Metrics:
· regression: RMSE and R2
· classification: accuracy and Kappa
Another built-in function, twoClassSummary, will compute the sensitivity, specificity and area under
the ROC curve. It needs class probabilities, so classProbs=TRUE is set below.
fitControl<-trainControl(method="repeatedcv",
number=10,
repeats=3,
classProbs=TRUE,
summaryFunction=twoClassSummary)
treemodel<-train(x=trainData,
y=trainClass,
method='rpart',
trControl=fitControl,
tuneGrid=tunedf,
metric="ROC")
treemodel
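Besides the built-in summaries, summaryFunction accepts a user-written function. The sketch below assumes a hypothetical summary called errSummary (not part of caret) that reports accuracy and error rate, and tunes rpart on iris by minimizing the error rate.

```r
library(caret)

# hypothetical custom summary: `data` has the columns obs and pred
# for the held-out samples; the result must be a named numeric vector
errSummary <- function(data, lev = NULL, model = NULL) {
  c(Accuracy  = mean(data$obs == data$pred),
    ErrorRate = mean(data$obs != data$pred))
}

ctrl <- trainControl(method = "cv", number = 5, summaryFunction = errSummary)

set.seed(1)
fit <- train(x = iris[, 1:4],
             y = iris$Species,
             method = "rpart",
             trControl = ctrl,
             metric = "ErrorRate",   # name returned by errSummary
             maximize = FALSE,       # smaller error rate is better
             tuneLength = 3)
fit
```

The metric argument has to match one of the names returned by the summary function.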
/
Extracting Predictions
Predictions can be made from these objects as usual.
pre<-predict(treemodel,testData)
pre<-predict(treemodel,testData,type="prob")
/
Evaluating Test Sets
caret also contains several functions that can be used to describe the performance of classification
models
testPred<-predict(treemodel,testData)
testPred.prob<-predict(treemodel,testData,type='prob')
postResample(testPred,testClass)
confusionMatrix(testPred,testClass)
/
Exploring and Comparing Resampling Distributions
· Within-Model Comparing
densityplot(treemodel,pch="|")
/
Exploring and Comparing Resampling Distributions
· Between-Models Comparing
· let's build an nnet model and compare the performance of the two models
tunedf<-expand.grid(.decay=0.1,
.size=1:8,
.bag=TRUE)
nnetmodel<-train(x=trainData,
y=trainClass,
method='avNNet',
trControl=fitControl,
trace=FALSE,
linout=FALSE,
metric="ROC",
tuneGrid=tunedf)
nnetmodel
/
Exploring and Comparing Resampling Distributions
Given these models, can we make statistical statements about their performance differences? To do
this, we first collect the resampling results using resamples.
We can compute the differences, then use a simple t-test to evaluate the null hypothesis that there is
no difference between models.
resamps<-resamples(list(tree=treemodel,
nnet=nnetmodel))
bwplot(resamps)
densityplot(resamps,metric='ROC')
difValues<-diff(resamps)
summary(difValues)
/
Variable importance evaluation
Variable importance evaluation functions can be separated into two groups:
· model-based approach
· model-independent approach
  - For classification, ROC curve analysis is conducted on each predictor.
  - For regression, the relationship between each predictor and the outcome is evaluated.
# model-based approach
treeimp<-varImp(treemodel)
plot(treeimp)
# model-independent approach
RocImp<-varImp(treemodel,useModel=FALSE)
plot(RocImp)
# or
RocImp<-filterVarImp(x=trainData,y=trainClass)
plot(RocImp)
/
feature selection
· Many models do not necessarily use all the predictors
· Feature Selection Using Search Algorithms ("wrapper" approach)
· Feature Selection Using Univariate Filters ("filter" approach)
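The following slides cover only the wrapper approach. For the filter approach, caret provides sbf (selection by filtering); a minimal sketch on iris, assuming the pre-defined ldaSBF function set (which relies on the MASS package) and 5-fold cross-validation. The object names are illustrative.

```r
library(caret)

# univariate filter followed by an LDA fit, resampled with 5-fold CV
filterCtrl <- sbfControl(functions = ldaSBF,
                         method = "cv",
                         number = 5,
                         verbose = FALSE)

set.seed(1)
filtered <- sbf(x = iris[, 1:4],
                y = iris$Species,
                sbfControl = filterCtrl)

filtered
predictors(filtered)   # predictors that passed the filter
```

Each predictor is screened on its own before the model is fit, whereas the wrapper approach searches over subsets by repeatedly refitting the model.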
/
feature selection: wrapper approach
/
feature selection: wrapper approach
feature selection based on random forest model
pre-defined sets of functions: linear regression (lmFuncs), random forests (rfFuncs), naive Bayes
(nbFuncs), bagged trees (treebagFuncs)
ctrl<-rfeControl(functions=rfFuncs,
method="repeatedcv",
number=10,
repeats=3,
verbose=FALSE,
returnResamp="final")
Profile<-rfe(x=trainData,
y=trainClass,
sizes=1:8,
rfeControl=ctrl)
Profile
/
feature selection: wrapper approach
feature selection based on custom model
tunedf<- data.frame(.cp=seq(0.001,0.2,length.out=5))
fitControl<-trainControl(method="repeatedcv",
number=10,
repeats=3,
classProbs=TRUE,
summaryFunction=twoClassSummary)
customFuncs<-caretFuncs
customFuncs$summary<-twoClassSummary
ctrl<-rfeControl(functions=customFuncs,
method="repeatedcv",
number=10,
repeats=3,
verbose=FALSE,
returnResamp="final")
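Profile<-rfe(x=trainData,
y=trainClass,
sizes=1:8,
method='rpart',
# the source slide is cut off after this argument; any further arguments
# (presumably passing fitControl and tunedf on to train) are not recoverable
rfeControl=ctrl)
/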
parallel processing
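# tunedf must be the avNNet grid (decay, size, bag) used for nnetmodel,
# not the rpart cp grid defined on the previous slide
library(doParallel)
registerDoParallel(cores=2)
system.time({
nnetmodel.para<-train(x=trainData,
y=trainClass,
method='avNNet',
trControl=fitControl,
trace=FALSE,
linout=FALSE,
metric="ROC",
tuneGrid=tunedf)
})
# compare the timings of the sequential and the parallel fit
nnetmodel$times
nnetmodel.para$times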
/
exercise-1
use knn method to train model
library(caret)
fitControl<-trainControl(method="repeatedcv",
number=10,
repeats=3)
tunedf<-data.frame(.k=seq(3,20,by=2))
knnmodel<-train(x=trainData,
y=trainClass,
method='knn',
trControl=fitControl,
tuneGrid=tunedf)
plot(knnmodel)
/

More Related Content

What's hot

5. surat keterangan aktif melaksanakan tugas
5. surat keterangan aktif melaksanakan tugas5. surat keterangan aktif melaksanakan tugas
5. surat keterangan aktif melaksanakan tugasWarnet Raha
 
CX Appeal: Technology to Keep Your Customers Coming Back for More By Gerry Mu...
CX Appeal: Technology to Keep Your Customers Coming Back for More By Gerry Mu...CX Appeal: Technology to Keep Your Customers Coming Back for More By Gerry Mu...
CX Appeal: Technology to Keep Your Customers Coming Back for More By Gerry Mu...MarTech Conference
 
MATERI PERKA BKN 3 2023.pptx
MATERI PERKA BKN 3 2023.pptxMATERI PERKA BKN 3 2023.pptx
MATERI PERKA BKN 3 2023.pptxsubbidpjf2022
 
20210412 01 devi_anantha_asdep_manajemen_kinerja_dan_kesejahteran
20210412 01 devi_anantha_asdep_manajemen_kinerja_dan_kesejahteran20210412 01 devi_anantha_asdep_manajemen_kinerja_dan_kesejahteran
20210412 01 devi_anantha_asdep_manajemen_kinerja_dan_kesejahteranKutsiyatinMSi
 
Panduan Penilaian untuk SMK
Panduan Penilaian untuk SMKPanduan Penilaian untuk SMK
Panduan Penilaian untuk SMKAKHMAD SUDRAJAT
 
Judicial measures of lord cornwallis
Judicial measures of lord cornwallisJudicial measures of lord cornwallis
Judicial measures of lord cornwallisVishwadeep Sharma
 

What's hot (8)

6.buku prakerin
6.buku prakerin6.buku prakerin
6.buku prakerin
 
5. surat keterangan aktif melaksanakan tugas
5. surat keterangan aktif melaksanakan tugas5. surat keterangan aktif melaksanakan tugas
5. surat keterangan aktif melaksanakan tugas
 
CX Appeal: Technology to Keep Your Customers Coming Back for More By Gerry Mu...
CX Appeal: Technology to Keep Your Customers Coming Back for More By Gerry Mu...CX Appeal: Technology to Keep Your Customers Coming Back for More By Gerry Mu...
CX Appeal: Technology to Keep Your Customers Coming Back for More By Gerry Mu...
 
MATERI PERKA BKN 3 2023.pptx
MATERI PERKA BKN 3 2023.pptxMATERI PERKA BKN 3 2023.pptx
MATERI PERKA BKN 3 2023.pptx
 
Analisis Jabatan
Analisis JabatanAnalisis Jabatan
Analisis Jabatan
 
20210412 01 devi_anantha_asdep_manajemen_kinerja_dan_kesejahteran
20210412 01 devi_anantha_asdep_manajemen_kinerja_dan_kesejahteran20210412 01 devi_anantha_asdep_manajemen_kinerja_dan_kesejahteran
20210412 01 devi_anantha_asdep_manajemen_kinerja_dan_kesejahteran
 
Panduan Penilaian untuk SMK
Panduan Penilaian untuk SMKPanduan Penilaian untuk SMK
Panduan Penilaian untuk SMK
 
Judicial measures of lord cornwallis
Judicial measures of lord cornwallisJudicial measures of lord cornwallis
Judicial measures of lord cornwallis
 

Viewers also liked

Streaming Python on Hadoop
Streaming Python on HadoopStreaming Python on Hadoop
Streaming Python on HadoopVivian S. Zhang
 
Introducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with rIntroducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with rVivian S. Zhang
 
Kaggle Top1% Solution: Predicting Housing Prices in Moscow
Kaggle Top1% Solution: Predicting Housing Prices in Moscow Kaggle Top1% Solution: Predicting Housing Prices in Moscow
Kaggle Top1% Solution: Predicting Housing Prices in Moscow Vivian S. Zhang
 
Max Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learningMax Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learningVivian S. Zhang
 
Hack session for NYTimes Dialect Map Visualization( developed by R Shiny)
 Hack session for NYTimes Dialect Map Visualization( developed by R Shiny) Hack session for NYTimes Dialect Map Visualization( developed by R Shiny)
Hack session for NYTimes Dialect Map Visualization( developed by R Shiny)Vivian S. Zhang
 
Wikipedia: Tuned Predictions on Big Data
Wikipedia: Tuned Predictions on Big DataWikipedia: Tuned Predictions on Big Data
Wikipedia: Tuned Predictions on Big DataVivian S. Zhang
 
A Hybrid Recommender with Yelp Challenge Data
A Hybrid Recommender with Yelp Challenge Data A Hybrid Recommender with Yelp Challenge Data
A Hybrid Recommender with Yelp Challenge Data Vivian S. Zhang
 
We're so skewed_presentation
We're so skewed_presentationWe're so skewed_presentation
We're so skewed_presentationVivian S. Zhang
 
Using Machine Learning to aid Journalism at the New York Times
Using Machine Learning to aid Journalism at the New York TimesUsing Machine Learning to aid Journalism at the New York Times
Using Machine Learning to aid Journalism at the New York TimesVivian S. Zhang
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangVivian S. Zhang
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorVivian S. Zhang
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitionsOwen Zhang
 

Viewers also liked (14)

Streaming Python on Hadoop
Streaming Python on HadoopStreaming Python on Hadoop
Streaming Python on Hadoop
 
Introducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with rIntroducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with r
 
Kaggle Top1% Solution: Predicting Housing Prices in Moscow
Kaggle Top1% Solution: Predicting Housing Prices in Moscow Kaggle Top1% Solution: Predicting Housing Prices in Moscow
Kaggle Top1% Solution: Predicting Housing Prices in Moscow
 
Bayesian models in r
Bayesian models in rBayesian models in r
Bayesian models in r
 
Max Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learningMax Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learning
 
Hack session for NYTimes Dialect Map Visualization( developed by R Shiny)
 Hack session for NYTimes Dialect Map Visualization( developed by R Shiny) Hack session for NYTimes Dialect Map Visualization( developed by R Shiny)
Hack session for NYTimes Dialect Map Visualization( developed by R Shiny)
 
Wikipedia: Tuned Predictions on Big Data
Wikipedia: Tuned Predictions on Big DataWikipedia: Tuned Predictions on Big Data
Wikipedia: Tuned Predictions on Big Data
 
A Hybrid Recommender with Yelp Challenge Data
A Hybrid Recommender with Yelp Challenge Data A Hybrid Recommender with Yelp Challenge Data
A Hybrid Recommender with Yelp Challenge Data
 
We're so skewed_presentation
We're so skewed_presentationWe're so skewed_presentation
We're so skewed_presentation
 
Using Machine Learning to aid Journalism at the New York Times
Using Machine Learning to aid Journalism at the New York TimesUsing Machine Learning to aid Journalism at the New York Times
Using Machine Learning to aid Journalism at the New York Times
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
 
Xgboost
XgboostXgboost
Xgboost
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 

Similar to Data mining with caret package

Grid search.pptx
Grid search.pptxGrid search.pptx
Grid search.pptxAbithaSam
 
Evaluating classifierperformance ml-cs6923
Evaluating classifierperformance ml-cs6923Evaluating classifierperformance ml-cs6923
Evaluating classifierperformance ml-cs6923Raman Kannan
 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Yao Yao
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Yao Yao
 
Nyc open-data-2015-andvanced-sklearn-expanded
Nyc open-data-2015-andvanced-sklearn-expandedNyc open-data-2015-andvanced-sklearn-expanded
Nyc open-data-2015-andvanced-sklearn-expandedVivian S. Zhang
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
Data science with R - Clustering and Classification
Data science with R - Clustering and ClassificationData science with R - Clustering and Classification
Data science with R - Clustering and ClassificationBrigitte Mueller
 
Machinelearning Spark Hadoop User Group Munich Meetup 2016
Machinelearning Spark Hadoop User Group Munich Meetup 2016Machinelearning Spark Hadoop User Group Munich Meetup 2016
Machinelearning Spark Hadoop User Group Munich Meetup 2016Comsysto Reply GmbH
 
Competition 1 (blog 1)
Competition 1 (blog 1)Competition 1 (blog 1)
Competition 1 (blog 1)TarunPaparaju
 
maXbox starter65 machinelearning3
maXbox starter65 machinelearning3maXbox starter65 machinelearning3
maXbox starter65 machinelearning3Max Kleiner
 
casestudy_important.pptx
casestudy_important.pptxcasestudy_important.pptx
casestudy_important.pptxssuser31398b
 
Common Design for Distributed Machine Learning
Common Design for Distributed Machine LearningCommon Design for Distributed Machine Learning
Common Design for Distributed Machine LearningJunyoung Park
 
Testing in those hard to reach places
Testing in those hard to reach placesTesting in those hard to reach places
Testing in those hard to reach placesdn
 
Classification examp
Classification exampClassification examp
Classification exampRyan Hong
 

Similar to Data mining with caret package (20)

Grid search.pptx
Grid search.pptxGrid search.pptx
Grid search.pptx
 
Evaluating classifierperformance ml-cs6923
Evaluating classifierperformance ml-cs6923Evaluating classifierperformance ml-cs6923
Evaluating classifierperformance ml-cs6923
 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
Nyc open-data-2015-andvanced-sklearn-expanded
Nyc open-data-2015-andvanced-sklearn-expandedNyc open-data-2015-andvanced-sklearn-expanded
Nyc open-data-2015-andvanced-sklearn-expanded
 
R console
R consoleR console
R console
 
CSL0777-L07.pptx
CSL0777-L07.pptxCSL0777-L07.pptx
CSL0777-L07.pptx
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Aspects of 10 Tuning
Aspects of 10 TuningAspects of 10 Tuning
Aspects of 10 Tuning
 
Chapter15
Chapter15Chapter15
Chapter15
 
Data science with R - Clustering and Classification
Data science with R - Clustering and ClassificationData science with R - Clustering and Classification
Data science with R - Clustering and Classification
 
Machinelearning Spark Hadoop User Group Munich Meetup 2016
Machinelearning Spark Hadoop User Group Munich Meetup 2016Machinelearning Spark Hadoop User Group Munich Meetup 2016
Machinelearning Spark Hadoop User Group Munich Meetup 2016
 
Competition 1 (blog 1)
Competition 1 (blog 1)Competition 1 (blog 1)
Competition 1 (blog 1)
 
maXbox starter65 machinelearning3
maXbox starter65 machinelearning3maXbox starter65 machinelearning3
maXbox starter65 machinelearning3
 
BPstudy sklearn 20180925
BPstudy sklearn 20180925BPstudy sklearn 20180925
BPstudy sklearn 20180925
 
casestudy_important.pptx
casestudy_important.pptxcasestudy_important.pptx
casestudy_important.pptx
 
Common Design for Distributed Machine Learning
Common Design for Distributed Machine LearningCommon Design for Distributed Machine Learning
Common Design for Distributed Machine Learning
 
Customer analytics for e commerce
Customer analytics for e commerceCustomer analytics for e commerce
Customer analytics for e commerce
 
Testing in those hard to reach places
Testing in those hard to reach placesTesting in those hard to reach places
Testing in those hard to reach places
 
Classification examp
Classification exampClassification examp
Classification examp
 

More from Vivian S. Zhang

Career services workshop- Roger Ren
Career services workshop- Roger RenCareer services workshop- Roger Ren
Career services workshop- Roger RenVivian S. Zhang
 
Nycdsa wordpress guide book
Nycdsa wordpress guide bookNycdsa wordpress guide book
Nycdsa wordpress guide bookVivian S. Zhang
 
Nycdsa ml conference slides march 2015
Nycdsa ml conference slides march 2015 Nycdsa ml conference slides march 2015
Nycdsa ml conference slides march 2015 Vivian S. Zhang
 
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public dataTHE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public dataVivian S. Zhang
 
Natural Language Processing(SupStat Inc)
Natural Language Processing(SupStat Inc)Natural Language Processing(SupStat Inc)
Natural Language Processing(SupStat Inc)Vivian S. Zhang
 
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...Vivian S. Zhang
 
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nycData Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nycVivian S. Zhang
 
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nycData Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nycVivian S. Zhang
 
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...Vivian S. Zhang
 
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...Vivian S. Zhang
 
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Vivian S. Zhang
 
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...Vivian S. Zhang
 
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...Vivian S. Zhang
 
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...Vivian S. Zhang
 

More from Vivian S. Zhang (16)

Why NYC DSA.pdf
Why NYC DSA.pdfWhy NYC DSA.pdf
Why NYC DSA.pdf
 
Career services workshop- Roger Ren
Career services workshop- Roger RenCareer services workshop- Roger Ren
Career services workshop- Roger Ren
 
Nycdsa wordpress guide book
Nycdsa wordpress guide bookNycdsa wordpress guide book
Nycdsa wordpress guide book
 
Xgboost
XgboostXgboost
Xgboost
 
Nycdsa ml conference slides march 2015
Nycdsa ml conference slides march 2015 Nycdsa ml conference slides march 2015
Nycdsa ml conference slides march 2015
 
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public dataTHE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
 
Natural Language Processing(SupStat Inc)
Natural Language Processing(SupStat Inc)Natural Language Processing(SupStat Inc)
Natural Language Processing(SupStat Inc)
 
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
 
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nycData Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
 
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nycData Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
 
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
 
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
 
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
 
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
 
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
 
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
 
