Caret Package for R

Max Kuhn's presentation of his caret package for R.


  1. 1. The caret Package: A Unified Interface for Predictive Models
     Max Kuhn
     Pfizer Global R&D, Nonclinical Statistics, Groton, CT
     max.kuhn@pfizer.com
     March 2, 2011
  2. 2. Motivation
     Theorem (No Free Lunch): In the absence of any knowledge about the prediction problem, no model can be said to be uniformly better than any other.
     Given this, it makes sense to use a variety of different models to find one that best fits the data.
     R has many packages for predictive modeling (aka machine learning, aka pattern recognition) . . .
     Max Kuhn (Pfizer Global R&D) caret March 2, 2011 2 / 27
  3. 3. Model Function Consistency
     Since there are many modeling packages written by different people, there are some inconsistencies in how models are specified and predictions are made.
     For example, many models have only one method of specifying the model (e.g. formula method only).
     The table below shows the syntax to get probability estimates from several classification models:

     obj Class    Package   predict Function Syntax
     lda          MASS      predict(obj) (no options needed)
     glm          stats     predict(obj, type = "response")
     gbm          gbm       predict(obj, type = "response", n.trees)
     mda          mda       predict(obj, type = "posterior")
     rpart        rpart     predict(obj, type = "prob")
     Weka         RWeka     predict(obj, type = "probability")
     LogitBoost   caTools   predict(obj, type = "raw", nIter)
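The inconsistency in the table can be made concrete with two of the listed model classes. The following is a minimal sketch, assuming the MASS package is installed; the helper name `prob_second_class` and the use of the built-in iris data are illustrative choices that do not come from the slides, and this is not how caret itself is implemented.

```r
library(MASS)  # provides lda()

# Two-class subset of iris so that a binomial glm() applies.
dat <- droplevels(subset(iris, Species != "setosa"))

lda_fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = dat)
glm_fit <- glm(Species ~ Sepal.Length + Sepal.Width, data = dat,
               family = binomial)

# Hypothetical helper: one call that hides the per-class predict() syntax
# and returns the probability of the second factor level.
prob_second_class <- function(obj, newdata) {
  if (inherits(obj, "lda")) {
    # lda: predict() returns a list; probabilities are in $posterior
    predict(obj, newdata)$posterior[, 2]
  } else if (inherits(obj, "glm")) {
    # glm: probabilities require type = "response"
    predict(obj, newdata, type = "response")
  } else {
    stop("unsupported model class: ", class(obj)[1])
  }
}

p_lda <- prob_second_class(lda_fit, dat)
p_glm <- prob_second_class(glm_fit, dat)
head(cbind(lda = p_lda, glm = p_glm))
```

A wrapper like this has to grow one branch per model class, which is the maintenance burden a unified interface is meant to absorb.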
  4. 4. The caret Package
     The caret package was developed to:
       • create a unified interface for modeling and prediction
       • streamline model tuning using resampling
       • provide a variety of “helper” functions and classes for day-to-day model building tasks
       • increase computational efficiency using parallel processing
     First commits within Pfizer: 6/2005
     First version on CRAN: 10/2007
     Website: http://caret.r-forge.r-project.org
     JSS Paper: www.jstatsoft.org/v28/i05/paper
     4 package vignettes (82 pages total)
  5. 5. Example Data: TunedIT Music Challenge
     http://tunedit.org/challenge/music-retrieval/genres
     Using 191 descriptors, classify 12495 musical segments into one of 6 genres: Blues, Classical, Jazz, Metal, Pop, Rock.
     Use these data to predict a large test set of music segments.
     The predictors and class variables are contained in a data frame called music.
  6. 6. Data Splitting
     createDataPartition conducts stratified random splits.
     > ## Create a test set with 25% of the data
     > set.seed(1)
     > inTrain <- createDataPartition(music$GENRE, p = .75, list = FALSE)
     > str(inTrain)
      int [1:9373, 1] 2 7 14 20 22 47 48 51 64 80 ...
      - attr(*, "dimnames")=List of 2
       ..$ : NULL
       ..$ : chr "Resample1"
     > trainDescr <- music[ inTrain, -ncol(music)]
     > testDescr  <- music[-inTrain, -ncol(music)]
     > trainClass <- music$GENRE[ inTrain]
     > testClass  <- music$GENRE[-inTrain]
     > prop.table(table(music$GENRE))
          Blues  Classical       Jazz      Metal        Pop       Rock
     0.12773109 0.27563025 0.24033613 0.07394958 0.12605042 0.15630252
     > prop.table(table(trainClass))
     trainClass
          Blues  Classical       Jazz      Metal        Pop       Rock
     0.12770724 0.27557879 0.24037128 0.07393577 0.12610690 0.15630001
     Other functions: createFolds, createMultiFolds, createResamples
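The idea behind a stratified split can be sketched in base R. This is an illustration only: `stratified_index` is a hypothetical helper, not caret's implementation of createDataPartition, and the built-in iris data stands in for the music data frame, which is not available here.

```r
# Hypothetical helper: sample a fraction p of the row indices within each
# class, so that class proportions are preserved in the training set.
stratified_index <- function(y, p) {
  by_class <- split(seq_along(y), y)
  picked <- lapply(by_class, function(idx) {
    # index into idx explicitly to avoid sample()'s behaviour on length-1 input
    idx[sample.int(length(idx), size = ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
in_train    <- stratified_index(iris$Species, p = 0.75)
train_class <- iris$Species[in_train]
test_class  <- iris$Species[-in_train]

# Class proportions in the full data and in the training set should agree.
prop.table(table(iris$Species))
prop.table(table(train_class))
```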
  7. 7. Data Pre-Processing Methods
     preProcess calculates values that can then be applied to any data set (e.g. training set, unknowns).
     Current methods: centering, scaling, spatial sign transformation, PCA or ICA “signal extraction”, imputation (via bagging or k-nearest neighbors), Box-Cox transformations
     > ## Determine means and sds
     > procValues <- preProcess(trainDescr, method = c("center", "scale"))
     > procValues
     > ## Use the predict methods to do the adjustments
     > trainScaled <- predict(procValues, trainDescr)
     > testScaled  <- predict(procValues, testDescr)
     preProcess can also be called within other functions for each resampling iteration.
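The key point of the pattern above is that the centering and scaling values are estimated on the training set only and then reused for every other data set. A minimal base R sketch of that idea follows; it uses the built-in iris data and an arbitrary 100-row training sample as assumptions, and it does not reproduce preProcess itself.

```r
set.seed(1)
in_train <- sample(nrow(iris), 100)   # arbitrary split, for illustration only
train_x  <- iris[in_train, 1:4]
test_x   <- iris[-in_train, 1:4]

# Estimate means and standard deviations from the training data only.
train_means <- colMeans(train_x)
train_sds   <- apply(train_x, 2, sd)

# Apply the same training-set values to both data sets.
train_scaled <- scale(train_x, center = train_means, scale = train_sds)
test_scaled  <- scale(test_x,  center = train_means, scale = train_sds)

round(colMeans(train_scaled), 10)  # zero by construction
round(colMeans(test_scaled), 3)    # close to, but not exactly, zero
```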
  8. 8. Model Tuning
     train uses resampling to tune and/or evaluate candidate models.
     > set.seed(1)
     > rbfSVM <- train(x = trainDescr, y = trainClass,
     +                 method = "svmRadial",
     +                 ## center and scale
     +                 preProc = c("center", "scale"),
     +                 ## Length of default tuning parameter grid
     +                 tuneLength = 8,
     +                 ## Repeated 10-fold cross-validation (5 repeats);
     +                 ## Kappa is the metric used to select the model
     +                 trControl = trainControl(method = "repeatedcv",
     +                                          repeats = 5),
     +                 metric = "Kappa",
     +                 ## Pass arguments to ksvm
     +                 fit = FALSE)
     Fitting: sigma=0.005184962, C=0.25
     Fitting: sigma=0.005184962, C=0.5
     Fitting: sigma=0.005184962, C=1
     Fitting: sigma=0.005184962, C=2
     Fitting: sigma=0.005184962, C=4
     Fitting: sigma=0.005184962, C=8
     Fitting: sigma=0.005184962, C=16
     Fitting: sigma=0.005184962, C=32
     Aggregating results
  9. 9. Model Tuning
     > print(rbfSVM, printCall = FALSE)
     9373 samples
      191 predictors

     Pre-processing: centered, scaled
     Resampling: Cross-Validation (10 fold, repeated 5 times)

     Summary of sample sizes: 8437, 8435, 8434, 8435, 8437, 8436, ...

     Resampling results across tuning parameters:

       C     Accuracy  Kappa  Accuracy SD  Kappa SD
       0.25  0.916     0.895  0.00953      0.0119
       0.5   0.938     0.923  0.00824      0.0103
       1     0.956     0.945  0.00641      0.008
       2     0.964     0.955  0.00614      0.00766
       4     0.968     0.961  0.0061       0.00761
       8     0.969     0.962  0.00623      0.00777
       16    0.969     0.962  0.00633      0.0079
       32    0.969     0.962  0.0063       0.00786

     Tuning parameter sigma was held constant at a value of 0.00518
     Kappa was used to select the optimal model using the largest value.
     The final values used for the model were C = 16 and sigma = 0.00518.
  10. 10. Model Tuning
     > class(rbfSVM)
     [1] "train"
     > class(rbfSVM$finalModel)
     [1] "ksvm"
     attr(,"package")
     [1] "kernlab"
  11. 11. Model Tuning
       • train uses as many “tricks” as possible to reduce the number of model fits (e.g. using sub-models). Here, it uses the kernlab function sigest to analytically estimate the RBF scale parameter.
       • Currently, there are options for 108 models (see ?train for a list)
       • Allows user-defined search grid, performance metrics and selection rules
       • Easily integrates with any parallel processing framework that can emulate lapply
       • Formula and non-formula interfaces
       • Methods: predict, print, plot, varImp, resamples, xyplot, densityplot, histogram, stripplot, . . .
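The user-defined search grid mentioned in the list above can be sketched as follows. This is a small illustration under several assumptions: a recent caret release is installed (grid columns are named without a leading dot, unlike the caret 4.79 used for these slides), a k-nearest-neighbor model is swapped in for the radial SVM to keep dependencies small, and the built-in iris data replaces the music data.

```r
library(caret)

set.seed(1)
knn_fit <- train(x = iris[, 1:4], y = iris$Species,
                 method = "knn",                 # stand-in model, not the SVM from the slides
                 preProc = c("center", "scale"),
                 # user-defined grid: one column per tuning parameter
                 tuneGrid = expand.grid(k = c(3, 5, 7, 9)),
                 trControl = trainControl(method = "repeatedcv",
                                          number = 5, repeats = 2),
                 metric = "Kappa")

knn_fit$results    # one row of resampled performance per candidate k
knn_fit$bestTune   # the k selected on Kappa
```

Replacing tuneGrid with tuneLength hands the choice of candidate values back to train, as in the SVM example from the slides.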
  12. 12. Plots
     plot(rbfSVM, xTrans = function(x) log2(x))
     [Figure: Accuracy (Repeated Cross-Validation) versus Cost, with Cost shown on a log2 scale]
  13. 13. Plots
     densityplot(rbfSVM, metric = "Kappa", pch = "|")
     [Figure: density plot of the resampled Kappa values; x-axis: Kappa, y-axis: Density]
  14. 14. Prediction and Performance Assessment
     The predict method can be used to get results for other data sets:
     > svmPred <- predict(rbfSVM, testDescr)
     > str(svmPred)
      Factor w/ 6 levels "Blues","Classical",..: 3 2 6 3 5 6 5 1 2 6 ...
     > svmProbs <- predict(rbfSVM, testDescr, type = "prob")
     > str(svmProbs)
     'data.frame': 3122 obs. of 6 variables:
      $ Blues    : num 0.031097 -0.000271 0.056312 0.093638 0.077027 ...
      $ Classical: num 0.5118 0.9907 0.0342 0.1547 0.0908 ...
      $ Jazz     : num 0.3153 0.0105 0.0743 0.3223 0.1635 ...
      $ Metal    : num 0.056453 -0.000301 0.141614 0.143288 0.171406 ...
      $ Pop      : num 0.02457 -0.000187 0.201617 0.143388 0.237104 ...
      $ Rock     : num 0.060765 -0.000447 0.491973 0.142607 0.260142 ...
  15. 15. Prediction and Performance Assessment
     > confusionMatrix(svmPred, testClass)
     Confusion Matrix and Statistics

                Reference
     Prediction  Blues Classical Jazz Metal Pop Rock
       Blues       395         0    0     3   1    1
       Classical     0       841   21     0   1    2
       Jazz          4        20  724     9   4    8
       Metal         0         0    0   214   2    0
       Pop           0         0    0     3 378    6
       Rock          0         0    5     2   7  471

     Overall Statistics

                    Accuracy : 0.9683
                      95% CI : (0.9615, 0.9742)
         No Information Rate : 0.2758
         P-Value [Acc > NIR] : < 2.2e-16
                       Kappa : 0.9605
      Mcnemar's Test P-Value : NA
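The overall statistics can be recomputed by hand from the confusion matrix on this slide, which shows what accuracy, the no-information rate and Kappa measure. The sketch below uses only base R; the variable names are illustrative, and Kappa is computed with the standard Cohen's kappa formula, which is assumed here to be the definition behind the reported value.

```r
genres <- c("Blues", "Classical", "Jazz", "Metal", "Pop", "Rock")

# Confusion matrix from the slide: rows are predictions, columns are reference.
cm <- matrix(c(395,   0,   0,   3,   1,   1,
                 0, 841,  21,   0,   1,   2,
                 4,  20, 724,   9,   4,   8,
                 0,   0,   0, 214,   2,   0,
                 0,   0,   0,   3, 378,   6,
                 0,   0,   5,   2,   7, 471),
             nrow = 6, byrow = TRUE,
             dimnames = list(Prediction = genres, Reference = genres))

n        <- sum(cm)
accuracy <- sum(diag(cm)) / n

# No-information rate: accuracy of always predicting the largest class.
nir <- max(colSums(cm)) / n

# Cohen's kappa: agreement beyond what the marginal totals give by chance.
expected  <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_hat <- (accuracy - expected) / (1 - expected)

round(c(Accuracy = accuracy, NIR = nir, Kappa = kappa_hat), 4)
```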
  16. 16. Prediction and Performance Assessment
     Statistics by Class:

                          Class: Blues Class: Classical Class: Jazz Class: Metal
     Sensitivity                0.9900           0.9768      0.9653      0.92641
     Specificity                0.9982           0.9894      0.9810      0.99931
     Pos Pred Value             0.9875           0.9723      0.9415      0.99074
     Neg Pred Value             0.9985           0.9911      0.9890      0.99415
     Prevalence                 0.1278           0.2758      0.2402      0.07399
     Detection Rate             0.1265           0.2694      0.2319      0.06855
     Detection Prevalence       0.1281           0.2771      0.2463      0.06919

                          Class: Pop Class: Rock
     Sensitivity              0.9618      0.9652
     Specificity              0.9967      0.9947
     Pos Pred Value           0.9767      0.9711
     Neg Pred Value           0.9945      0.9936
     Prevalence               0.1259      0.1563
     Detection Rate           0.1211      0.1509
     Detection Prevalence     0.1240      0.1553
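The per-class statistics follow from treating each genre in turn as the "positive" class against all others. A minimal base R sketch, re-entering the confusion matrix from slide 15 (rows are predictions, columns are reference) so that the block stands alone; variable names are illustrative:

```r
genres <- c("Blues", "Classical", "Jazz", "Metal", "Pop", "Rock")
cm <- matrix(c(395,   0,   0,   3,   1,   1,
                 0, 841,  21,   0,   1,   2,
                 4,  20, 724,   9,   4,   8,
                 0,   0,   0, 214,   2,   0,
                 0,   0,   0,   3, 378,   6,
                 0,   0,   5,   2,   7, 471),
             nrow = 6, byrow = TRUE,
             dimnames = list(Prediction = genres, Reference = genres))

# One-versus-all counts for every class at once.
tp <- diag(cm)                 # predicted as the class and truly the class
fn <- colSums(cm) - tp         # truly the class, predicted as something else
fp <- rowSums(cm) - tp         # predicted as the class, truly something else
tn <- sum(cm) - tp - fn - fp   # everything else

by_class <- rbind(Sensitivity      = tp / (tp + fn),
                  Specificity      = tn / (tn + fp),
                  "Pos Pred Value" = tp / (tp + fp),
                  "Neg Pred Value" = tn / (tn + fn),
                  Prevalence       = (tp + fn) / sum(cm))
round(by_class, 4)
```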
  17. 17. Comparing Models
     We can use the resampling results to make formal and informal comparisons between models.
     Based on the work of:
       • Hothorn et al. “The design and analysis of benchmark experiments”. Journal of Computational and Graphical Statistics (2005) vol. 14 (3) pp. 675-699
       • Eugster et al. “Exploratory and inferential analysis of benchmark experiments”. Ludwig-Maximilians-Universität München, Department of Statistics, Tech. Rep. (2008) vol. 30
  18. 18. Comparing Models
     > set.seed(1)
     > rfFit <- train(x = trainDescr, y = trainClass,
     +                method = "rf", tuneLength = 5,
     +                trControl = trainControl(method = "repeatedcv",
     +                                         repeats = 5,
     +                                         verboseIter = FALSE),
     +                metric = "Kappa")
     > set.seed(1)
     > plsFit <- train(x = trainDescr, y = trainClass,
     +                 method = "pls", tuneLength = 20,
     +                 preProc = c("center", "scale", "BoxCox"),
     +                 trControl = trainControl(method = "repeatedcv",
     +                                          repeats = 5,
     +                                          verboseIter = FALSE),
     +                 metric = "Kappa")
  19. 19. Comparing Models
     > resamps <- resamples(list(rf = rfFit, pls = plsFit, svm = rbfSVM))
     > print(summary(resamps))
     Call:
     summary.resamples(object = resamps)

     Models: rf, pls, svm
     Number of resamples: 50

     Accuracy
           Min. 1st Qu. Median   Mean 3rd Qu.   Max.
     rf  0.9200  0.9328 0.9370 0.9370  0.9424 0.9499
     pls 0.8348  0.8488 0.8554 0.8554  0.8631 0.8806
     svm 0.9478  0.9648 0.9691 0.9694  0.9752 0.9819

     Kappa
           Min. 1st Qu. Median   Mean 3rd Qu.   Max.
     rf  0.9003  0.9162 0.9215 0.9215  0.9282 0.9376
     pls 0.7932  0.8106 0.8192 0.8190  0.8286 0.8507
     svm 0.9350  0.9561 0.9615 0.9619  0.9691 0.9774
  20. 20. Comparing Models
     > diffs <- diff(resamps, metric = "Kappa")
     > print(summary(diffs))
     Call:
     summary.diff.resamples(object = diffs)

     p-value adjustment: bonferroni
     Upper diagonal: estimates of the difference
     Lower diagonal: p-value for H0: difference = 0

     Kappa
         rf        pls       svm
     rf             0.10245  -0.04043
     pls < 2.2e-16           -0.14288
     svm < 2.2e-16 < 2.2e-16
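The structure of this comparison (paired differences across shared resamples, with a Bonferroni adjustment over the three model pairs) can be sketched in base R. The Kappa values below are synthetic stand-ins generated for illustration and are NOT the results from the slides; the use of a paired t-test is also an assumption made for this sketch rather than a statement of what diff computes internally.

```r
# Synthetic stand-in data: 50 resamples, with a shared per-resample effect so
# that the models are correlated across resamples, as they are when every
# model is evaluated on the same folds.
set.seed(1)
n_resamples <- 50
shared <- rnorm(n_resamples, sd = 0.01)
kappa_vals <- data.frame(
  rf  = 0.92 + shared + rnorm(n_resamples, sd = 0.005),
  pls = 0.82 + shared + rnorm(n_resamples, sd = 0.005),
  svm = 0.96 + shared + rnorm(n_resamples, sd = 0.005)
)

# All pairwise comparisons: rf - pls, rf - svm, pls - svm.
model_pairs <- combn(names(kappa_vals), 2, simplify = FALSE)

results <- do.call(rbind, lapply(model_pairs, function(p) {
  tt <- t.test(kappa_vals[[p[1]]], kappa_vals[[p[2]]], paired = TRUE)
  data.frame(comparison = paste(p, collapse = " - "),
             estimate   = unname(tt$estimate),
             p_value    = tt$p.value)
}))

# Bonferroni: multiply each p-value by the number of comparisons (capped at 1).
results$p_adjusted <- p.adjust(results$p_value, method = "bonferroni")

# The matching multiplicity-adjusted confidence level for three comparisons.
conf_level <- 1 - 0.05 / length(model_pairs)

results
round(conf_level, 3)
```

With three comparisons the adjusted confidence level is 1 - 0.05/3, which rounds to the 0.983 shown on the dot plot of average differences.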
  21. 21. Parallel Coordinate Plots
     parallel(resamps, metric = "Kappa")
     [Figure: parallel coordinate plot of resampled Kappa for svm, rf and pls; x-axis: Kappa]
  22. 22. Box Plots
     bwplot(resamps, metric = "Kappa")
     [Figure: box plots of resampled Kappa for svm, rf and pls; x-axis: Kappa]
  23. 23. Dot Plots of Average Differences
     dotplot(diffs)
     [Figure: dot plot of the pairwise differences rf - svm, rf - pls and pls - svm; x-axis: Difference in Kappa, Confidence Level 0.983 (multiplicity adjusted)]
  24. 24. Other Functions and Classes
       • nearZeroVar: a function to remove predictors that are sparse and highly unbalanced
       • findCorrelation: a function to remove the optimal set of predictors to achieve low pair-wise correlations
       • predictors: class for determining which predictors are included in the prediction equations (e.g. rpart, earth, lars models) (currently 57 methods)
       • confusionMatrix, sensitivity, specificity, posPredValue, negPredValue: classes for assessing classifier performance
       • varImp: classes for assessing the aggregate effect of a predictor on the model equations (currently 19 methods)
  25. 25. Other Functions and Classes
       • knnreg: nearest-neighbor regression
       • plsda, splsda: PLS discriminant analysis
       • icr: independent component regression
       • pcaNNet: nnet:::nnet with automatic PCA pre-processing step
       • bagEarth, bagFDA: bagging with MARS and FDA models
       • normalize2Reference: RMA-like processing of Affy arrays using a training set
       • spatialSign: class for transforming numeric data (x = x / ||x||)
       • maxDissim: a function for maximum dissimilarity sampling
       • rfe: a class/framework for recursive feature selection (RFE) with external resampling step
       • sbf: a class/framework for applying univariate filters prior to predictive modeling with external resampling
       • featurePlot: a wrapper for several lattice functions
  26. 26. Thanks
       • MA RUG Organizers, Kirk Mettler
       • R Core
       • Pfizer’s Statistics leadership for providing the time and support to create R packages
       • caret contributors: Jed Wing, Steve Weston, Andre Williams, Chris Keefer and Allan Engelhardt
  27. 27. Session Info
       • R version 2.11.1 Patched (2010-09-30 r53356), x86_64-apple-darwin9.8.0
       • Base packages: base, datasets, graphics, grDevices, methods, splines, stats, tools, utils
       • Other packages: caret 4.79, class 7.3-2, codetools 0.2-2, digest 0.4.2, e1071 1.5-24, gbm 1.6-3.1, kernlab 0.9-11, lattice 0.19-11, MASS 7.3-7, pls 2.1-0, plyr 1.2.1, randomForest 4.5-36, reshape 0.8.3, survival 2.35-8, weaver 1.14.0
       • Loaded via a namespace (and not attached): grid 2.11.1
     This presentation was created with a MacPro using LaTeX and R’s Sweave function at 09:33 on Saturday, Feb 26, 2011.
