Your SlideShare is downloading.
×

×
# Saving this for later?

### Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime - even offline.

#### Text the download link to your phone

Standard text messaging rates apply

Like this presentation? Why not share!

- Text Mining with R -- an Analysis o... by Yanchang Zhao 4890 views
- Data mining-with-r-learning-with-ca... by Matin Alonso 7589 views
- An Introduction to Data Mining with R by Yanchang Zhao 1341 views
- Six Sigma Quality Using R: Tools an... by Emilio López Cano 1375 views
- Time series-mining-slides by Yanchang Zhao 2703 views
- Regression and Classification with R by Yanchang Zhao 289 views
- Data Clustering with R by Yanchang Zhao 373 views
- Introduction to Data Mining with R ... by Yanchang Zhao 393 views
- Data Exploration and Visualization ... by Yanchang Zhao 421 views
- The Powerful Marriage of Hadoop and... by Revolution Analytics 3375 views
- R programming Basic & Advanced by SOHOM GHOSH 878 views
- R Workshop for Beginners by Metamarkets 3157 views

Like this? Share it with your network
Share

4,373

views

views

Published on

a list of R packages and functions for data mining

a list of R packages and functions for data mining

Published in:
Technology

No Downloads

Total Views

4,373

On Slideshare

0

From Embeds

0

Number of Embeds

2

Shares

0

Downloads

204

Comments

0

Likes

6

No embeds

No notes for slide

- 1. R Reference Card for Data Mining Performance Evaluation apclusterK() afﬁnity propagation clustering to get K clusters (apcluster) performance() provide various measures for evaluating performance of pre- cclust() Convex Clustering, incl. k-means and two other clustering algo-by Yanchang Zhao, yanchang@rdatamining.com, January 3, 2013 diction and classiﬁcation models (ROCR) rithms (cclust)The latest version is available at http://www.RDataMining.com. Click the link roc() build a ROC curve (pROC) KMeansSparseCluster() sparse k-means clustering (sparcl)also for document R and Data Mining: Examples and Case Studies. auc() compute the area under the ROC curve (pROC) tclust(x,k,alpha,...) trimmed k-means with which a proportionThe package names are in parentheses. ROC() draw a ROC curve (DiagnosisMed) alpha of observations may be trimmed (tclust) PRcurve() precision-recall curves (DMwR) Association Rules & Frequent Itemsets CRchart() cumulative recall charts (DMwR) Hierarchical Clustering Packages a hierarchical decomposition of data in either bottom-up (agglomerative) or top-APRIORI Algorithm rpart recursive partitioning and regression trees down (divisive) waya level-wise, breadth-ﬁrst algorithm which counts transactions to ﬁnd frequent party recursive partitioning hclust(d, method, ...) hierarchical cluster analysis on a set of dissim-itemsets randomForest classiﬁcation and regression based on a forest of trees using ran- ilarities d using the method for agglomerationapriori() mine associations with APRIORI algorithm (arules) dom inputs birch() the BIRCH algorithm that clusters very large data with a CF-tree rpartOrdinal ordinal classiﬁcation trees, deriving a classiﬁcation tree when the (birch)ECLAT Algorithm response to be predicted is ordinal pvclust() hierarchical clustering with p-values via multi-scale bootstrap re- rpart.plot plots rpart models with an enhanced version of plot.rpart in the sampling (pvclust)employs equivalence classes, depth-ﬁrst search and set intersection instead of rpart package agnes() agglomerative hierarchical clustering (cluster)counting ROCR visualize the performance of scoring classiﬁers diana() divisive hierarchical clustering (cluster)eclat() mine frequent itemsets with the Eclat algorithm (arules) pROC display and analyze ROC curves mona() divisive hierarchical clustering of a dataset with binary variables only (cluster)Packages Regression rockCluster() cluster a data matrix using the Rock algorithm (cba)arules mine frequent itemsets, maximal frequent itemsets, closed frequent item- proximus() cluster the rows of a logical matrix using the Proximus algorithm sets and association rules. It includes two algorithms, Apriori and Eclat. Functions (cba)arulesViz visualizing association rules lm() linear regression isopam() Isopam clustering algorithm (isopam) glm() generalized linear regression LLAhclust() hierarchical clustering based on likelihood linkage analysis Sequential Patterns nls() non-linear regression (LLAhclust) predict() predict with models flashClust() optimal hierarchical clustering (ﬂashClust)Functions residuals() residuals, the difference between observed values and ﬁtted val- fastcluster() fast hierarchical clustering (fastcluster)cspade() mining frequent sequential patterns with the cSPADE algorithm ues (arulesSequences) cutreeDynamic(), cutreeHybrid() detection of clusters in hierarchi- gls() ﬁt a linear model using generalized least squares (nlme) cal clustering dendrograms (dynamicTreeCut)seqefsub() searching for frequent subsequences (TraMineR) gnls() ﬁt a nonlinear model using generalized least squares (nlme) HierarchicalSparseCluster() hierarchical sparse clustering (sparcl)Packages PackagesarulesSequences add-on for arules to handle and mine frequent sequences nlme linear and nonlinear mixed effects models Model based ClusteringTraMineR mining, describing and visualizing sequences of states or events Clustering Mclust() model-based clustering (mclust) Classiﬁcation & Prediction HDDC() a model-based method for high dimensional data clustering (HDclas- Partitioning based Clustering sif )Decision Trees partition the data into k groups ﬁrst and then try to improve the quality of clus- fixmahal() Mahalanobis Fixed Point Clustering (fpc)ctree() conditional inference trees, recursive partitioning for continuous, cen- tering by moving objects from one group to another fixreg() Regression Fixed Point Clustering (fpc) sored, ordered, nominal and multivariate response variables in a condi- kmeans() perform k-means clustering on a data matrix mergenormals() clustering by merging Gaussian mixture components (fpc) tional inference framework (party) kmeansCBI() interface function for kmeans (fpc)rpart() recursive partitioning and regression trees (rpart) Density based Clustering kmeansruns() call kmeans for the k-means clustering method and includes generate clusters by connecting dense regionsmob() model-based recursive partitioning, yielding a tree with ﬁtted models estimation of the number of clusters and ﬁnding an optimal solution from associated with each terminal node (party) dbscan(data,eps,MinPts,...) generate a density based clustering of several starting points (fpc) arbitrary shapes, with neighborhood radius set as eps and density thresh-Random Forest pam() the Partitioning Around Medoids (PAM) clustering method (cluster) old as MinPts (fpc)cforest() random forest and bagging ensemble (party) pamk() the Partitioning Around Medoids (PAM) clustering method with esti- pdfCluster() clustering via kernel density estimation (pdfCluster)randomForest() random forest (randomForest) mation of number of clusters (fpc)varimp() variable importance (party) cluster.optimal() search for the optimal k-clustering of the dataset Other Clustering Techniquesimportance() variable importance (randomForest) (bayesclust) clara() Clustering Large Applications (cluster) mixer() random graph clustering (mixer)Neural Networks nncluster() fast clustering with restarted minimum spanning tree (nnclust) fanny(x,k,...) compute a fuzzy clustering of the data into k clusters (clus-nnet() ﬁt single-hidden-layer neural network (nnet) orclus() ORCLUS subspace clustering (orclus) ter)Support Vector Machine (SVM) kcca() k-centroids clustering (ﬂexclust) Plotting Clustering Solutionssvm() train a support vector machine for regression, classiﬁcation or density- ccfkms() clustering with Conjugate Convex Functions (cba) plotcluster() visualisation of a clustering or grouping in data (fpc) estimation (e1071) apcluster() afﬁnity propagation clustering for a given similarity matrix (ap- bannerplot() a horizontal barplot visualizing a hierarchical clustering (clus-ksvm() support vector machines (kernlab) cluster) ter)
- 2. Cluster Validation Packages SnowballStemmer() Snowball word stemmers (Snowball)silhouette() compute or extract silhouette information (cluster) extremevalues detect extreme values in one-dimensional data LDA() ﬁt a LDA (latent Dirichlet allocation) model (topicmodels)cluster.stats() compute several cluster validity statistics from a cluster- mvoutlier multivariate outlier detection based on robust methods CTM() ﬁt a CTM (correlated topics model) model (topicmodels) ing and a dissimilarity matrix (fpc) outliers some tests commonly used for identifying outliers terms() extract the most likely terms for each topic (topicmodels)clValid() calculate validation measures for a given set of clustering algo- Rlof a parallel implementation of the LOF algorithm topics() extract the most likely topics for each document (topicmodels) rithms and number of clusters (clValid) Time Series Analysis PackagesclustIndex() calculate the values of several clustering indexes, which can tm a framework for text mining applications be independently used to determine the number of clusters existing in a Construction & Plot lda ﬁt topic models with LDA data set (cclust) ts() create time-series objects (stats) topicmodels ﬁt topic models with LDA and CTMNbClust() provide 30 indices for cluster validation and determining the num- plot.ts() plot time-series objects (stats) RTextTools automatic text classiﬁcation via supervised learning ber of clusters (NbClust) smoothts() time series smoothing (ast) tm.plugin.dc a plug-in for package tm to support distributed text miningPackages sfilter() remove seasonal ﬂuctuation using moving average (ast) tm.plugin.mail a plug-in for package tm to handle mailcluster cluster analysis Decomposition RcmdrPlugin.TextMining GUI for demonstration of text mining concepts andfpc various methods for clustering and cluster validation tm package decomp() time series decomposition by square-root ﬁlter (timsac)mclust model-based clustering and normal mixture modeling textir a suite of tools for inference about text documents and associated sentiment decompose() classical seasonal decomposition by moving averages (stats)birch clustering very large datasets using the BIRCH algorithm tau utilities for text analysis stl() seasonal decomposition of time series by loess (stats)pvclust hierarchical clustering with p-values textcat n-gram based text categorization tsr() time series decomposition (ast)apcluster Afﬁnity Propagation Clustering YjdnJlp Japanese text analysis by Yahoo! Japan Developer Network ardec() time series autoregressive decomposition (ArDec)cclust Convex Clustering methods, including k-means algorithm, On-line Up- Social Network Analysis and Graph Mining date algorithm and Neural Gas algorithm and calculation of indexes for Forecasting ﬁnding the number of clusters in a data set arima() ﬁt an ARIMA model to a univariate time series (stats) Functionscba Clustering for Business Analytics, including clustering techniques such as predict.Arima() forecast from models ﬁtted by arima (stats) graph(), graph.edgelist(), graph.adjacency(), Proximus and Rock auto.arima() ﬁt best ARIMA model to univariate time series (forecast) graph.incidence() create graph objects respectively from edges,bclust Bayesian clustering using spike-and-slab hierarchical model, suitable for forecast.stl(), forecast.ets(), forecast.Arima() an edge list, an adjacency matrix and an incidence matrix (igraph) clustering high-dimensional data forecast time series using stl, ets and arima models (forecast) plot(), tkplot() static and interactive plotting of graphs (igraph)biclust algorithms to ﬁnd bi-clusters in two-dimensional data Packages gplot(), gplot3d() plot graphs (sna)clue cluster ensembles forecast displaying and analysing univariate time series forecasts V(), E() vertex/edge sequence of igraph (igraph)clues clustering method based on local shrinking timsac time series analysis and control program are.connected() check whether two nodes are connected (igraph)clValid validation of clustering results ast time series analysis degree(), betweenness(), closeness() various centrality scoresclv cluster validation techniques, contains popular internal and external cluster ArDec time series autoregressive-based decomposition (igraph, sna) validation methods for outputs produced by package cluster ares a toolbox for time series analyses using generalized additive models add.edges(), add.vertices(), delete.edges(),bayesclust tests/searches for signiﬁcant clusters in genetic data dse tools for multivariate, linear, time-invariant, time series models delete.vertices() add and delete edges and vertices (igraph)clustvarsel variable selection for model-based clustering neighborhood() neighborhood of graph vertices (igraph, sna)clustsig signiﬁcant cluster analysis, tests to see which (if any) clusters are statis- Text Mining get.adjlist() adjacency lists for edges or vertices (igraph) tically different Functions nei(), adj(), from(), to() vertex/edge sequence indexing (igraph)clusterﬂy explore clustering interactively cliques() ﬁnd cliques, ie. complete subgraphs (igraph) Corpus() build a corpus, which is a collection of text documents (tm)clusterSim search for optimal clustering procedure for a data set clusters() maximal connected components of a graph (igraph) tm map() transform text documents, e.g., stemming, stopword removal (tm)clusterGeneration random cluster generation %->%, %<-%, %--% edge sequence indexing (igraph) tm filter() ﬁltering out documents (tm)clusterCons calculate the consensus clustering result from re-sampled clustering get.edgelist() return an edge list in a two-column matrix (igraph) TermDocumentMatrix(), DocumentTermMatrix() construct a experiments with the option of using multiple algorithms and parameter read.graph(), write.graph() read and writ graphs from and to ﬁles term-document matrix or a document-term matrix (tm)gcExplorer graphical cluster explorer (igraph) Dictionary() construct a dictionary from a character vector or a term-hybridHclust hybrid hierarchical clustering via mutual clusters Packages document matrix (tm)Modalclust hierarchical modal Clustering findAssocs() ﬁnd associations in a term-document matrix (tm) sna social network analysisiCluster integrative clustering of multiple genomic data types findFreqTerms() ﬁnd frequent terms in a term-document matrix (tm) igraph network analysis and visualizationEMCC evolutionary Monte Carlo (EMC) methods for clustering stemDocument() stem words in a text document (tm) statnet a set of tools for the representation, visualization, analysis and simulationrEMM extensible Markov Model (EMM) for data stream clustering stemCompletion() complete stemmed words (tm) of network data Outlier Detection termFreq() generate a term frequency vector from a text document (tm) egonet ego-centric measures in social network analysis stopwords(language) return stopwords in different languages (tm) snort social network-analysis on relational tablesFunctions removeNumbers(), removePunctuation(), removeWords() re- network tools to create and modify network objectsboxplot.stats()$out list data points lying beyond the extremes of the move numbers, punctuation marks, or a set of words from a text docu- bipartite visualising bipartite networks and calculating some (ecological) indices whiskers ment (tm) blockmodelinggeneralized and classical blockmodeling of valued networkslofactor() calculate local outlier factors using the LOF algorithm (DMwR removeSparseTerms() remove sparse terms from a term-document matrix diagram visualising simple graphs (networks), plotting ﬂow diagrams or dprep) (tm) NetCluster clustering for networkslof() a parallel implementation of the LOF algorithm (Rlof ) textcat() n-gram based text categorization (textcat) NetData network data for McFarland’s SNA R labs
- 3. NetIndices estimating network indices, including trophic structure of foodwebs Packages googleVis an interface between R and the Google Visualisation API to create in R nlme linear and nonlinear mixed effects models interactive chartsNetworkAnalysis statistical inference on populations of weighted or unweighted lattice a powerful high-level data visualization system, with an emphasis on mul- networks Graphics tivariate datatnet analysis of weighted, two-mode, and longitudinal networks Functions vcd visualizing categorical datatriads triad census for networks plot() generic function for plotting (graphics) denpro visualization of multivariate, functions, sets, and data Spatial Data Analysis barplot(), pie(), hist() bar chart, pie chart and histogram (graph- iplots interactive graphicsFunctions ics) Data Manipulation boxplot() box-and-whisker plot (graphics)geocode() geocodes a location using Google Maps (ggmap) stripchart() one dimensional scatter plot (graphics) Functionsqmap() quick map plot (ggmap) dotchart() Cleveland dot plot (graphics) transform() transform a data frameget map() queries the Google Maps, OpenStreetMap, or Stamen Maps server qqnorm(), qqplot(), qqline() QQ (quantile-quantile) plot (stats) scale() scaling and centering of matrix-like objects for a map at a certain location (ggmap) coplot() conditioning plot (graphics) t() matrix transposegvisGeoChart(), gvisGeoMap(), gvisIntensityMap(), splom() conditional scatter plot matrices (lattice) aperm() array transpose gvisMap() Google geo charts and maps (googleVis) pairs() a matrix of scatterplots (graphics) sample() samplingGetMap() download a static map from the Google server (RgoogleMaps) cpairs() enhanced scatterplot matrix (gclus) table(), tabulate(), xtabs() cross tabulation (stats)ColorMap() plot levels of a variable in a colour-coded map (RgoogleMaps) parcoord() parallel coordinate plot (MASS) stack(), unstack() stacking vectorsPlotOnStaticMap() overlay plot on background image of map tile cparcoord() enhanced parallel coordinate plot (gclus) split(), unsplit() divide data into groups and reassemble (RgoogleMaps) paracoor() parallel coordinates plot (denpro) reshape() reshape a data frame between “wide” and “long” format (stats)TextOnStaticMap() plot text on map (RgoogleMaps) parallelplot() parallel coordinates plot (lattice) merge() merge two data frames; similar to database join operationsPackages densityplot() kernel density plot (lattice) aggregate() compute summary statistics of data subsets (stats)plotGoogleMaps plot spatial data as HTML map mushup over Google Maps contour(), filled.contour() contour plot (graphics) by() apply a function to a data frame split by factorsRgoogleMaps overlay on Google map tiles in R levelplot(), contourplot() level plots and contour plots (lattice) melt(), cast() melt and then cast data into the reshaped or aggregatedplotKML visualization of spatial and spatio-temporal objects in Google Earth smoothScatter() scatterplots with smoothed densities color representation; form you want (reshape)ggmap Spatial visualization with Google Maps and OpenStreetMap capable of visualizing large datasets (graphics) complete.cases() ﬁnd complete cases, i.e., cases without missing valuesclustTool GUI for clustering data with spatial information sunflowerplot() a sunﬂower scatter plot (graphics) na.fail, na.omit, na.exclude, na.pass handle missing valuesSGCS Spatial Graph based Clustering Summaries for spatial point patterns assocplot() association plot (graphics) Packagesspdep spatial dependence: weighting schemes, statistics and models mosaicplot() mosaic plot (graphics) reshape ﬂexibly restructure and aggregate data matplot() plot the columns of one matrix against the columns of another Statistics data.table extension of data.frame for fast indexing, ordered joins, assignment, (graphics) and grouping and list columnsSummarization fourfoldplot() a fourfold display of a 2 × 2 × k contingency table (graph- gdata various tools for data manipulation ics)summary() summarize datadescribe() concise statistical description of data (Hmisc) persp() perspective plots of surfaces over the x?y plane (graphics) Data Access cloud(), wireframe() 3d scatter plots and surfaces (lattice) Functionsboxplot.stats() box plot statistics interaction.plot() two-way interaction plot (stats)Analysis of Variance iplot(), ihist(), ibar(), ipcp() interactive scatter plot, his- save(), load() save and load R data objectsaov() ﬁt an analysis of variance model (stats) togram, bar plot, and parallel coordinates plot (iplots) read.csv(), write.csv() import from and export to .CSV ﬁlesanova() compute analysis of variance (or deviance) tables for one or more pdf(), postscript(), win.metafile(), jpeg(), bmp(), read.table(), write.table(), scan(), write() read and ﬁtted model objects (stats) png(), tiff() save graphs into ﬁles of various formats write data write.matrix() write a matrix or data frame (MASS)Statistical Test gvisAnnotatedTimeLine(), gvisAreaChart(), readLines(), writeLines() read/write text lines from/to a connection,t.test() student’s t-test (stats) gvisBarChart(), gvisBubbleChart(), gvisCandlestickChart(), gvisColumnChart(), such as a text ﬁleprop.test() test of equal or given proportions (stats) sqlQuery() submit an SQL query to an ODBC database (RODBC)binom.test() exact binomial test (stats) gvisComboChart(), gvisGauge(), gvisGeoChart(), gvisGeoMap(), gvisIntensityMap(), sqlFetch() read a table from an ODBC database (RODBC)Mixed Effects Models gvisLineChart(), gvisMap(), gvisMerge(), sqlSave(), sqlUpdate() write or update a table in an ODBC databaselme() ﬁt a linear mixed-effects model (nlme) gvisMotionChart(), gvisOrgChart(), (RODBC)nlme() ﬁt a nonlinear mixed-effects model (nlme) gvisPieChart(), gvisScatterChart(), sqlColumns() enquire about the column structure of tables (RODBC) sqlTables() list tables on an ODBC connection (RODBC)Principal Components and Factor Analysis gvisSteppedAreaChart(), gvisTable(), gvisTreeMap() various interactive charts produced with the Google odbcConnect(), odbcClose(), odbcCloseAll() open/close con-princomp() principal components analysis (stats) nections to ODBC databases (RODBC)prcomp() principal components analysis (stats) Visualisation API (googleVis) gvisMerge() merge two googleVis charts into one (googleVis) dbSendQuery execute an SQL statement on a given database connectionOther Functions (DBI)var(), cov(), cor() variance, covariance, and correlation (stats) Packages dbConnect(), dbDisconnect() create/close a connection to a DBMSdensity() compute kernel density estimates (stats) ggplot2 an implementation of the Grammar of Graphics (DBI)
- 4. Packages snowfall usability wrapper around snow for easier development of parallel R gWidgets a toolkit-independent API for building interactive GUIsRODBC ODBC database access programs Red-R An open source visual programming GUI interface for RDBI a database interface (DBI) between R and relational DBMS snowFT extension of snow supporting fault tolerant and reproducible applica- R AnalyticFlow a software which enables data analysis by drawing analysisRMySQL interface to the MySQL database tions, and easy-to-use parallel programming ﬂowchartsRJDBC access to databases through the JDBC interface Rmpi interface (Wrapper) to MPI (Message-Passing Interface) latticist a graphical user interface for exploratory visualisationRSQLite SQLite interface for R rpvm R interface to PVM (Parallel Virtual Machine) nws provide coordination and parallel execution facilities Other R Reference CardsROracle Oracle database interface (DBI) driver foreach foreach looping construct for R R Reference Card, by Tom ShortRpgSQL DBI/RJDBC interface to PostgreSQL database doMC foreach parallel adaptor for the multicore package http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/RODM interface to Oracle Data Mining doSNOW foreach parallel adaptor for the snow package R-refcard.pdf orxlsReadWrite read and write Excel ﬁles doMPI foreach parallel adaptor for the Rmpi package http://cran.r-project.org/doc/contrib/Short-refcard.pdfWriteXLS create Excel 2003 (XLS) ﬁles from data frames doParallel foreach parallel adaptor for the multicore package R Reference Card, by Jonathan BaronBig Data doRNG generic reproducible parallel backend for foreach Loops http://cran.r-project.org/doc/contrib/refcard.pdf GridR execute functions on remote hosts, clusters or grids R Functions for Regression Analysis, by Vito RicciFunctions http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.as.ffdf() coerce a dataframe to an ffdf (ff ) fork R functions for handling multiple processes pdfread.table.ffdf(), read.csv.ffdf() read data from a ﬂat ﬁle to Generating Reports R Functions for Time Series Analysis, by Vito Ricci an ffdf object (ff ) http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdfwrite.table.ffdf(), write.csv.ffdf() write an ffdf object to a Sweave() mixing text and R/S code for automatic report generation (utils) ﬂat ﬁle (ff ) knitr a general-purpose package for dynamic report generation in R RDataMining Website, Package, Twitter & Groupsffdfappend() append a dataframe or an ffdf to an existing ffdf (ffdf ) R2HTML making HTML reports R2PPT generating Microsoft PowerPoint presentations RDataMining Website: http://www.rdatamining.combig.matrix() create a standard big.matrix, which is constrained to avail- Group on LinkedIn: http://group.rdatamining.com able RAM (bigmemory) Interface to Weka Group on Google: http://group2.rdatamining.comread.big.matrix() create a big.matrix by reading from an ASCII ﬁle Package RWeka is an R interface to Weka, and enables to use the following Weka Twitter: http://twitter.com/rdatamining (bigmemory) functions in R. RDataMining Package: http://www.rdatamining.com/packagewrite.big.matrix() write a big.matrix to a ﬁle (bigmemory) Association rules: http://package.rdatamining.comfilebacked.big.matrix() create a ﬁle-backed big.matrix, which may Apriori(), Tertius() exceed available RAM by using hard drive space (bigmemory) Regression and classiﬁcation:mwhich() expanded “which”-like functionality (bigmemory) LinearRegression(), Logistic(), SMO()Packages Lazy classiﬁers:ff memory-efﬁcient storage of large data on disk and fast access functions IBk(), LBR()ffbase basic statistical functions for package ff Meta classiﬁers:ﬁlehash a simple key-value database for handling large data AdaBoostM1(), Bagging(), LogitBoost(),g.data create and maintain delayed-data packages MultiBoostAB(), Stacking(),BufferedMatrix a matrix data storage object held in temporary ﬁles CostSensitiveClassifier()biglm regression for data too large to ﬁt in memory Rule classiﬁers:bigmemory manage massive matrices with shared memory and memory-mapped JRip(), M5Rules(), OneR(), PART() ﬁles Regression and classiﬁcation trees:biganalytics extend the bigmemory package with various analytics J48(), LMT(), M5P(), DecisionStump()bigtabulate table-, tapply-, and split-like functionality for matrix and Clustering: big.matrix objects Cobweb(), FarthestFirst(), SimpleKMeans(), XMeans(), DBScan()Parallel Computing Filters:Functions Normalize(), Discretize()foreach(...) %dopar% looping in parallel (foreach) Word stemmers:registerDoSEQ(), registerDoSNOW(), registerDoMC() regis- IteratedLovinsStemmer(), LovinsStemmer() ter respectively the sequential, SNOW and multicore parallel backend Tokenizers: with the foreach package (foreach, doSNOW, doMC) AlphabeticTokenizer(), NGramTokenizer(),sfInit(), sfStop() initialize and stop the cluster (snowfall) WordTokenizer()sfLapply(), sfSapply(), sfApply() parallel versions of Editors/GUIs lapply(), sapply(), apply() (snowfall) Tinn-R a free GUI for R language and environmentPackages RStudio a free integrated development environment (IDE) for Rmulticore parallel processing of R code on machines with multiple cores or rattle graphical user interface for data mining in R CPUs Rpad workbook-style, web-based interface to Rsnow simple parallel computing in R RPMG graphical user interface (GUI) for interactive R analysis sessions

Be the first to comment