Linear Regression Parameters
Rodolfo Campos (@camposer)
Universidad Politécnica de Madrid
Madrid, October 2012
When to consider Linear Regression? When the outcome, or class, is numeric, and all the attributes are numeric. The idea is to express the class as a linear combination of the attributes, with weights calculated from the training data:

x = w0 + w1*a1 + w2*a2 + … + wk*ak

where x is the class; a1, a2, …, ak are the attribute values; and w0, w1, …, wk are the weights.
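The formula above can be illustrated for the single-attribute case. A minimal Python sketch using closed-form least squares (the data values are made up; Weka computes these weights for you from the training set):

```python
# Hypothetical sketch: fitting x = w0 + w1*a1 by ordinary least squares
# for one attribute (illustrative data, not a Weka API).

def fit_simple(a, x):
    """Return (w0, w1) minimizing the squared error of x ~ w0 + w1*a."""
    n = len(a)
    mean_a = sum(a) / n
    mean_x = sum(x) / n
    # Slope: covariance of (a, x) divided by variance of a.
    w1 = sum((ai - mean_a) * (xi - mean_x) for ai, xi in zip(a, x)) \
         / sum((ai - mean_a) ** 2 for ai in a)
    # Intercept: makes the fitted line pass through the mean point.
    w0 = mean_x - w1 * mean_a
    return w0, w1

a = [1.0, 2.0, 3.0, 4.0]
x = [3.0, 5.0, 7.0, 9.0]   # exactly x = 1 + 2*a
w0, w1 = fit_simple(a, x)
print(w0, w1)  # → 1.0 2.0
```

With more than one attribute the same idea generalizes to solving a linear system, which is what LinearRegression does internally.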
Linear Regression in Weka
Options specific to weka.classifiers.functions.LinearRegression:
-D  Produce debugging output (default: disabled).
-S <selection method>  Set the attribute selection method to use: 0 = M5 method (default), 1 = None, 2 = Greedy.
-C  Do not try to eliminate collinear attributes.
-R <double>  Set the ridge parameter (default: 1.0e-8).
Linear Regression in Weka
-S <selection method>. Set the method used to select the attributes for the linear regression:
0 = M5 method. Based on the M5 algorithm, which builds trees whose leaves are associated with multivariate linear models; at each node the split is made on the attribute that maximizes the expected error reduction, as estimated by the Akaike information criterion (a measure of the relative goodness of fit of a statistical model).
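As a rough illustration of how an AIC-based criterion can prefer a smaller attribute set, here is a hedged sketch using one common formulation, AIC = n·ln(SSE/n) + 2k (the SSE values are invented, and Weka's internal error estimate may differ in detail):

```python
# Hedged sketch: comparing two candidate attribute subsets with an
# AIC-style score. Lower AIC is better; the extra-parameter penalty 2k
# can make a slightly worse fit (larger SSE) the preferred model.
import math

def aic(sse, n, k):
    """n = number of instances, k = number of fitted weights."""
    return n * math.log(sse / n) + 2 * k

n = 100
full    = aic(sse=40.0, n=n, k=6)  # all five attributes + intercept
reduced = aic(sse=40.5, n=n, k=5)  # one attribute removed
print(reduced < full)  # → True: the reduced model wins on AIC
```

This is the trade-off the M5 selection method exploits: attributes are dropped as long as the penalized error estimate does not get worse.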
Linear Regression in Weka
1 = None. Needs no further explanation: no attribute selection is performed.
2 = Greedy. "For example, a greedy strategy for the traveling salesman problem (which is of a high computational complexity) is the following heuristic: 'At each stage visit an unvisited city nearest to the current city'. This heuristic need not find a best solution, but terminates in a reasonable number of steps; finding an optimal solution typically requires unreasonably many steps" (from Wikipedia).
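The greedy heuristic quoted above can be sketched directly as a nearest-neighbour tour (the city coordinates are made up for illustration; this shows the greedy idea, not Weka's attribute selection code):

```python
# Minimal sketch of the nearest-neighbour greedy heuristic from the
# quote: at each stage, visit the unvisited city nearest to the current
# one. Fast, but not guaranteed optimal.
import math

def nearest_neighbour_tour(cities, start=0):
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        current = cities[tour[-1]]
        # Greedy step: pick the closest remaining city.
        nxt = min(unvisited, key=lambda i: math.dist(current, cities[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

cities = [(0, 0), (5, 0), (1, 0), (6, 0)]
print(nearest_neighbour_tour(cities))  # → [0, 2, 1, 3]
```

Greedy attribute selection works in the same spirit: at each step it makes the locally best add/drop decision rather than searching all attribute subsets.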
Linear Regression in Weka
-C. Do not try to eliminate collinear attributes. Collinear attributes are strongly correlated with one another. Possible examples: high-performance, expensive German cars; low-performance, cheap American cars (here performance and price carry largely the same information).
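One simple way to spot a collinear pair such as performance and price is the Pearson correlation coefficient. A small sketch with invented car data (the 0.95 threshold is an arbitrary illustrative choice, not Weka's rule):

```python
# Hedged sketch: flagging a collinear attribute pair via Pearson
# correlation (made-up data in the spirit of the car example above).

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

performance = [300, 280, 120, 110]   # expensive German vs cheap American cars
price       = [90, 85, 20, 18]       # in thousands; illustrative values
r = pearson(performance, price)
print(r > 0.95)  # → True: strong correlation, candidate for elimination
```

When two attributes are this strongly correlated, one of them adds almost no information, which is why Weka eliminates collinear attributes unless -C is set.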
Linear Regression in Weka
-R <double>. Set the ridge parameter (default: 1.0e-8). Its value is assigned by the analyst and determines how much Ridge Regression departs from Least Squares Regression; the goal of the ridge is to circumvent the problem of collinear predictors. If the value is too small, Ridge Regression cannot fight collinearity efficiently; if it is too large, the bias of the parameters becomes too large, and so do the mean square errors of both the parameters and the predictions. It therefore has to be estimated by trial and error, usually resorting to cross-validation.
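The effect of the ridge parameter can be illustrated with the closed form w = (A'A + rI)^(-1) A'x for one attribute plus an intercept. This is a hedged sketch with made-up data; for simplicity it also penalizes the intercept, which a production implementation would not necessarily do:

```python
# Hypothetical sketch: ridge regression for x ~ w0 + w1*a, solving the
# 2x2 normal equations (A'A + rI) w = A'x by hand. A tiny ridge r
# recovers ordinary least squares; a huge r shrinks the weights.

def ridge_fit(a, x, r):
    n = len(a)
    # Entries of A'A for design matrix rows [1, a_i], plus ridge r on
    # the diagonal (intercept penalized too, for simplicity).
    s00, s01, s11 = n + r, sum(a), sum(v * v for v in a) + r
    t0, t1 = sum(x), sum(ai * xi for ai, xi in zip(a, x))  # A'x
    det = s00 * s11 - s01 * s01
    w0 = (s11 * t0 - s01 * t1) / det
    w1 = (s00 * t1 - s01 * t0) / det
    return w0, w1

a = [1.0, 2.0, 3.0, 4.0]
x = [3.0, 5.0, 7.0, 9.0]          # exactly x = 1 + 2*a
small = ridge_fit(a, x, 1e-8)     # ≈ least squares: close to (1, 2)
large = ridge_fit(a, x, 1e3)      # heavy shrinkage biases the weights
print(small, large)
```

Trying a few values of r like this, and scoring each by cross-validated error, is exactly the trial-and-error procedure the slide describes.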
References
- I. Witten, E. Frank and M. Hall. Data Mining: Practical Machine Learning Tools and Techniques (Third Edition). Elsevier, MA, USA, 2011.
- Weka API, Class LinearRegression. Extracted on October 16, 2012 from http://weka.sourceforge.net/doc/weka/classifiers/functions/LinearRegre
- D. Rodríguez, J.J. Cuadrado, M.A. Sicilia and R. Ruiz. Segmentation of Software Engineering Datasets Using the M5 Algorithm. Extracted on October 14, 2012 from http://www.cc.uah.es/drg/c/ICCS06.pdf
- AI Access. Ridge Regression. Extracted on October 16, 2012 from http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_ridge.htm