SlideShare a Scribd company logo
1 of 7
Download to read offline
Mathematical Theory and Modeling www.iiste.org 
ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) 
Vol.4, No.9, 2014 
Optimizing Transformation for Linearity between Online 
Software Repository Variables. 
Ogunyinka, Peter I1*and Badmus, Nofiu Idowu2 
1. Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye, Nigeria 
2. Department of Statistics, Abraham Adesanya Polytechnic, Ijebu-Igbo, Nigeria 
* E-mail of the corresponding author: pixelgoldprod@yahoo.com 
Abstract 
Online Software Repositories (OSR) like sourceforge.net and google code contain a wealth of valuable data 
about software projects but these data violate the linearity and normal assumptions, hence making the data 
impossible for use in most statistical data analysis. To prepare these data for statistical data analysis, the data 
were non-linearly transformed, hence, these research established the best twelve (12) transformed model that 
obey linearity assumptions, higher coefficient of determination (푅2), positive and negative relationship and 
gained variable significance over the original data. Similarly, the back transformation or interpretation was 
provided about each of these twelve (12) best ranked linear models to solve the challenges of data transformation 
encountered by researchers. 
Keywords: Data transformation, Linear regression model, OkikiSoft, Online Software Repository and 
Sourceforge.net. 
1. Introduction 
Online Software repository (OSR) is a web based storage of Computer software. OSR contains a wealth of 
valuable information about software projects. Ahmed (2008) gave types of software repository as historical 
repositories, run-time repositories and Code repositories. This research focuses on the code repositories (CR). 
CR, such as www.sourceforge.net and google code (www.code.google.com), host the source codes of various 
applications developed by several developers. According to Ahmed (2008), very often, data available on OSR 
exhibit large amount of noise and skew. The use of such data may lead to incorrect results and conclusions. 
Ahmed (2008) recommends that software repository researchers should closely study the noise and skew in the 
data and better understand the effect on the analysis. Statistical visualization is essential to spot the noise and 
skewness. He concluded in his recommendation that OSR researchers should provide guidelines and tools to 
improve the quality of repository data. This research uses mining software repository (MSR) software called 
OKIKISOFT. Okikisoft is the authors developed artificial intelligence software for automatic mining of data on 
the webpages of sourceforge.net. The statistical analysis of the data mined on the repository revealed the 
violation of linearity assumption between the repository variables. This violation of statistical assumption can 
lead to type-I or type-II error, hence, calling for data transformation for the improvement of the quality of the 
repository data for subsequent analysis. 
Why Sourceforge.net? 
Wikipedia (2014) established sourceforge.net as the first free web based source code repository that hosts free 
and open source software. Among its competitors are Github (www.github.com), Google Code 
(www.code.google.com) and Javaforge (www.javaforge.com). Alexa (2014) rates sourceforge as 162nd world 
best website and the first best repository among the aforementioned competitors. These characteristics have 
called the attention of this research to take a study of the repository for the hundreds of researchers and millions 
of visitors visiting the website. 
2. Methodology 
Regression analysis requires the satisfaction of linearity, normality, homoscedasticity and independence. 
Osborne (2002) established that the violation of the conditions can increase the probability of committing type-I 
or type-II error. Over decades, data transformation has been recommended as the solution to non-linearity, 
outliers, among others. Data transformation involves using a mathematical operation to change the measurement 
scale of variable(s). Linear and nonlinear data transformations are the types of data transformation available. 
Linear transformation retains the relationship between variables while non-linear transformation changes 
(increases or decreases) the integrity of the relationship between variables. Tabanick and Fidell (2007) 
205
Mathematical Theory and Modeling www.iiste.org 
ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) 
Vol.4, No.9, 2014 
acknowledging the importance of transformation stated that it may not be generally acceptable by authors as a 
results of difficulty of interpreting the transformed variables but remains a legitimate statistical tool for the 
realization of linear assumption. This research intends to apply transformation technique to online software 
repository variables to establish relationship between these variables which was found absent in the original data. 
2.1 Linear Regression and Data Transformation 
The interpretation of data based on the analysis of variance (ANOVA) is valid if the assumptions of normality, 
homogeneity, independence and addition assumptions are satisfied. Similarly, regression analysis acknowledges 
linearity and the first three aforementioned assumptions for implementation. O’Hara and Hotze (2010) emphases 
that the main purpose of data transformation is to get a sample data to conform with the assumptions of 
parametric statistics such as ANOVA, t-test and linear regression or to manage outliers in a dataset. Marija (2004) 
established that data transformation technique is neither a cheating technique nor distortion of the true picture of 
the data under consideration, rather, it is a legitimate statistical tool. Literatures have established that results 
interpretation of transformed data analysis is the major challenge facing this statistical technique. However, one 
added benefit about most transformation technique is that when data are transformed to meet a certain 
assumption, we often come closer to satisfy another assumptions as well. For instance, square root 
transformation may help to equate group variances by compressing the upper of the distribution end more than it 
compresses the lower end. It may also have effect of making a positively skewed distribution more nearly 
normal in shape. Howel (2007) recommended, as a solution to interpretation problem of data transformation, that 
researchers should look at both the transformed and original data means and make sure that they are telling the 
same basic story. Table 1 presents the common ways to transform variables to achieve literatures for regression 
analysis. 
Table 1: The common statistical transformation techniques 
SN Method Transformed Variable Regression 
206 
Equation 
Predicted/Back 
transformation value (풀̂ 
) 
01 Standard linear 
regression 
None 
푌 = 푏0 + 푏1푋 
푌̂ 
= 푏0 + 푏1푋 
02 Exponential 
transformation 
Dependent variable 
(푙표푔10푌) 
푙표푔10푌 = 푏0 + 푏1푋 푌̂ 
= 10(푏0+푏1푋) 
03 Quadratic 
transformation 
Dependent variable 
(푆푞푟푡(푌)) 
푆푞푟푡(푌) = 푏0 + 푏1푋 푌̂ 
= (푏0 + 푏1푋)2 
04 Reciprocal 
transformation 
Dependent variable 
(푦−1) 
푦−1 = 푏0 + 푏1푋 푌̂ 
= 1⁄(푏0 + 푏1푋) 
05 Logarithm 
transformation 
Independent variable 
(푙표푔10푋) 
푌 = 푏0 + 푏1푙표푔10푋 푌̂ 
= 푏0 + 푏1푙표푔10푋 
06 Power 
transformation 
Dependent variable 푙표푔10푌 
and independent variable 
푙표푔10푋 
푙표푔10푌 
= 푏0 + 푏1푙표푔10푋 
푌̂ 
= 10(푏0+푏1푙표푔10푋) 
SN Method Transformed Variable Regression 
Equation 
Predicted/Back 
transformation value (풀̂ 
) 
07 Square 
transformation 
Independent variable (푋2) 푌 = 푏0 + 푏1푋2 푌̂ 
= 푏0 + 푏1푋2 
In practise, these methods need to be tested on the data to which they are applied for the confirmation that they 
increase rather than decrease the linearity strength of the relationship. Among the methods to detect the 
efficiency of the transformed data are to establish linearity, obtain the coefficient of determination (푅2) and to 
conduct a significant test of the independent variable on the response variable. It is expected that 푅2 of the 
transformed variables will be higher than the non-transformed variables and that the independent variables will 
be significant to the response variable. Back transformation is used to return a transformed predicted value to its 
original scale. Back transformation predicted values give values for the medium response but not the mean 
response as it is expected. Miller (1984) established that back transformation on the mean of the dependent 
variable results to serious bias. He, further, established a solution that minimizes the bias. Jia and Rathi (2008),
Mathematical Theory and Modeling www.iiste.org 
ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) 
Vol.4, No.9, 2014 
confirming the bias, established a more efficient solution that almost removed the bias. Researchers are advised 
to consult this aforementioned literature for implementation. 
2.2 Mining of data 
Visitors to sourceforge have the privilege to find, create, publish, rate and download free and open source 
software from the repository. Rating of software is done based on two criteria viz. Stars rating and laid down 
criteria rating. The user can rate 5,4,3,2 or 1 star(s) after which the average stars rating is computed automatically. 
Computation of the average rating is shown in table 2 below. 
Table 2: Average rating computation for filezilla on sourceforge.net as at April 26, 2014. 
207 
(a) 
Rating Star 
(b) 
Number of Raters 
(c=a*b) 
Total 
5 843 4215 
4 11 44 
3 6 18 
2 4 8 
1 113 113 
Total 977 4398 
Average ( 
ퟓ풊 
Σ 풄풊 
=ퟏ 
ퟓ풊 
Σ 풃풊 
=ퟏ 
) 4.5 
Similarly, laid-down criteria including design, ease, feature and support are considered in rating software by the 
user. The repository takes account of total number of people that rate software and download software on it. 
Perhaps, knowing to the visitors, the repository system collects some information about the users’ operating 
system maker and users’ country locations. 
Manually collecting data on sourceforge can be a tedious task spanning through days and weeks depending on 
the volume of the data under concern. This research uses mining software repository (MSR) software named 
Okikisoft which was specially developed for this research purpose. Okikisoft automatically and invisibly visits 
the pages of sourceforge to extract the required data. It compiles the mined data into CSV files and save it in the 
application folder. Okikisoft can act as a server-side or client-side mining system. 
3. Presentation and Analysis of the Mined Data 
The extracted data variables are represented as follows: D represents Download total for software, F represents 
Filesize of the software , R represents Average Rating for the software and T represents Total number of visitors 
that rate the software. Okikisoft mined 1802 software data ranging between February 1 through February 28, 
2014. However, a sample size of 푛 = 50 was randomly selected for this research. The scatter plots matrix of the 
original data is presented in figure 1.
Mathematical Theory and Modeling www.iiste.org 
ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) 
Vol.4, No.9, 2014 
1 푅 푇 푅 = 훼 + 훽푇 + 0.03% 푃 = 0.976: 푥 푖푛푠푖푔푛푖푓푖푐푎푛푡 
2 푅 퐹 푅 = 훼 + 훽퐹 + 0.02% 푃 = 0.879: 푥 푖푛푠푖푔푛푖푓푖푐푎푛푡 
3 푅 퐷 푅 = 훼 + 훽퐷 - 0.02% 푃 = 0.880: 푥 푖푛푠푖푔푛푖푓푖푐푎푛푡 
4 푇 퐹 푇 = 훼 + 훽퐹 + 0.0% 푃 = 0.992: 푥 푖푛푠푖푔푛푖푓푖푐푎푛푡 
5 (*) 푇 퐷 푇 = 훼 + 훽퐷 - 23.6% 푃 = 0.000: 푥 푠푖푔푛푖푓푖푐푎푛푡 (*) 
6 (*) 퐹 퐷 퐹 = 훼 + 훽퐷 - 21.4% 푃 = 0.001: 푥 푠푖푔푛푖푓푖푐푎푛푡 (*) 
208 
Figure 1: Scatter plot matrix of the original data. 
Figure 1 reveals that none of the plots in the scatter plots matrix can be assumed to be linear. Similarly table 3 
shows the analysis results done on the original data. Very low coefficient of determination (푅2) and 
insignificance of the independent variable were experienced between (푅 and 푇), (푅 and 퐹), (푅 and 퐷) and (푇 
and 퐹) and negative relationship was experienced between (푇 and 퐷) and (퐹 and 퐷). This result coincides 
with Ahmed (2008) that the original online repository data may violate fundamental assumptions required by 
researchers, hence, we recommend that researchers should study the data before further analysis. 
Table 3: Analysis results of original data. 
Sn. 
푽풂풓ퟏ 
= 풚 
푽풂풓ퟐ 
= 풙 
Linear Regression 
Model 
+/ 
- r 
푹ퟐ 
푯ퟎ: 휷 = ퟎ 
휶 = ퟎ. ퟎퟓ 
3.1 Data transformation and analysis 
To prevent values between 0 and 1 in the original data (Osborne (2002)), a linear transformation was done by 
adding 2 to all data in the four variables. The linearly transformed variables were further nonlinearly transformed 
with the seven tools viz: 푙표푔2푦, 푙표푔10푦, √푦, 푦2, √푦 3 , 푦3 and 푦−1. Table 4 shows the results of the data 
transformation and figure 2 shows the scatter plot matrix for the data.
Mathematical Theory and Modeling www.iiste.org 
ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) 
Vol.4, No.9, 2014 
-/+ 
r 
1 T 푙표푔10퐷 = 퐷∗ 푇 = 훼 + 훽퐷∗ ++ 푇̂ 
2 푙표푔10퐷 = 퐷∗ √푇 3 = 푇∗ 퐷∗ = 훼 + 훽푇∗ ++ 퐷̂ 
3 푙표푔10푇 = 푇∗ 푙표푔10퐷 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ 
4 √퐷 = 퐷∗ √푇 3 = 푇∗ 퐷∗ = 훼 + 훽푇∗ ++ 퐷̂ 
5 푙표푔2푇 = 푇∗ 푙표푔10퐷 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ 
6 √퐷 = 퐷∗ 푇2 = 푇∗ 퐷∗ = 훼 + 훽푇∗ ++ 퐷̂ 
7 T √퐷 = 퐷∗ 푇 = 훼 + 훽퐷∗ ++ 푇̂ 
8 √푇 = 푇∗ √퐷 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ 
9 푙표푔10푇 = 푇∗ 퐷−1 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ - - 푇̂ 
10 푇3 = 푇∗ 퐷 푇∗ = 훼 + 훽퐷 ++ 푇̂ 
= √(훼 + 훽퐷) 3 
11 D 푇2 = 푇∗ 퐷 = 훼 + 훽푇∗ ++ 퐷̂ 
= 훼 + 훽푇∗ 29.0% 11th 
12 푇3 = 푇∗ √퐷 3 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ 
= √(훼 + 훽퐷∗) 3 
13 푙표푔10퐷 = 퐷∗ 푇−1 = 푇∗ 퐷∗ = 훼 + 훽푇∗ - - 퐷̂ 
14 푇−1 = 푇∗ 퐷−1 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ 
= (훼 + 훽퐷∗)−1 13.9% 
209 
Table 4: Analysis results of transformed data. 
Sn 푽풂풓ퟏ = 풚 푽풂풓ퟐ = 풙 
Linear Regre-ssion 
Model 
Back 
Transformation 
푹ퟐ 
= 훼 + 훽퐷∗ 55.7% 1st 
= 10(훼+훽푇∗) 52.2% 2nd 
= 10(훼+훽퐷∗) 46.4% 3rd 
= (훼 + 훽푇∗)2 41.6% 4th 
= 2(훼+훽퐷∗) 41.2% 5th 
= (훼 + 훽푇∗)2 40.6% 6th 
= 훼 + 훽퐷∗ 40.5% 7th 
= (훼 + 훽퐷∗)2 35.5% 8th 
= 10(훼+훽퐷∗) 31.1% 9th 
31.0% 10th 
27.1% 12th 
= 10(훼+훽푇∗) 17.5% 
Figure 2: Scatter plot matrix of the first 3 rated models. 
Ran-king 
4. Discussion of findings 
Linearity or almost linearity was ascertained between transformed variables. Table 4 shows 푅2 values, linear 
regression models and the back transformation or interpretation between the respective variables that proved to 
have linear relationship after transformation was successfully executed. Similarly, twelve (12) models out of the 
fourteen transformed linear regression models claim to have positive relationship between variables while the 
independent variables proved to be significant (at 5% significant level) to the corresponding dependent variable. 
These transformations only discovered linearity between T and D variables while relationship between other 
variables failed to claim linearity. We hope attention will be focused on this in future research. Since the result in 
table 4 are linear and significant, hence, ranking of the result was done using the values of the 푅2. Column 7 of 
table 4 shows the ranking result. The first 12 ranked models proved better with 푅2 > 23.6% which was the 
highest 푅2 value obtained between T and D in table 3. Fig. 2 shows the scatter plot matrix of the first 3 ranked 
models. This research only uses 푦 to represent the dependent variable and 푥 for the independent variable in 
table 4, it is important for researchers to note that these variables can be interchanged but with consequences on 
the back transformation of the model. To prevent the aforementioned problem of bias (Miller (1984)) from back 
transformation on the dependent variables, we recommend models 1, 7 and 11 since they do not include the 
transformation of the dependent variable.
Mathematical Theory and Modeling www.iiste.org 
ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) 
Vol.4, No.9, 2014 
5. Conclusion 
In this research, we have used data transformation statistical tools in preparing mined data from online software 
repository, a case study of sourceforge.net, for researchers for subsequent analysis in improving the efficiency of 
repositories. It was established that data from online repositories disobey linearity assumptions and may not be 
significant as required for regression analysis. Hence, we recommend that researchers should study the mined 
data from online repositories before utilizing them for analysis. Combination of non-linear transformation tools 
proved to be effective in establishing linearity between repository variables. It was also established that after 
transformation, only Download total and total number of visitors that rate software on sourceforge can be linear. 
This research has provided list, in rank, of the first best 12 linear regression models for the direct use of 
researchers in future analysis. 
References 
Ahmed, E. Hassan (2008), “A road ahead for mining Software repositories”, IEEE. 
Doi:10.1109/FOSM.2008.4659248, Pp. 48-57. 
Alexa (2014) www.alexa.com. Retrieved on April 20, 2014. 
Cleveland, W.S. (1984), “Graphical methods for data presentation. Full Scale breaks dot charts multibased 
logging”, The American Statistician, Vol.38(4), Pp. 270-280. 
Howel, D.C. (2007), “Statistical methods for psychology” Belmont, C.A. Thomson Wadsworth. 6th Edition. 
Jia, Siwei and Rathi, Sarika (2008), “On predicting log-transformed linear models with heteroscedaticity”, SAS 
Global Forum, Paper 370-2008. 
Manikandan, S. (2010), “Data transformation”, J. Pharmacol Pharmocother. Jul.-Dec, Vol.1(2), 
doi:10.4103/0976-500x.72373, Pp126-127. 
Marija, J. Norusis (2004), “SPSS 12.0 Guide to Data Analysis”, Prentice hall Inc., ISBN 0-13-147886-9. 
Miller, D. (1984), “Reducing transformation bias in Curve fitting”, The American Statistician, 30(2), 124-126. 
O’Hara, Robert B. And Hotze, Johan D. (2010), “Do not log-transform Count data”, Methods in Ecology and 
Evolution, Doi:10.1111/j.2041-210x.2010.00021.x. 
Osborne, Jason (2002), “Notes on the use of data transformations”, Practical Assessment, Research and 
Evaluation. Vol.8(6). 
Tabacknick, B.G. and Fidell, L.S. (2007), “Using Multivariate Statistics”, 5th Edition, Baston, Allyn and Bacon. 
Turke,y J.W. (1977), “Exploratory data analysis”, Reading M.A, Addison-Wesley. 
Wikipedia (2014), Sourceforge, www.en.wikipedia.org/wiki/sourceforge, Retrieved on April 20, 2014. 
www.sourceforge.net, Mining Software (Okikisoft) retrieved on March 13, 2014. 
210
The IISTE is a pioneer in the Open-Access hosting service and academic event 
management. The aim of the firm is Accelerating Global Knowledge Sharing. 
More information about the firm can be found on the homepage: 
http://www.iiste.org 
CALL FOR JOURNAL PAPERS 
There are more than 30 peer-reviewed academic journals hosted under the hosting 
platform. 
Prospective authors of journals can find the submission instruction on the 
following page: http://www.iiste.org/journals/ All the journals articles are available 
online to the readers all over the world without financial, legal, or technical barriers 
other than those inseparable from gaining access to the internet itself. Paper version 
of the journals is also available upon request of readers and authors. 
MORE RESOURCES 
Book publication information: http://www.iiste.org/book/ 
IISTE Knowledge Sharing Partners 
EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open 
Archives Harvester, Bielefeld Academic Search Engine, Elektronische 
Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial 
Library , NewJour, Google Scholar

More Related Content

What's hot

An effective adaptive approach for joining data in data
An effective adaptive approach for joining data in dataAn effective adaptive approach for joining data in data
An effective adaptive approach for joining data in dataeSAT Publishing House
 
Survey on Feature Selection and Dimensionality Reduction Techniques
Survey on Feature Selection and Dimensionality Reduction TechniquesSurvey on Feature Selection and Dimensionality Reduction Techniques
Survey on Feature Selection and Dimensionality Reduction TechniquesIRJET Journal
 
Enhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaEnhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaIJDKP
 
The pertinent single-attribute-based classifier for small datasets classific...
The pertinent single-attribute-based classifier  for small datasets classific...The pertinent single-attribute-based classifier  for small datasets classific...
The pertinent single-attribute-based classifier for small datasets classific...IJECEIAES
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
Application of data mining tools for
Application of data mining tools forApplication of data mining tools for
Application of data mining tools forIJDKP
 
GCUBE INDEXING
GCUBE INDEXINGGCUBE INDEXING
GCUBE INDEXINGIJDKP
 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationDmitry Grapov
 
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...CSCJournals
 
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEA CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEIJDKP
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentIJDKP
 
An Approach for Big Data to Evolve the Auspicious Information from Cross-Domains
An Approach for Big Data to Evolve the Auspicious Information from Cross-DomainsAn Approach for Big Data to Evolve the Auspicious Information from Cross-Domains
An Approach for Big Data to Evolve the Auspicious Information from Cross-DomainsIJECEIAES
 
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION cscpconf
 
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...IJDKP
 
Strategies for Metabolomics Data Analysis
Strategies for Metabolomics Data AnalysisStrategies for Metabolomics Data Analysis
Strategies for Metabolomics Data AnalysisDmitry Grapov
 
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET Journal
 

What's hot (17)

An effective adaptive approach for joining data in data
An effective adaptive approach for joining data in dataAn effective adaptive approach for joining data in data
An effective adaptive approach for joining data in data
 
Survey on Feature Selection and Dimensionality Reduction Techniques
Survey on Feature Selection and Dimensionality Reduction TechniquesSurvey on Feature Selection and Dimensionality Reduction Techniques
Survey on Feature Selection and Dimensionality Reduction Techniques
 
Enhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaEnhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging area
 
The pertinent single-attribute-based classifier for small datasets classific...
The pertinent single-attribute-based classifier  for small datasets classific...The pertinent single-attribute-based classifier  for small datasets classific...
The pertinent single-attribute-based classifier for small datasets classific...
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
Application of data mining tools for
Application of data mining tools forApplication of data mining tools for
Application of data mining tools for
 
GCUBE INDEXING
GCUBE INDEXINGGCUBE INDEXING
GCUBE INDEXING
 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and Visualization
 
M033059064
M033059064M033059064
M033059064
 
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
 
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEA CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environment
 
An Approach for Big Data to Evolve the Auspicious Information from Cross-Domains
An Approach for Big Data to Evolve the Auspicious Information from Cross-DomainsAn Approach for Big Data to Evolve the Auspicious Information from Cross-Domains
An Approach for Big Data to Evolve the Auspicious Information from Cross-Domains
 
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
 
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
 
Strategies for Metabolomics Data Analysis
Strategies for Metabolomics Data AnalysisStrategies for Metabolomics Data Analysis
Strategies for Metabolomics Data Analysis
 
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
 

Similar to Optimizing transformation for linearity between online

Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...nalini manogaran
 
SHAHBAZ_TECHNICAL_SEMINAR.docx
SHAHBAZ_TECHNICAL_SEMINAR.docxSHAHBAZ_TECHNICAL_SEMINAR.docx
SHAHBAZ_TECHNICAL_SEMINAR.docxShahbazKhan77289
 
Comparative Analysis of Classification Algorithms using Weka
Comparative Analysis of Classification Algorithms using WekaComparative Analysis of Classification Algorithms using Weka
Comparative Analysis of Classification Algorithms using Wekaijtsrd
 
IRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET Journal
 
Data science in chemical manufacturing
Data science in chemical manufacturingData science in chemical manufacturing
Data science in chemical manufacturingKarthik Venkataraman
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challengesijcnes
 
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...ijseajournal
 
Truth Table Generator
Truth Table GeneratorTruth Table Generator
Truth Table GeneratorIRJET Journal
 
Evaluating the effectiveness of data quality framework in software engineering
Evaluating the effectiveness of data quality framework in  software engineeringEvaluating the effectiveness of data quality framework in  software engineering
Evaluating the effectiveness of data quality framework in software engineeringIJECEIAES
 
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming DataIRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming DataIRJET Journal
 
Software Cost Estimation Using Clustering and Ranking Scheme
Software Cost Estimation Using Clustering and Ranking SchemeSoftware Cost Estimation Using Clustering and Ranking Scheme
Software Cost Estimation Using Clustering and Ranking SchemeEditor IJMTER
 
An empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systemsAn empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systemsBrowse Jobs
 
Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Gurdal Ertek
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET Journal
 
Application Of Extreme Value Theory To Bursts Prediction
Application Of Extreme Value Theory To Bursts PredictionApplication Of Extreme Value Theory To Bursts Prediction
Application Of Extreme Value Theory To Bursts PredictionCSCJournals
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classificationcsandit
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...cscpconf
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfDr. Radhey Shyam
 

Similar to Optimizing transformation for linearity between online (20)

Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...
 
SHAHBAZ_TECHNICAL_SEMINAR.docx
SHAHBAZ_TECHNICAL_SEMINAR.docxSHAHBAZ_TECHNICAL_SEMINAR.docx
SHAHBAZ_TECHNICAL_SEMINAR.docx
 
Comparative Analysis of Classification Algorithms using Weka
Comparative Analysis of Classification Algorithms using WekaComparative Analysis of Classification Algorithms using Weka
Comparative Analysis of Classification Algorithms using Weka
 
IRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current Approaches
 
Data science in chemical manufacturing
Data science in chemical manufacturingData science in chemical manufacturing
Data science in chemical manufacturing
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challenges
 
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
 
C054
C054C054
C054
 
Truth Table Generator
Truth Table GeneratorTruth Table Generator
Truth Table Generator
 
Evaluating the effectiveness of data quality framework in software engineering
Evaluating the effectiveness of data quality framework in  software engineeringEvaluating the effectiveness of data quality framework in  software engineering
Evaluating the effectiveness of data quality framework in software engineering
 
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming DataIRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
 
Software Cost Estimation Using Clustering and Ranking Scheme
Software Cost Estimation Using Clustering and Ranking SchemeSoftware Cost Estimation Using Clustering and Ranking Scheme
Software Cost Estimation Using Clustering and Ranking Scheme
 
An empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systemsAn empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systems
 
Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
 
Application Of Extreme Value Theory To Bursts Prediction
Application Of Extreme Value Theory To Bursts PredictionApplication Of Extreme Value Theory To Bursts Prediction
Application Of Extreme Value Theory To Bursts Prediction
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 
Sub1579
Sub1579Sub1579
Sub1579
 

More from Alexander Decker

Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...Alexander Decker
 
A validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale inA validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale inAlexander Decker
 
A usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websitesA usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websitesAlexander Decker
 
A universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banksA universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banksAlexander Decker
 
A unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized dA unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized dAlexander Decker
 
A trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistanceA trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistanceAlexander Decker
 
A transformational generative approach towards understanding al-istifham
A transformational  generative approach towards understanding al-istifhamA transformational  generative approach towards understanding al-istifham
A transformational generative approach towards understanding al-istifhamAlexander Decker
 
A time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibiaA time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibiaAlexander Decker
 
A therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school childrenA therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school childrenAlexander Decker
 
A theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banksA theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banksAlexander Decker
 
A systematic evaluation of link budget for
A systematic evaluation of link budget forA systematic evaluation of link budget for
A systematic evaluation of link budget forAlexander Decker
 
A synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjabA synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjabAlexander Decker
 
A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...Alexander Decker
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalAlexander Decker
 
A survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniquesA survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniquesAlexander Decker
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbAlexander Decker
 
A survey on challenges to the media cloud
A survey on challenges to the media cloudA survey on challenges to the media cloud
A survey on challenges to the media cloudAlexander Decker
 
A survey of provenance leveraged
A survey of provenance leveragedA survey of provenance leveraged
A survey of provenance leveragedAlexander Decker
 
A survey of private equity investments in kenya
A survey of private equity investments in kenyaA survey of private equity investments in kenya
A survey of private equity investments in kenyaAlexander Decker
 
A study to measures the financial health of
A study to measures the financial health ofA study to measures the financial health of
A study to measures the financial health ofAlexander Decker
 

More from Alexander Decker (20)

Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...
 
A validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale inA validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale in
 
A usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websitesA usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websites
 
A universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banksA universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banks
 
A unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized dA unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized d
 
A trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistanceA trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistance
 
A transformational generative approach towards understanding al-istifham
A transformational  generative approach towards understanding al-istifhamA transformational  generative approach towards understanding al-istifham
A transformational generative approach towards understanding al-istifham
 
A time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibiaA time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibia
 
A therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school childrenA therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school children
 
A theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banksA theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banks
 
A systematic evaluation of link budget for
A systematic evaluation of link budget forA systematic evaluation of link budget for
A systematic evaluation of link budget for
 
A synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjabA synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjab
 
A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 
A survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniquesA survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniques
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
 
A survey on challenges to the media cloud
A survey on challenges to the media cloudA survey on challenges to the media cloud
A survey on challenges to the media cloud
 
A survey of provenance leveraged
A survey of provenance leveragedA survey of provenance leveraged
A survey of provenance leveraged
 
A survey of private equity investments in kenya
A survey of private equity investments in kenyaA survey of private equity investments in kenya
A survey of private equity investments in kenya
 
A study to measures the financial health of
A study to measures the financial health ofA study to measures the financial health of
A study to measures the financial health of
 

Recently uploaded

Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsApsara Of India
 
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...ictsugar
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesKeppelCorporation
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCRashishs7044
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024christinemoorman
 
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCRsoniya singh
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Servicecallgirls2057
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailAriel592675
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxMarkAnthonyAurellano
 
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...lizamodels9
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaoncallgirls2057
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfpollardmorgan
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607dollysharma2066
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
Organizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessOrganizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessSeta Wicaksana
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 

Recently uploaded (20)

Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024
 
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detail
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
 
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
Organizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessOrganizational Structure Running A Successful Business
Organizational Structure Running A Successful Business
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 

Optimizing transformation for linearity between online

  • 1. Mathematical Theory and Modeling www.iiste.org ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.4, No.9, 2014 Optimizing Transformation for Linearity between Online Software Repository Variables. Ogunyinka, Peter I1*and Badmus, Nofiu Idowu2 1. Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye, Nigeria 2. Department of Statistics, Abraham Adesanya Polytechnic, Ijebu-Igbo, Nigeria * E-mail of the corresponding author: pixelgoldprod@yahoo.com Abstract Online Software Repositories (OSR) like sourceforge.net and google code contain a wealth of valuable data about software projects but these data violate the linearity and normal assumptions, hence making the data impossible for use in most statistical data analysis. To prepare these data for statistical data analysis, the data were non-linearly transformed, hence, these research established the best twelve (12) transformed model that obey linearity assumptions, higher coefficient of determination (푅2), positive and negative relationship and gained variable significance over the original data. Similarly, the back transformation or interpretation was provided about each of these twelve (12) best ranked linear models to solve the challenges of data transformation encountered by researchers. Keywords: Data transformation, Linear regression model, OkikiSoft, Online Software Repository and Sourceforge.net. 1. Introduction Online Software repository (OSR) is a web based storage of Computer software. OSR contains a wealth of valuable information about software projects. Ahmed (2008) gave types of software repository as historical repositories, run-time repositories and Code repositories. This research focuses on the code repositories (CR). CR, such as www.sourceforge.net and google code (www.code.google.com), host the source codes of various applications developed by several developers. According to Ahmed (2008), very often, data available on OSR exhibit large amount of noise and skew. The use of such data may lead to incorrect results and conclusions. Ahmed (2008) recommends that software repository researchers should closely study the noise and skew in the data and better understand the effect on the analysis. Statistical visualization is essential to spot the noise and skewness. He concluded in his recommendation that OSR researchers should provide guidelines and tools to improve the quality of repository data. This research uses mining software repository (MSR) software called OKIKISOFT. Okikisoft is the authors developed artificial intelligence software for automatic mining of data on the webpages of sourceforge.net. The statistical analysis of the data mined on the repository revealed the violation of linearity assumption between the repository variables. This violation of statistical assumption can lead to type-I or type-II error, hence, calling for data transformation for the improvement of the quality of the repository data for subsequent analysis. Why Sourceforge.net? Wikipedia (2014) established sourceforge.net as the first free web based source code repository that hosts free and open source software. Among its competitors are Github (www.github.com), Google Code (www.code.google.com) and Javaforge (www.javaforge.com). Alexa (2014) rates sourceforge as 162nd world best website and the first best repository among the aforementioned competitors. These characteristics have called the attention of this research to take a study of the repository for the hundreds of researchers and millions of visitors visiting the website. 2. Methodology Regression analysis requires the satisfaction of linearity, normality, homoscedasticity and independence. Osborne (2002) established that the violation of the conditions can increase the probability of committing type-I or type-II error. Over decades, data transformation has been recommended as the solution to non-linearity, outliers, among others. Data transformation involves using a mathematical operation to change the measurement scale of variable(s). Linear and nonlinear data transformations are the types of data transformation available. Linear transformation retains the relationship between variables while non-linear transformation changes (increases or decreases) the integrity of the relationship between variables. Tabanick and Fidell (2007) 205
  • 2. Mathematical Theory and Modeling www.iiste.org ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.4, No.9, 2014 acknowledging the importance of transformation stated that it may not be generally acceptable by authors as a results of difficulty of interpreting the transformed variables but remains a legitimate statistical tool for the realization of linear assumption. This research intends to apply transformation technique to online software repository variables to establish relationship between these variables which was found absent in the original data. 2.1 Linear Regression and Data Transformation The interpretation of data based on the analysis of variance (ANOVA) is valid if the assumptions of normality, homogeneity, independence and addition assumptions are satisfied. Similarly, regression analysis acknowledges linearity and the first three aforementioned assumptions for implementation. O’Hara and Hotze (2010) emphases that the main purpose of data transformation is to get a sample data to conform with the assumptions of parametric statistics such as ANOVA, t-test and linear regression or to manage outliers in a dataset. Marija (2004) established that data transformation technique is neither a cheating technique nor distortion of the true picture of the data under consideration, rather, it is a legitimate statistical tool. Literatures have established that results interpretation of transformed data analysis is the major challenge facing this statistical technique. However, one added benefit about most transformation technique is that when data are transformed to meet a certain assumption, we often come closer to satisfy another assumptions as well. For instance, square root transformation may help to equate group variances by compressing the upper of the distribution end more than it compresses the lower end. It may also have effect of making a positively skewed distribution more nearly normal in shape. Howel (2007) recommended, as a solution to interpretation problem of data transformation, that researchers should look at both the transformed and original data means and make sure that they are telling the same basic story. Table 1 presents the common ways to transform variables to achieve literatures for regression analysis. Table 1: The common statistical transformation techniques SN Method Transformed Variable Regression 206 Equation Predicted/Back transformation value (풀̂ ) 01 Standard linear regression None 푌 = 푏0 + 푏1푋 푌̂ = 푏0 + 푏1푋 02 Exponential transformation Dependent variable (푙표푔10푌) 푙표푔10푌 = 푏0 + 푏1푋 푌̂ = 10(푏0+푏1푋) 03 Quadratic transformation Dependent variable (푆푞푟푡(푌)) 푆푞푟푡(푌) = 푏0 + 푏1푋 푌̂ = (푏0 + 푏1푋)2 04 Reciprocal transformation Dependent variable (푦−1) 푦−1 = 푏0 + 푏1푋 푌̂ = 1⁄(푏0 + 푏1푋) 05 Logarithm transformation Independent variable (푙표푔10푋) 푌 = 푏0 + 푏1푙표푔10푋 푌̂ = 푏0 + 푏1푙표푔10푋 06 Power transformation Dependent variable 푙표푔10푌 and independent variable 푙표푔10푋 푙표푔10푌 = 푏0 + 푏1푙표푔10푋 푌̂ = 10(푏0+푏1푙표푔10푋) SN Method Transformed Variable Regression Equation Predicted/Back transformation value (풀̂ ) 07 Square transformation Independent variable (푋2) 푌 = 푏0 + 푏1푋2 푌̂ = 푏0 + 푏1푋2 In practise, these methods need to be tested on the data to which they are applied for the confirmation that they increase rather than decrease the linearity strength of the relationship. Among the methods to detect the efficiency of the transformed data are to establish linearity, obtain the coefficient of determination (푅2) and to conduct a significant test of the independent variable on the response variable. It is expected that 푅2 of the transformed variables will be higher than the non-transformed variables and that the independent variables will be significant to the response variable. Back transformation is used to return a transformed predicted value to its original scale. Back transformation predicted values give values for the medium response but not the mean response as it is expected. Miller (1984) established that back transformation on the mean of the dependent variable results to serious bias. He, further, established a solution that minimizes the bias. Jia and Rathi (2008),
  • 3. Mathematical Theory and Modeling www.iiste.org ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.4, No.9, 2014 confirming the bias, established a more efficient solution that almost removed the bias. Researchers are advised to consult this aforementioned literature for implementation. 2.2 Mining of data Visitors to sourceforge have the privilege to find, create, publish, rate and download free and open source software from the repository. Rating of software is done based on two criteria viz. Stars rating and laid down criteria rating. The user can rate 5,4,3,2 or 1 star(s) after which the average stars rating is computed automatically. Computation of the average rating is shown in table 2 below. Table 2: Average rating computation for filezilla on sourceforge.net as at April 26, 2014. 207 (a) Rating Star (b) Number of Raters (c=a*b) Total 5 843 4215 4 11 44 3 6 18 2 4 8 1 113 113 Total 977 4398 Average ( ퟓ풊 Σ 풄풊 =ퟏ ퟓ풊 Σ 풃풊 =ퟏ ) 4.5 Similarly, laid-down criteria including design, ease, feature and support are considered in rating software by the user. The repository takes account of total number of people that rate software and download software on it. Perhaps, knowing to the visitors, the repository system collects some information about the users’ operating system maker and users’ country locations. Manually collecting data on sourceforge can be a tedious task spanning through days and weeks depending on the volume of the data under concern. This research uses mining software repository (MSR) software named Okikisoft which was specially developed for this research purpose. Okikisoft automatically and invisibly visits the pages of sourceforge to extract the required data. It compiles the mined data into CSV files and save it in the application folder. Okikisoft can act as a server-side or client-side mining system. 3. Presentation and Analysis of the Mined Data The extracted data variables are represented as follows: D represents Download total for software, F represents Filesize of the software , R represents Average Rating for the software and T represents Total number of visitors that rate the software. Okikisoft mined 1802 software data ranging between February 1 through February 28, 2014. However, a sample size of 푛 = 50 was randomly selected for this research. The scatter plots matrix of the original data is presented in figure 1.
  • 4. Mathematical Theory and Modeling www.iiste.org ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.4, No.9, 2014 1 푅 푇 푅 = 훼 + 훽푇 + 0.03% 푃 = 0.976: 푥 푖푛푠푖푔푛푖푓푖푐푎푛푡 2 푅 퐹 푅 = 훼 + 훽퐹 + 0.02% 푃 = 0.879: 푥 푖푛푠푖푔푛푖푓푖푐푎푛푡 3 푅 퐷 푅 = 훼 + 훽퐷 - 0.02% 푃 = 0.880: 푥 푖푛푠푖푔푛푖푓푖푐푎푛푡 4 푇 퐹 푇 = 훼 + 훽퐹 + 0.0% 푃 = 0.992: 푥 푖푛푠푖푔푛푖푓푖푐푎푛푡 5 (*) 푇 퐷 푇 = 훼 + 훽퐷 - 23.6% 푃 = 0.000: 푥 푠푖푔푛푖푓푖푐푎푛푡 (*) 6 (*) 퐹 퐷 퐹 = 훼 + 훽퐷 - 21.4% 푃 = 0.001: 푥 푠푖푔푛푖푓푖푐푎푛푡 (*) 208 Figure 1: Scatter plot matrix of the original data. Figure 1 reveals that none of the plots in the scatter plots matrix can be assumed to be linear. Similarly table 3 shows the analysis results done on the original data. Very low coefficient of determination (푅2) and insignificance of the independent variable were experienced between (푅 and 푇), (푅 and 퐹), (푅 and 퐷) and (푇 and 퐹) and negative relationship was experienced between (푇 and 퐷) and (퐹 and 퐷). This result coincides with Ahmed (2008) that the original online repository data may violate fundamental assumptions required by researchers, hence, we recommend that researchers should study the data before further analysis. Table 3: Analysis results of original data. Sn. 푽풂풓ퟏ = 풚 푽풂풓ퟐ = 풙 Linear Regression Model +/ - r 푹ퟐ 푯ퟎ: 휷 = ퟎ 휶 = ퟎ. ퟎퟓ 3.1 Data transformation and analysis To prevent values between 0 and 1 in the original data (Osborne (2002)), a linear transformation was done by adding 2 to all data in the four variables. The linearly transformed variables were further nonlinearly transformed with the seven tools viz: 푙표푔2푦, 푙표푔10푦, √푦, 푦2, √푦 3 , 푦3 and 푦−1. Table 4 shows the results of the data transformation and figure 2 shows the scatter plot matrix for the data.
  • 5. Mathematical Theory and Modeling www.iiste.org ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.4, No.9, 2014 -/+ r 1 T 푙표푔10퐷 = 퐷∗ 푇 = 훼 + 훽퐷∗ ++ 푇̂ 2 푙표푔10퐷 = 퐷∗ √푇 3 = 푇∗ 퐷∗ = 훼 + 훽푇∗ ++ 퐷̂ 3 푙표푔10푇 = 푇∗ 푙표푔10퐷 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ 4 √퐷 = 퐷∗ √푇 3 = 푇∗ 퐷∗ = 훼 + 훽푇∗ ++ 퐷̂ 5 푙표푔2푇 = 푇∗ 푙표푔10퐷 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ 6 √퐷 = 퐷∗ 푇2 = 푇∗ 퐷∗ = 훼 + 훽푇∗ ++ 퐷̂ 7 T √퐷 = 퐷∗ 푇 = 훼 + 훽퐷∗ ++ 푇̂ 8 √푇 = 푇∗ √퐷 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ 9 푙표푔10푇 = 푇∗ 퐷−1 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ - - 푇̂ 10 푇3 = 푇∗ 퐷 푇∗ = 훼 + 훽퐷 ++ 푇̂ = √(훼 + 훽퐷) 3 11 D 푇2 = 푇∗ 퐷 = 훼 + 훽푇∗ ++ 퐷̂ = 훼 + 훽푇∗ 29.0% 11th 12 푇3 = 푇∗ √퐷 3 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ = √(훼 + 훽퐷∗) 3 13 푙표푔10퐷 = 퐷∗ 푇−1 = 푇∗ 퐷∗ = 훼 + 훽푇∗ - - 퐷̂ 14 푇−1 = 푇∗ 퐷−1 = 퐷∗ 푇∗ = 훼 + 훽퐷∗ ++ 푇̂ = (훼 + 훽퐷∗)−1 13.9% 209 Table 4: Analysis results of transformed data. Sn 푽풂풓ퟏ = 풚 푽풂풓ퟐ = 풙 Linear Regre-ssion Model Back Transformation 푹ퟐ = 훼 + 훽퐷∗ 55.7% 1st = 10(훼+훽푇∗) 52.2% 2nd = 10(훼+훽퐷∗) 46.4% 3rd = (훼 + 훽푇∗)2 41.6% 4th = 2(훼+훽퐷∗) 41.2% 5th = (훼 + 훽푇∗)2 40.6% 6th = 훼 + 훽퐷∗ 40.5% 7th = (훼 + 훽퐷∗)2 35.5% 8th = 10(훼+훽퐷∗) 31.1% 9th 31.0% 10th 27.1% 12th = 10(훼+훽푇∗) 17.5% Figure 2: Scatter plot matrix of the first 3 rated models. Ran-king 4. Discussion of findings Linearity or almost linearity was ascertained between transformed variables. Table 4 shows 푅2 values, linear regression models and the back transformation or interpretation between the respective variables that proved to have linear relationship after transformation was successfully executed. Similarly, twelve (12) models out of the fourteen transformed linear regression models claim to have positive relationship between variables while the independent variables proved to be significant (at 5% significant level) to the corresponding dependent variable. These transformations only discovered linearity between T and D variables while relationship between other variables failed to claim linearity. We hope attention will be focused on this in future research. Since the result in table 4 are linear and significant, hence, ranking of the result was done using the values of the 푅2. Column 7 of table 4 shows the ranking result. The first 12 ranked models proved better with 푅2 > 23.6% which was the highest 푅2 value obtained between T and D in table 3. Fig. 2 shows the scatter plot matrix of the first 3 ranked models. This research only uses 푦 to represent the dependent variable and 푥 for the independent variable in table 4, it is important for researchers to note that these variables can be interchanged but with consequences on the back transformation of the model. To prevent the aforementioned problem of bias (Miller (1984)) from back transformation on the dependent variables, we recommend models 1, 7 and 11 since they do not include the transformation of the dependent variable.
  • 6. Mathematical Theory and Modeling www.iiste.org ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.4, No.9, 2014 5. Conclusion In this research, we have used data transformation statistical tools in preparing mined data from online software repository, a case study of sourceforge.net, for researchers for subsequent analysis in improving the efficiency of repositories. It was established that data from online repositories disobey linearity assumptions and may not be significant as required for regression analysis. Hence, we recommend that researchers should study the mined data from online repositories before utilizing them for analysis. Combination of non-linear transformation tools proved to be effective in establishing linearity between repository variables. It was also established that after transformation, only Download total and total number of visitors that rate software on sourceforge can be linear. This research has provided list, in rank, of the first best 12 linear regression models for the direct use of researchers in future analysis. References Ahmed, E. Hassan (2008), “A road ahead for mining Software repositories”, IEEE. Doi:10.1109/FOSM.2008.4659248, Pp. 48-57. Alexa (2014) www.alexa.com. Retrieved on April 20, 2014. Cleveland, W.S. (1984), “Graphical methods for data presentation. Full Scale breaks dot charts multibased logging”, The American Statistician, Vol.38(4), Pp. 270-280. Howel, D.C. (2007), “Statistical methods for psychology” Belmont, C.A. Thomson Wadsworth. 6th Edition. Jia, Siwei and Rathi, Sarika (2008), “On predicting log-transformed linear models with heteroscedaticity”, SAS Global Forum, Paper 370-2008. Manikandan, S. (2010), “Data transformation”, J. Pharmacol Pharmocother. Jul.-Dec, Vol.1(2), doi:10.4103/0976-500x.72373, Pp126-127. Marija, J. Norusis (2004), “SPSS 12.0 Guide to Data Analysis”, Prentice hall Inc., ISBN 0-13-147886-9. Miller, D. (1984), “Reducing transformation bias in Curve fitting”, The American Statistician, 30(2), 124-126. O’Hara, Robert B. And Hotze, Johan D. (2010), “Do not log-transform Count data”, Methods in Ecology and Evolution, Doi:10.1111/j.2041-210x.2010.00021.x. Osborne, Jason (2002), “Notes on the use of data transformations”, Practical Assessment, Research and Evaluation. Vol.8(6). Tabacknick, B.G. and Fidell, L.S. (2007), “Using Multivariate Statistics”, 5th Edition, Baston, Allyn and Bacon. Turke,y J.W. (1977), “Exploratory data analysis”, Reading M.A, Addison-Wesley. Wikipedia (2014), Sourceforge, www.en.wikipedia.org/wiki/sourceforge, Retrieved on April 20, 2014. www.sourceforge.net, Mining Software (Okikisoft) retrieved on March 13, 2014. 210
  • 7. The IISTE is a pioneer in the Open-Access hosting service and academic event management. The aim of the firm is Accelerating Global Knowledge Sharing. More information about the firm can be found on the homepage: http://www.iiste.org CALL FOR JOURNAL PAPERS There are more than 30 peer-reviewed academic journals hosted under the hosting platform. Prospective authors of journals can find the submission instruction on the following page: http://www.iiste.org/journals/ All the journals articles are available online to the readers all over the world without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. Paper version of the journals is also available upon request of readers and authors. MORE RESOURCES Book publication information: http://www.iiste.org/book/ IISTE Knowledge Sharing Partners EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open Archives Harvester, Bielefeld Academic Search Engine, Elektronische Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial Library , NewJour, Google Scholar