Assignment - 03
Model Building, Selection, & Prediction
Question 1:
1. Predicting the Output Variable Y – Energy Production
Prediction
a) Importing the data from a CSV file and splitting it into test and
training data:
Using the read.csv() function, we can import the data into R.
INPUT:
OUTPUT:
INPUT:
OUTPUT:
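The import-and-split step can be sketched as follows. The original commands were not preserved, so the simulated file, the column names (`energy`, `pressure`, `wind`, `temperature`) and the 70/30 split ratio are all assumptions made for illustration.

```r
# Hypothetical stand-in for the assignment's CSV file: simulate a small
# data set and write it to a temporary file so the example is self-contained.
set.seed(42)
n <- 200
sim <- data.frame(
  pressure    = rnorm(n, mean = 1013, sd = 8),
  wind        = rnorm(n, mean = 12, sd = 4),
  temperature = rnorm(n, mean = 18, sd = 6)
)
sim$energy <- 50 + 0.4 * (sim$pressure - 1013) + 0.8 * sim$wind + rnorm(n, sd = 5)
csv_path <- tempfile(fileext = ".csv")
write.csv(sim, csv_path, row.names = FALSE)

# Import the data with read.csv()
energy_data <- read.csv(csv_path)

# Randomly assign 70% of the rows to training and the rest to test (assumed ratio)
train_idx  <- sample(seq_len(nrow(energy_data)), size = floor(0.7 * nrow(energy_data)))
train_data <- energy_data[train_idx, ]
test_data  <- energy_data[-train_idx, ]

dim(train_data)
dim(test_data)
```

Fixing the seed with set.seed() makes the random split reproducible, which matters when the test-set metrics of several models are compared later.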
b) Fitting a Linear Regression Model:
Running the Linear Regression Model with all the Variables
INPUT:
OUTPUT:
The Adjusted R-Squared value is found to be 0.2366.
From the output, it can be seen that only Pressure and Wind are
significant.
So, we run the model with only the Wind and Pressure variables.
Reduced Regression Model (Wind and Pressure Variables only)
INPUT:
OUTPUT:
We remove the Wind variable, since the Adjusted R-Squared
value is only 0.0229. Now we run the regression using only the
Pressure variable.
Running the Regression Model with only the Pressure Variable:
INPUT:
OUTPUT:
The Adjusted R-Squared value is found to be 0.219, which is
less than in the previous regression models.
An ANOVA test is conducted to compare the model with all
variables included, the reduced model, and the Pressure-only
model.
INPUT:
OUTPUT:
Between the all-variable model and the reduced model, the p-value is
found to be 0.2578, so we fail to reject the null hypothesis
and use the reduced model.
Between the Pressure-only model and the reduced model, the p-value
is found to be 0.0768, so we fail to reject the null
hypothesis and use the Pressure model.
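A minimal sketch of these nested-model comparisons, assuming simulated data and hypothetical column names (the original data and commands are not shown). The null hypothesis in each comparison is that the extra terms of the larger model have zero coefficients.

```r
# Simulated stand-in data; the column names are hypothetical
set.seed(1)
n <- 150
d <- data.frame(pressure = rnorm(n), wind = rnorm(n), temperature = rnorm(n))
d$energy <- 2 + 1.5 * d$pressure + 0.5 * d$wind + rnorm(n)

full_model     <- lm(energy ~ pressure + wind + temperature, data = d)
reduced_model  <- lm(energy ~ pressure + wind, data = d)
pressure_model <- lm(energy ~ pressure, data = d)

# Partial F-tests between nested models (smaller model first)
cmp_full_vs_reduced     <- anova(reduced_model, full_model)
cmp_reduced_vs_pressure <- anova(pressure_model, reduced_model)

cmp_full_vs_reduced
cmp_reduced_vs_pressure

# The p-value of each comparison sits in the second row of the table
p_full     <- cmp_full_vs_reduced[["Pr(>F)"]][2]
p_pressure <- cmp_reduced_vs_pressure[["Pr(>F)"]][2]
```

A large p-value means the smaller model fits about as well as the larger one, so the simpler model is kept.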
Running Best Subset to find the model:
Best subset selection computes the fit statistics for every
combination of the variables involved and prints them for
comparison, from which we can select the appropriate variables.
INPUT:
OUTPUT:
The RSS value decreases as the number of variables increases.
The model with 5 variables has the highest Adjusted R-Squared.
The model with 3 variables has the smallest AIC (or Cp).
The model with 8 variables has the smallest BIC.
Since the best subset approach gives a broad result, we check
the predicted R-squared and use the model with the highest
R-squared and the lowest RMSE.
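The exhaustive search behind best subset selection can be sketched in base R. This is not the function used in the assignment, which is not shown; the data and the variable names `x1` to `x4` and `y` are simulated assumptions.

```r
# Simulated data: y depends on x1 and x2 only (hypothetical names)
set.seed(7)
n <- 120
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 1 * d$x2 + rnorm(n)

predictors <- setdiff(names(d), "y")

# Fit every non-empty subset of predictors and record its statistics
results <- do.call(rbind, lapply(seq_along(predictors), function(k) {
  do.call(rbind, combn(predictors, k, simplify = FALSE, FUN = function(vars) {
    fit <- lm(reformulate(vars, response = "y"), data = d)
    data.frame(
      size   = k,
      terms  = paste(vars, collapse = " + "),
      rss    = sum(residuals(fit)^2),
      adj_r2 = summary(fit)$adj.r.squared,
      aic    = AIC(fit),
      bic    = BIC(fit)
    )
  }))
}))

# Best model of each size: smallest RSS among models with that many predictors
best_by_size <- do.call(rbind, lapply(split(results, results$size),
                                      function(g) g[which.min(g$rss), ]))
best_by_size
```

RSS can only fall as predictors are added, which is why it cannot be used alone to pick a model size; Adjusted R-Squared, AIC and BIC penalize size and may each point to a different model.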
R square and RMSE Prediction:
For all variable considered Model:
INPUT:
OUTPUT:
For the Reduced Model with Pressure and Wind Variables:
INPUT:
OUTPUT:
Single Model with Pressure as the only predictor variable:
INPUT:
OUTPUT:
Summary:
From the analysis, we can conclude that the model with Pressure
as the only predictor is better than the other models. Its
Adjusted R-Squared value of 0.31 is the best, and its RMSE value
is also the lowest.
From the Adjusted R-Squared value, we conclude that the
Pressure model is the best; it explains about 31% of the
variance in energy production.
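The test-set R-squared and RMSE used in this comparison can be computed as in the sketch below. The data are simulated and the column names are hypothetical; only the two formulas are the point.

```r
# Simulated stand-in data with a hypothetical pressure-driven response
set.seed(3)
n <- 200
d <- data.frame(pressure = rnorm(n), wind = rnorm(n))
d$energy <- 2 + 1.5 * d$pressure + rnorm(n)

train_idx <- sample(seq_len(n), size = 140)
train <- d[train_idx, ]
test  <- d[-train_idx, ]

# Fit on the training rows, predict on the held-out rows
fit  <- lm(energy ~ pressure, data = train)
pred <- predict(fit, newdata = test)

# RMSE: typical size of a prediction error, in the units of the response
rmse <- sqrt(mean((test$energy - pred)^2))

# Predicted R-squared on the test set: 1 - SSE / SST
sse <- sum((test$energy - pred)^2)
sst <- sum((test$energy - mean(test$energy))^2)
r2_test <- 1 - sse / sst

c(rmse = rmse, r2_test = r2_test)
```

Both metrics are computed on rows the model never saw, so they measure predictive performance rather than goodness of fit on the training data.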
c) Backward Selection Approach:
Regression Model using all the variables:
INPUT:
OUTPUT:
Conclusion:
The backward stepAIC function gives a slightly different result
than the models generated above. However, when we create the
regression model, we see a lower R2 value than for our single-variable model.
Below, we can compare the 3 models above with this step
model.
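Backward selection by AIC can be sketched with base R's step() function, which follows the same idea as stepAIC from the MASS package. The data and column names below are simulated assumptions, not the assignment's data.

```r
# Simulated stand-in data; only pressure and wind truly affect energy
set.seed(11)
n <- 150
d <- data.frame(pressure = rnorm(n), wind = rnorm(n),
                temperature = rnorm(n), humidity = rnorm(n))
d$energy <- 2 + 1.5 * d$pressure + 0.6 * d$wind + rnorm(n)

full_model <- lm(energy ~ ., data = d)

# Backward elimination: starting from the full model, drop the term whose
# removal lowers AIC the most, and stop when no removal lowers AIC.
step_model <- step(full_model, direction = "backward", trace = 0)

formula(step_model)
summary(step_model)$adj.r.squared
AIC(full_model)
AIC(step_model)
```

Because the search is greedy, the selected model can differ from the one found by best subset selection or by manual reduction, which is consistent with the slightly different result noted above.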
Final Project
ALY-6015 Week 6 Project
Intermediate Analytics
Submitted to: Ani Aghababyan
College of Professional Studies
Northeastern University, MA
Submitted by:
Vikrant Kakad
Vikas Warudkar
Sunita Mohapatra
Darshan Shah
Akshay Kannan
Academic Term Spring 2018 - Quarter 2
Introduction
Winemaking is affected by a series of variables. Several of
them, from alcohol content to pH, can affect the final
result. It is crucial to understand how these variables
impact the quality of red wine. The scope of this project is
to understand the effect of the various attributes that impact the
quality of red wine. The data set used for the analysis was
downloaded from the UCI repository. The analysis has an additional
focus on the following key parameters:
pH value - pH value is considered to be a key parameter for the
determination of quality of wine and hence the analysis focused
on determining the impact of these pH values on final quality
determination.
SO2 values (Free and Total) - SO2 has always been a debatable
topic due to the allergic reactions associated with it. The
current analysis tries to determine the impact of SO2 on pH
values and on the final quality values for the wine samples.
Alcohol content - Alcohol content is an important parameter
considered when a buyer purchases any alcoholic product, and
this analysis tries to unravel the relationship of alcohol content
with parameters like pH values and SO2 contents, and its impact
on quality.
In this project, we analyzed the Red Wine data to
understand which variables are responsible for the quality of the
wine. First, we got a feel for the variables on their own;
then we examined the correlation between them and wine
quality, with other factors thrown in. Finally, we created a linear
model to predict the outcome for a test data set.
We propose a supervised learning approach to predict human wine
taste preferences, based on easily available analytical
tests at the certification step. A large dataset (when compared to
other studies in this domain) is considered, with red Vinho
Verde samples from Portugal (CVRVV, 2008). Two regression
techniques were applied, under a computationally efficient
procedure that performs simultaneous variable and model
selection. The support vector machine achieved promising
results, outperforming the multiple regression and neural
network methods. Such a model is useful to support the
oenologist's wine tasting evaluations and improve wine
production. Furthermore, similar techniques can help in target
marketing by modeling consumer tastes from niche markets.
Research Question
By performing this analysis, we seek to answer the following
questions:
1. How is the quality of the wines tasted?
2. What is the minimum set of properties and their values that
defines a high-quality wine?
3. What are considered wine defects?
About dataset
· Name: Red Wine Quality Data Set
· Source: Created by Paulo Cortez (Univ. Minho), Antonio
Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis
(CVRVV, 2009)
· Input variables:
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
· Output variable: quality (score between 0 and 10)
· Data Set Characteristics: Multivariate
· Number of Observations: 1599
· Number of Attributes: 12
· Missing Values: N/A
Description of attributes:
1. Fixed acidity: Most acids involved with wine are fixed or
nonvolatile (do not evaporate readily)
2. Volatile acidity: The amount of acetic acid in wine, which at
too high of levels can lead to an unpleasant, vinegar taste
3. Citric acid: Found in small quantities, citric acid can add
'freshness' and flavor to wines
4. Residual sugar: The amount of sugar remaining after
fermentation stops, it's rare to find wines with less than 1
gram/liter and wines with greater than 45 grams/liter are
considered sweet
5. Chlorides: The amount of salt in the wine
6. Free sulfur dioxide: The free form of SO2 exists in
equilibrium between molecular SO2 (as a dissolved gas) and
bisulfite ion; it prevents microbial growth and the oxidation of
wine
7. Total sulfur dioxide: Amount of free and bound forms of SO2;
in low concentrations, SO2 is mostly undetectable in wine, but
at free SO2 concentrations over 50 ppm, SO2 becomes evident
in the nose and taste of wine
8. Density: The density of wine is close to that of water,
depending on the percent alcohol and sugar content
9. pH: Describes how acidic or basic a wine is on a scale from 0
(very acidic) to 14 (very basic); most wines are between 3-4 on
the pH scale
10. Sulphates: A wine additive which can contribute to sulfur
dioxide gas (SO2) levels, which acts as an antimicrobial and
antioxidant
11. Alcohol: the percent alcohol content of the wine
12. Quality: output variable (based on sensory data, score
between 0 and 10)
The chosen dataset has the attributes listed above, and it
delivers good results in detecting quality after testing. The
data types of these attributes are as follows.
As described before, there are 1599 observations (rows) of 12
different variables (columns). Quality is an ordered,
categorical, discrete variable whose values range from 3 to 8.
A statistical description of the dataset gives a
more coherent picture of how the numerical values are
distributed (range, quartiles, central
tendencies, etc.). It is as follows:
The overall summary of the dataset covers all of the above
information and presents the data concisely.
It is shown as follows:
Methods chosen:
Univariate Plot Analysis:
A univariate plot shows the data and summarizes its
distribution. A dot plot, also known as a strip plot, shows the
individual observations. A box plot shows the five-
number summary of the data – the minimum, first quartile,
median, third quartile, and maximum.
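The five-number summary behind a box plot can be computed directly, as in this short sketch. The sample values are hypothetical and chosen only for illustration; note that fivenum() returns Tukey's hinges, which approximate the first and third quartiles.

```r
# Hypothetical sample values, for illustration only
x <- c(3.2, 3.4, 3.1, 3.3, 3.5, 3.0, 3.6, 3.3, 4.2, 3.2)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
names(five) <- c("min", "q1", "median", "q3", "max")
five

# Points beyond 1.5 times the hinge spread are the ones a box plot draws as outliers
outliers <- boxplot.stats(x)$out
outliers
```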
The graph analysis is as follows:
Here, it can be observed that density, pH and wine
quality appear to be normally distributed. Fixed and volatile
acidity, the sulfur dioxides, sulphates and alcohol seem to be
long-tailed. Qualitatively, residual sugar and chlorides have
extreme outliers. Citric acid appears to have a large number of
zero values, which might be a case of non-reporting.
Exploratory Data Analysis (EDA) and Data Pre-processing
Histograms show the distribution of the variable
values. As we could clearly see, citric acid was one feature that
was found not to be normally distributed on a logarithmic scale.
Now, a combined variable named “TAC.acidity” is created
as the sum of the tartaric (fixed), acetic (volatile) and citric acids. It is as
follows:
Boxplots for each of the variables as another indicator of
spread.
Observations regarding variables: All variables have
outliers.
· The acidity variables (citric acid, volatile acidity and fixed acidity)
have critical outliers. If these outliers are removed,
the distributions of these attributes can become symmetric.
· Residual sugar has a positively skewed distribution;
interestingly, the skewness remains even if we ignore the
outliers.
· Density and free sulfur dioxide
have significant outliers, but they are very different
from the rest.
· Most of the outliers are on the larger side of the data.
· Alcohol content has an irregular distribution
without any major outliers.
Support vector machines (SVMs) are a class of statistical models first
developed in the mid-1960s by Vladimir Vapnik. In later years, the
model evolved considerably into one of the
most flexible and powerful machine learning tools
available. It is a supervised learning algorithm that can be
used for both classification and regression problems,
although the discussion here focuses on classification only. Put
simply, the algorithm searches for a linearly separable
hyperplane, or a decision boundary separating members of one class
from the other. If such a hyperplane exists, the work is done.
If it does not, the SVM uses a nonlinear
mapping to transform the training data into a higher
dimension, and then searches for the linear optimal
separating hyperplane there. With an appropriate nonlinear mapping to a
sufficiently high dimension, data from two classes can
always be separated by a hyperplane. The SVM algorithm
finds this hyperplane using support vectors and margins.
As a training algorithm, the SVM may not be fast compared
with some other classification methods, but owing to
its ability to model complex nonlinear boundaries, it has high
accuracy. It is also comparatively less prone to overfitting. SVMs
have been successfully applied to handwritten digit
recognition, text categorization, speaker
identification and so on. Using this technique helped us
obtain predictions closer to the observed values through
regression.
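The search for a separating hyperplane described above can be written as the standard hard-margin optimization problem. The notation is not from the original report and is assumed here: x_i are the training inputs, y_i in {-1, +1} are the class labels, and w and b define the hyperplane w^T x + b = 0.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

The margin between the two classes is 2 / ||w||, so minimizing ||w|| maximizes the margin. The support vectors are the training points for which the constraint holds with equality. When the data are not linearly separable, each x_i is replaced by a nonlinear mapping phi(x_i) into a higher-dimensional space.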
Results and Findings
The correlation of each variable with wine
quality was computed to determine which factors have comparatively
more influence on the quality of wine. The top
4 variables that influence wine quality were found to be:
1) alcohol
2) sulphates (log10)
3) volatile acidity
4) citric acid
The following was done to examine the acidity variables.
Of all the factors, the base-10 logarithm of TAC.acidity
correlated very well with pH, and rightfully so, since pH is a
defining measure of acidity.
An interesting question to pose, using basic chemistry
knowledge, is which components other than the
measured acids affect pH.
We can quantify this difference by building a linear
model that predicts pH from TAC.acidity and capturing the
% difference as a new variable.
Conclusion
By examining the above information, we found that the
supervised learning technique called the support vector machine
predicted the red wine quality and showed that
higher wine quality is directly proportional to
alcohol content. Although the other techniques performed about as
well, this technique helped us find an
approximate result, and we could predict the quality from the
alcohol content. This analysis can be used to
understand whether, by adjusting the variables during winemaking, it is possible
to increase the quality of the wine on the market. If you
can control your variables, then you can predict the quality
of your wine and earn more profit.
We compared the linear model and the support vector machine.
The SVM performed marginally better, and we chose to stay
with it in case we needed to make any more
predictions.
References
CVRVV. 2008. Portuguese Wine — Vinho Verde. Comissão de
Viticultura da Região dos Vinhos Verdes (CVRVV),
http://www.vinhoverde.pt.
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 2009.
Modeling wine preferences by data mining from
physicochemical properties. In Decision Support Systems,
Elsevier, 47(4):547-553.
V. Cherkassky, Y. Ma. 2004. Practical selection of SVM
parameters and noise estimation for SVM regression. Neural
Networks, 17(1):113-126.
Red Wine Quality. 2018. Kaggle,
https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009/data
Appendix A: (R-Script)
Red Wine quality assessment
========================================================
```{r echo=FALSE, message=FALSE, warning=FALSE}
# install.packages("MASS")
# install.packages("gridExtra")
# install.packages("grid")
# install.packages("ggplot2")
# install.packages("lattice")
# install.packages("dplyr")
# install.packages("memisc")
# install.packages("GGally")
# install.packages("reshape2")
# install.packages("kernlab")
# install.packages("plyr")
# install.packages("plotly")
# install.packages("e1071")
require(MASS)
require(gridExtra)
library(grid)
library(ggplot2)
library(lattice)
require(dplyr)
require(memisc)
require(GGally)
require(reshape2)
require(kernlab)
# Optional: the first argument of update.packages() is lib.loc, not a package name
# update.packages(oldPkgs = "ggplot2")
library(plyr)
library(plotly)
library(skimr)
```
Read csv file and explore statistics
```{r echo=FALSE,message=FALSE,warning=FALSE}
Wine <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep = ";")
str(Wine)
summary(Wine)
skim(Wine)
Wine$quality <- as.numeric(Wine$quality)
```
Creates tabular results of categorical variables
```{r,message=FALSE,warning=FALSE}
table(Wine$quality)
```
# Univariate Plots Section
```{r echo=FALSE,message=FALSE,warning=FALSE}
grid.arrange(qplot(Wine$fixed.acidity),
qplot(Wine$volatile.acidity),
qplot(Wine$citric.acid),
qplot(Wine$residual.sugar),
qplot(Wine$chlorides),
qplot(Wine$free.sulfur.dioxide),
qplot(Wine$total.sulfur.dioxide),
qplot(Wine$density),
qplot(Wine$pH),
qplot(Wine$sulphates),
qplot(Wine$alcohol),
qplot(Wine$quality),
ncol = 4)
```
# Univariate Analysis
1. Wine Quality forms a normal distribution.
2. Density and pH are normally distributed with a few outliers.
Create new variable for better exploration
```{r,message=FALSE,warning=FALSE}
Wine$rating <- ifelse(Wine$quality < 5, 'bad', ifelse(
Wine$quality < 7, 'average', 'good'))
Wine$rating <- ordered(Wine$rating,
levels = c('bad', 'average', 'good'))
summary(Wine$rating)
```
Create Histogram of log function of the variables for further
analysis
```{r,message=FALSE,warning=FALSE}
# Histogram of fixed acidity on a log10 x-axis
ggplot(Wine, aes(x = fixed.acidity)) +
  geom_histogram(fill = 'red') +
  scale_x_log10(breaks = 4:15) +
  xlab('Fixed Acidity') + ylab('Count') +
  ggtitle('Histogram of Fixed Acidity Values')
# Interactive histogram of citric acid (plotly is loaded in the setup chunk)
plot_ly(data = Wine, x = ~citric.acid, type = 'histogram')
# Histograms of volatile acidity and citric acid on a log10 x-axis
ggplot(Wine) +
  geom_histogram(aes(x = volatile.acidity), fill = 'blue') +
  scale_x_log10(breaks = seq(0.1, 1, 0.1))
ggplot(Wine) +
  geom_histogram(aes(x = citric.acid), fill = 'green') +
  scale_x_log10()
```
Citric acid was one feature that was found to be not
normally distributed on a logarithmic scale.
Create a combined variable,
TAC.acidity, containing the sum of tartaric, acetic, and citric
acid.
```{r,message=FALSE,warning=FALSE}
Wine$TAC.acidity <- Wine$fixed.acidity +
Wine$volatile.acidity +
Wine$citric.acid
qplot(Wine$TAC.acidity, main = 'Histogram of TAC Acidity (fixed+volatile+Citric)')
```
## Boxplots are better suited for visualizing the outliers.
```{r,message=FALSE,warning=FALSE}
get_simple_boxplot <- function(column, ylab) {
return(qplot(data = Wine, x = 'simple',
y = column, geom = 'boxplot',
xlab = '',
ylab = ylab))
}
grid.arrange(get_simple_boxplot(Wine$fixed.acidity, 'fixed acidity'),
             get_simple_boxplot(Wine$volatile.acidity, 'volatile acidity'),
             get_simple_boxplot(Wine$citric.acid, 'citric acid'),
             get_simple_boxplot(Wine$TAC.acidity, 'TAC acidity'),
             get_simple_boxplot(Wine$residual.sugar, 'residual sugar'),
             get_simple_boxplot(Wine$chlorides, 'chlorides'),
             get_simple_boxplot(Wine$free.sulfur.dioxide, 'free sulf. dioxide'),
             get_simple_boxplot(Wine$total.sulfur.dioxide, 'total sulf. dioxide'),
             get_simple_boxplot(Wine$density, 'density'),
             get_simple_boxplot(Wine$pH, 'pH'),
             get_simple_boxplot(Wine$sulphates, 'sulphates'),
             get_simple_boxplot(Wine$alcohol, 'alcohol'),
             ncol = 4)
plot_ly(Wine,y=~alcohol,type='box')
```
# Bivariate Plots Section
```{r echo=FALSE,message=FALSE,warning=FALSE}
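get_bivariate_boxplot <- function(x, y, ylab) {
  # factor(x) gives one box per quality score; a numeric x is treated as continuous
  return(qplot(data = Wine, x = factor(x), y = y, geom = 'boxplot',
               xlab = 'quality', ylab = ylab))
}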
grid.arrange(get_bivariate_boxplot(Wine$quality,
Wine$fixed.acidity,
'fixed acidity'),
get_bivariate_boxplot(Wine$quality,
Wine$volatile.acidity,
'volatile acidity'),
get_bivariate_boxplot(Wine$quality, Wine$citric.acid,
'citric acid'),
get_bivariate_boxplot(Wine$quality,
Wine$TAC.acidity,
'TAC acidity'),
get_bivariate_boxplot(Wine$quality,
log10(Wine$residual.sugar),
'residual sugar'),
get_bivariate_boxplot(Wine$quality,
log10(Wine$chlorides),
'chlorides'),
get_bivariate_boxplot(Wine$quality,
Wine$free.sulfur.dioxide,
'free sulf. dioxide'),
get_bivariate_boxplot(Wine$quality,
Wine$total.sulfur.dioxide,
'total sulf. dioxide'),
get_bivariate_boxplot(Wine$quality, Wine$density,
'density'),
get_bivariate_boxplot(Wine$quality, Wine$pH,
'pH'),
get_bivariate_boxplot(Wine$quality,
log10(Wine$sulphates),
'sulphates'),
get_bivariate_boxplot(Wine$quality, Wine$alcohol,
'alcohol'),
ncol = 4)
```
Correlation for each of these
variables against quality:
```{r,message=FALSE,warning=FALSE}
simple_cor_test <- function(x, y) {
return(cor.test(x, as.numeric(y))$estimate)
}
correlations <- c(
simple_cor_test(Wine$fixed.acidity, Wine$quality),
simple_cor_test(Wine$volatile.acidity, Wine$quality),
simple_cor_test(Wine$citric.acid, Wine$quality),
simple_cor_test(Wine$TAC.acidity, Wine$quality),
simple_cor_test(log10(Wine$residual.sugar), Wine$quality),
simple_cor_test(log10(Wine$chlorides), Wine$quality),
simple_cor_test(Wine$free.sulfur.dioxide, Wine$quality),
simple_cor_test(Wine$total.sulfur.dioxide, Wine$quality),
simple_cor_test(Wine$density, Wine$quality),
simple_cor_test(Wine$pH, Wine$quality),
simple_cor_test(log10(Wine$sulphates), Wine$quality),
simple_cor_test(Wine$alcohol, Wine$quality))
correlations
names(correlations) <- c('fixed.acidity', 'volatile.acidity',
'citric.acid',
'TAC.acidity', 'log10.residual.sugar',
'log10.chlorides', 'free.sulfur.dioxide',
'total.sulfur.dioxide', 'density', 'pH',
'log10.sulphates', 'alcohol')
correlations
```
Top 4:
alcohol
sulphates (log10)
volatile acidity
citric acid
Examining the acidity variables:
```{r,message=FALSE,warning=FALSE}
ggplot(data = Wine, aes(x = fixed.acidity, y = citric.acid)) +
geom_point(alpha=0.3)
cor.test(Wine$fixed.acidity, Wine$citric.acid)
ggplot(data = Wine, aes(x = volatile.acidity, y = citric.acid)) +
geom_point(alpha=0.3)
cor.test(Wine$volatile.acidity, Wine$citric.acid)
ggplot(data = Wine, aes(x = log10(TAC.acidity), y = pH)) +
geom_point(alpha=0.3)
cor.test(log10(Wine$TAC.acidity), Wine$pH)
```
Base 10 logarithm TAC.acidity correlated very well with pH.
Building a predictive linear model,
to predict pH based off of TAC.acidity and
capture the % difference as a new variable.
```{r,message=FALSE,warning=FALSE}
m <- lm(I(pH) ~ I(log10(TAC.acidity)), data = Wine)
Wine$pH.predictions <- predict(m, Wine)
# (predicted - observed) / observed
Wine$pH.error <- (Wine$pH.predictions - Wine$pH)/Wine$pH
```
To check its accuracy, we compute the root-mean-square error (RMSE).
```{r,message=FALSE,warning=FALSE}
rmse <- function(error)
{
sqrt(mean(error^2))
}
rmse(m$residuals)
#Now, we train a Support Vector Machine.
require(e1071)
SVM <- svm(I(pH) ~ I(log10(TAC.acidity)), data = Wine)
Wine$pH.Predict.SVM <- predict(SVM,Wine)
Wine$pH.error.SVM <- (Wine$pH.Predict.SVM -
Wine$pH)/Wine$pH
rmse(SVM$residuals)
```
The SVM performs slightly better than the linear model.
### Plot 1: Effect of Alcohol on Wine Quality
```{r echo=FALSE,message=FALSE,warning=FALSE}
ggplot(data = Wine, aes(x = factor(quality), y = alcohol,
                        fill = rating)) +
geom_boxplot(outlier.color = 'red') +
ggtitle('Alcohol Levels in Different Wine Qualities') +
xlab('Quality') +
ylab('Alcohol (% volume)')
```
### Description 1
These boxplots demonstrate the effect of alcohol content on
wine quality.
Generally, higher alcohol content correlated with higher wine
quality.
However, as the outliers and intervals show, alcohol content
alone did not produce a higher quality.
 
LAW and Economics 4 questionsLaw And EconomicsTextsCoote.docx
LAW and Economics 4 questionsLaw And EconomicsTextsCoote.docxLAW and Economics 4 questionsLaw And EconomicsTextsCoote.docx
LAW and Economics 4 questionsLaw And EconomicsTextsCoote.docxfestockton
 

More from festockton (20)

Learning ResourcesRequired ReadingsToseland, R. W., & Ri.docx
Learning ResourcesRequired ReadingsToseland, R. W., & Ri.docxLearning ResourcesRequired ReadingsToseland, R. W., & Ri.docx
Learning ResourcesRequired ReadingsToseland, R. W., & Ri.docx
 
LeamosEscribamos Completa el párrafo con las formas correctas de lo.docx
LeamosEscribamos Completa el párrafo con las formas correctas de lo.docxLeamosEscribamos Completa el párrafo con las formas correctas de lo.docx
LeamosEscribamos Completa el párrafo con las formas correctas de lo.docx
 
Leadership via vision is necessary for success. Discuss in detail .docx
Leadership via vision is necessary for success. Discuss in detail .docxLeadership via vision is necessary for success. Discuss in detail .docx
Leadership via vision is necessary for success. Discuss in detail .docx
 
Learning about Language by Observing and ListeningThe real.docx
Learning about Language by Observing and ListeningThe real.docxLearning about Language by Observing and ListeningThe real.docx
Learning about Language by Observing and ListeningThe real.docx
 
Learning Accomplishment Profile-Diagnostic Spanish Language Edit.docx
Learning Accomplishment Profile-Diagnostic Spanish Language Edit.docxLearning Accomplishment Profile-Diagnostic Spanish Language Edit.docx
Learning Accomplishment Profile-Diagnostic Spanish Language Edit.docx
 
Learning about Language by Observing and ListeningThe real voy.docx
Learning about Language by Observing and ListeningThe real voy.docxLearning about Language by Observing and ListeningThe real voy.docx
Learning about Language by Observing and ListeningThe real voy.docx
 
LEARNING OUTCOMES1. Have knowledge and understanding of the pri.docx
LEARNING OUTCOMES1. Have knowledge and understanding of the pri.docxLEARNING OUTCOMES1. Have knowledge and understanding of the pri.docx
LEARNING OUTCOMES1. Have knowledge and understanding of the pri.docx
 
Leadership Style What do people do when they are leadingAssignme.docx
Leadership Style What do people do when they are leadingAssignme.docxLeadership Style What do people do when they are leadingAssignme.docx
Leadership Style What do people do when they are leadingAssignme.docx
 
Leadership Throughout HistoryHistory is filled with tales of leade.docx
Leadership Throughout HistoryHistory is filled with tales of leade.docxLeadership Throughout HistoryHistory is filled with tales of leade.docx
Leadership Throughout HistoryHistory is filled with tales of leade.docx
 
Lean Inventory Management1. Why do you think lean inventory manage.docx
Lean Inventory Management1. Why do you think lean inventory manage.docxLean Inventory Management1. Why do you think lean inventory manage.docx
Lean Inventory Management1. Why do you think lean inventory manage.docx
 
Leadership varies widely by culture and personality. An internationa.docx
Leadership varies widely by culture and personality. An internationa.docxLeadership varies widely by culture and personality. An internationa.docx
Leadership varies widely by culture and personality. An internationa.docx
 
Leadership is the ability to influence people toward the attainment .docx
Leadership is the ability to influence people toward the attainment .docxLeadership is the ability to influence people toward the attainment .docx
Leadership is the ability to influence people toward the attainment .docx
 
Lawday. Court of Brightwaltham holden on Monday next after Ascension.docx
Lawday. Court of Brightwaltham holden on Monday next after Ascension.docxLawday. Court of Brightwaltham holden on Monday next after Ascension.docx
Lawday. Court of Brightwaltham holden on Monday next after Ascension.docx
 
law43665_fm_i-xx i 010719 1032 AMStakeholders, Eth.docx
law43665_fm_i-xx i 010719  1032 AMStakeholders, Eth.docxlaw43665_fm_i-xx i 010719  1032 AMStakeholders, Eth.docx
law43665_fm_i-xx i 010719 1032 AMStakeholders, Eth.docx
 
Leaders face many hurdles when leading in multiple countries. There .docx
Leaders face many hurdles when leading in multiple countries. There .docxLeaders face many hurdles when leading in multiple countries. There .docx
Leaders face many hurdles when leading in multiple countries. There .docx
 
Last year Angelina Jolie had a double mastectomy because of re.docx
Last year Angelina Jolie had a double mastectomy because of re.docxLast year Angelina Jolie had a double mastectomy because of re.docx
Last year Angelina Jolie had a double mastectomy because of re.docx
 
Leaders face many hurdles when leading in multiple countries. Ther.docx
Leaders face many hurdles when leading in multiple countries. Ther.docxLeaders face many hurdles when leading in multiple countries. Ther.docx
Leaders face many hurdles when leading in multiple countries. Ther.docx
 
Leaders today must be able to create a compelling vision for the org.docx
Leaders today must be able to create a compelling vision for the org.docxLeaders today must be able to create a compelling vision for the org.docx
Leaders today must be able to create a compelling vision for the org.docx
 
Law enforcement professionals and investigators use digital fore.docx
Law enforcement professionals and investigators use digital fore.docxLaw enforcement professionals and investigators use digital fore.docx
Law enforcement professionals and investigators use digital fore.docx
 
LAW and Economics 4 questionsLaw And EconomicsTextsCoote.docx
LAW and Economics 4 questionsLaw And EconomicsTextsCoote.docxLAW and Economics 4 questionsLaw And EconomicsTextsCoote.docx
LAW and Economics 4 questionsLaw And EconomicsTextsCoote.docx
 

Recently uploaded

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 

Recently uploaded (20)

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 

Assignment - 03: Model Building, Selection, & Prediction.docx

  • 1. Assignment - 03: Model Building, Selection, & Prediction
Question 1:
1. Predicting the Output Variable Y (Energy Production)
a) Importing the data from a CSV file and splitting it into training and test sets:
The read.csv() function is used to import the data into R.
INPUT:
  • 2. OUTPUT:
INPUT:
OUTPUT:
b) Fitting a Linear Regression Model:
Running the linear regression model with all the variables.
INPUT:
OUTPUT:
The Adjusted R-Squared value is found to be 0.2366.
The output shows that only Pressure and Wind are significant, so we rerun the model with only the Wind and Pressure variables.
Reduced Regression Model (Wind and Pressure variables only)
INPUT:
  • 3. OUTPUT:
We remove the Wind variable, since the Adjusted R-Squared value is only 0.0229, and run the regression using only the Pressure variable.
Running the regression model with only the Pressure variable:
INPUT:
OUTPUT:
The Adjusted R-Squared value is found to be 0.219, which is lower than that of the previous regression models.
An ANOVA test is conducted to compare the model that includes all variables with the reduced models.
INPUT:
OUTPUT:
Between the all-variable model and the reduced model, the p-value is 0.2578, so we do not reject the null hypothesis and we use the reduced model.
Between the Pressure-only model and the reduced model, the p-value is 0.0768, so we do not reject the null hypothesis and we use the Pressure model.
  • 4. Running Best Subset to find the model:
Best subset selection computes fit statistics for models of every size and prints them for comparison, from which we can select the appropriate variables.
INPUT:
OUTPUT:
The RSS decreases as the number of variables increases.
The model with 5 variables has the highest Adjusted R-Squared.
The model with 3 variables has the smallest AIC (or Cp).
The model with 8 variables has the smallest BIC.
Since the best subset approach gives a broad result, we check the predicted R-Squared and choose the model with the highest R-Squared and the lowest RMSE.
R-Squared and RMSE Prediction:
For the model with all variables:
INPUT:
OUTPUT:
For the reduced model with the Pressure and Wind variables:
INPUT:
  • 5. OUTPUT:
Single model with Pressure as the only predictor:
INPUT:
OUTPUT:
Summary:
From the analysis we conclude that the model with Pressure as the only predictor is better than the other models. Its Adjusted R-Squared value of 0.31 is the highest, and its RMSE is the lowest. An Adjusted R-Squared of 0.31 means the Pressure model explains about 31% of the variation in energy production.
c) Backward Selection Approach:
Regression model using all the variables:
INPUT:
OUTPUT:
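The code for parts (b) and (c) was embedded as images and is not preserved in this transcript. As a rough sketch only, the same workflow (train/test split, F-tests between nested models, held-out RMSE, backward selection by AIC) might look like the following. The data are simulated, the column names (energy, pressure, wind, temp) are hypothetical stand-ins rather than the assignment's dataset, and base R's step() is used here, which may differ from the function the authors called.

```r
set.seed(42)

# Simulated stand-in for the assignment data (hypothetical columns).
n <- 200
dat <- data.frame(
  pressure = rnorm(n),
  wind     = rnorm(n),
  temp     = rnorm(n)   # pure noise predictor
)
dat$energy <- 2 + 1.5 * dat$pressure + 0.3 * dat$wind + rnorm(n)

# 70/30 train/test split.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Full, reduced, and single-predictor models, all fitted on the training set.
fit_full     <- lm(energy ~ pressure + wind + temp, data = train)
fit_reduced  <- lm(energy ~ pressure + wind, data = train)
fit_pressure <- lm(energy ~ pressure, data = train)

# Partial F-tests for nested models: a large p-value means the extra
# terms do not significantly improve the fit.
print(anova(fit_reduced, fit_full))
print(anova(fit_pressure, fit_reduced))

# Held-out RMSE for each model.
rmse <- function(fit, newdata) {
  sqrt(mean((newdata$energy - predict(fit, newdata))^2))
}
rmse_values <- sapply(
  list(full = fit_full, reduced = fit_reduced, pressure = fit_pressure),
  rmse, newdata = test
)
print(rmse_values)

# Backward selection by AIC, starting from the full model.
fit_step <- step(fit_full, direction = "backward", trace = 0)
print(formula(fit_step))
```

Comparing models on held-out RMSE rather than on training-set R-Squared guards against preferring a model only because it has more terms.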
  • 6. Conclusion:
The backward step AIC function gives a slightly different result than the models generated above. However, when we fit that regression model, we see a lower R-Squared value than for our single-predictor model. Below, we compare the three models above with this step model.

Final Project
ALY-6015 Week 6 Project
Intermediate Analytics
Submitted to: Ani Aghababyan
College of Professional Studies, Northeastern University, MA
Submitted by: Vikrant Kakad, Vikas Warudkar, Sunita Mohapatra, Darshan Shah, Akshay Kannan
Academic Term: Spring 2018 - Quarter 2

Introduction
Wine making is affected by a series of variables. Several of them, from alcohol to pH, can affect the final results.
  • 7. It is crucial to understand how these variables impact the quality of red wine. The scope of this project is to understand the effect of the various attributes that influence the quality of red wine. The data set used for the analysis was downloaded from the UCI repository. The analysis has an additional focus on the following key parameters:
pH value - pH is considered a key parameter in determining wine quality, so the analysis focuses on the impact of pH on the final quality score.
SO2 values (free and total) - SO2 has always been a debated topic because of the allergic reactions associated with it. The analysis tries to determine the impact of SO2 on pH and on the final quality scores of the wine samples.
Alcohol content - Alcohol content is an important parameter when a buyer purchases any alcoholic product. The analysis tries to unravel the relationship of alcohol content with pH and SO2 content, and its impact on quality.
In this project, we analyzed the red wine data to understand which variables are responsible for the quality of the wine. First, we got a feel for the variables on their own; then we examined the correlation between them and wine quality, with other factors included. Finally, we created a linear model to predict the outcome for a test set.
We propose a supervised learning approach to predict human wine taste preferences, based on easily available analytical tests at the certification step. A large dataset (when compared to other studies in this domain) is considered, with red Vinho Verde samples from Portugal (CVRVV, 2008). Two regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful to support the oenologist's wine tasting evaluations and improve wine production. Furthermore, similar techniques can help in target marketing by modeling consumer tastes from niche markets.
  • 8. Research Question
By performing this analysis, we seek to answer the following questions:
1. How is the quality of the wines tasted?
2. What is the minimum set of properties and their values that defines a high-quality wine?
3. What are considered wine defects?
About dataset
· Name: Red Wine Quality Data Set
· Sources: Created by Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV, 2009)
· Input variables:
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
· Output variable: quality (score between 0 and 10)
· Data Set Characteristics: Multivariate
· Number of Observations: 1599
· Number of Attributes: 12
· Missing Values: N/A
  • 9. Description of attributes:
1. Fixed acidity: Most acids involved with wine are fixed or nonvolatile (they do not evaporate readily).
2. Volatile acidity: The amount of acetic acid in wine, which at too high a level can lead to an unpleasant, vinegar taste.
3. Citric acid: Found in small quantities, citric acid can add 'freshness' and flavor to wines.
4. Residual sugar: The amount of sugar remaining after fermentation stops. It is rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet.
5. Chlorides: The amount of salt in the wine.
6. Free sulfur dioxide: The free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine.
7. Total sulfur dioxide: The amount of free and bound forms of SO2. In low concentrations SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm it becomes evident in the nose and taste of wine.
8. Density: The density of wine is close to that of water, depending on the percent alcohol and sugar content.
9. pH: Describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale.
10. Sulphates: A wine additive which can contribute to sulfur dioxide gas (SO2) levels, and which acts as an antimicrobial and antioxidant.
11. Alcohol: The percent alcohol content of the wine.
12. Quality: Output variable (based on sensory data, score between 0 and 10).
The chosen dataset has the attributes listed above, and it gives a better result in detecting quality after testing. The data types of these attributes are as follows:
  • 10. As described before, there are 1599 observations (rows) of 12 variables (columns). Quality is an ordered, categorical, discrete variable whose values range from 3 to 8.
A statistical description of the dataset gives a more coherent picture of how the numerical values are distributed (range, quartiles, central tendencies, etc.). They are as follows:
The overall summary of the dataset covers all of the above information and presents the data in a concise and lucid way. It can be shown as follows:
Methods chosen:
Univariate Plot Analysis: A univariate plot shows the data and summarizes its distribution. A dot plot, also known as a strip plot, shows the individual observations. A box plot shows the five-number summary of the data: the minimum, first quartile, median, third quartile, and maximum. The graph analysis is as follows:
Here, it can be observed that density, pH and wine quality appear to be normally distributed. Fixed and volatile acidity, the sulphur dioxides, sulphates and alcohol seem to be long-tailed. Qualitatively, residual sugar and chlorides have extreme outliers. Citric acid has a large number of zero values, which might be a case of non-reporting.
Exploratory Data Analysis (EDA) and Data Pre-processing
Histograms show the distribution of the variable values. As can be seen, citric acid was the one feature found not to be normally distributed on a logarithmic scale.
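As a small illustration of the five-number summary mentioned above, and of a common 1.5 × IQR rule for flagging outliers, the sketch below runs on a simulated right-skewed variable. The vector `x` is a hypothetical stand-in, not a column from the wine data, and the quartiles computed by quantile() can differ slightly from the hinges a box plot draws.

```r
set.seed(1)
# Simulated right-skewed variable (hypothetical stand-in, e.g. for residual sugar).
x <- rlnorm(500, meanlog = 1, sdlog = 0.5)

# Five-number summary: minimum, Q1, median, Q3, maximum.
five <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five)

# Outlier rule: points more than 1.5 * IQR beyond the quartiles.
iqr   <- IQR(x)
lower <- five[["25%"]] - 1.5 * iqr
upper <- five[["75%"]] + 1.5 * iqr
outliers <- x[x < lower | x > upper]
cat("Flagged", length(outliers), "of", length(x), "points\n")

# A log transform pulls in the long right tail.
log_x <- log10(x)
```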
  • 11. Now a combined variable, "TAC.acidity", is created as the sum of the tartaric, acetic and citric acids. It is as follows:
Boxplots for each of the variables serve as another indicator of spread.
Observations regarding variables:
· All variables have outliers.
· The acidity variables (citric acid, volatile acidity and fixed acidity) have critical outliers. If these outliers are removed, the distributions of these attributes can become symmetric.
· Residual sugar shows a positively skewed distribution; interestingly, even if we ignore the outliers, the skewness remains.
· Attributes such as density and free sulphur dioxide have significant outliers, but they are very different from the rest.
· Most of the outliers are on the larger side of the data.
· The alcohol content of the red wine shows an irregular distribution without any major outliers.
Support vector machines are a class of statistical models first developed in the mid-1960s by Vladimir Vapnik. In later years the model has evolved considerably into one of the most flexible and effective machine learning tools available. It is a supervised learning algorithm that can be used to solve both classification and regression problems, although the description here focuses on classification. Put simply, the algorithm looks for a linearly separable hyperplane, or decision boundary, separating members of one class from the other. If such a hyperplane exists, the work is done. If it does not, SVM uses a nonlinear mapping to transform the training data into a higher dimension, and then searches for the linear optimal separating hyperplane.
  • 12. With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM algorithm finds this hyperplane using support vectors and margins. As a training algorithm, SVM may not be fast compared with some other classification methods, but owing to its ability to model complex nonlinear boundaries, it has high accuracy. SVM is comparatively less prone to overfitting. It has been applied successfully to handwritten digit recognition, text classification, speaker identification and so on. Using this technique helped us estimate values more closely through regression.
Results and Findings
Each variable was correlated against wine quality to determine the factors that have comparatively greater influence on the quality of wine. The top 4 variables that influence wine quality were found to be:
1) alcohol
2) sulphates (log10)
3) volatile acidity
4) citric acid
The following was done to examine the acidity variables.
Of all the factors, base-10 logarithm TAC.acidity correlated very well with pH, and rightfully so, since pH is a defining measure of acidity. An interesting question to pose, using basic chemistry knowledge, is what components other than the measured acids are affecting pH.
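The question above can be explored with a simple regression: fit pH against log10 of the combined acidity and keep the percentage gap between observed and predicted pH as a new column. The sketch below is illustrative only. It uses simulated values rather than the wine data, and the column names `pH.predicted` and `pH.error` are hypothetical choices, not necessarily those used in the report.

```r
set.seed(7)
# Simulated stand-in for the wine data (values are illustrative, not real).
n <- 300
wine_sim <- data.frame(TAC.acidity = runif(n, min = 5, max = 17))
wine_sim$pH <- 4.2 - 0.9 * log10(wine_sim$TAC.acidity) + rnorm(n, sd = 0.1)

# Linear model: pH explained by log10 of the summed acids.
ph_fit <- lm(pH ~ log10(TAC.acidity), data = wine_sim)
print(coef(ph_fit))

# Percentage difference between observed and predicted pH, kept as a new column.
wine_sim$pH.predicted <- predict(ph_fit, newdata = wine_sim)
wine_sim$pH.error <- 100 * (wine_sim$pH - wine_sim$pH.predicted) / wine_sim$pH

summary(wine_sim$pH.error)
```

Samples with a large absolute `pH.error` are the ones whose pH is least explained by the three measured acids.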
  • 13. We can quantify this difference by building a linear model that predicts pH from TAC.acidity and capturing the percentage difference as a new variable.
Conclusion
From the analysis above, we found that the supervised learning method called the support vector machine predicted red wine quality, and it showed that higher wine quality is directly related to alcohol content. Although the other techniques were about as good as this one, it helped us find an approximate result, and we could predict quality from the amount of alcohol. We compared the linear model and the support vector machine. The SVM performed marginally better, and we chose to stay with it should we need to make any further predictions.
This analysis can be used to understand whether, by adjusting the variables during wine making, it is possible to increase the quality of the wine on the market. If you can control your variables, then you can predict the quality of your wine and earn more profit.
References
CVRVV. 2008. Portuguese Wine - Vinho Verde. Comissão de Viticultura da Região dos Vinhos Verdes (CVRVV), http://www.vinhoverde.pt.
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 2009. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553.
V. Cherkassky, Y. Ma. 2004. Practical selection of SVM parameters and noise estimation for SVM regression. Neural
  • 14. Networks, 17 (1), pp. 113-126 Red Wine Quality. 2018. Kaggle, https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al- 2009/data Appendix A: (R-Script) Red Wine quality assessment =============================================== ========= ```{r echo=FALSE, message=FALSE, warning=FALSE} # install.packages("MASS") # install.packages("gridExtra") # install.packages("grid") # install.packages("ggplot2") # install.packages("lattice") # install.packages("dplyr") # install.packages("memisc") # install.packages("GGally") # install.packages("reshape2") # install.packages("kernlab") # install.packages("plyr") # install.packages("plotly") # install.packages("e1071") require(MASS) require(gridExtra) library(grid) library(ggplot2) library(lattice) require(dplyr) require(memisc) require(GGally) require(reshape2) require(kernlab) update.packages("ggplot2") library(plyr) library(plotly)
  • 15. library(skimr) ``` Read csv file and explore statistics ```{r echo=FALSE,message=FALSE,warning=FALSE} Wine <- read.csv("https://archive.ics.uci.edu/ml/machine- learning-databases/wine-quality/winequality-red.csv", sep = ";") str(Wine) summary(Wine) skim(Wine) Wine$quality <- as.numeric(Wine$quality) ``` Creates tabular results of categorical variables ```{r,message=FALSE,warning=FALSE} table(Wine$quality) ``` # Univariate Plots Section ```{r echo=FALSE,message=FALSE,warning=FALSE} grid.arrange(qplot(Wine$fixed.acidity), qplot(Wine$volatile.acidity), qplot(Wine$citric.acid), qplot(Wine$residual.sugar), qplot(Wine$chlorides), qplot(Wine$free.sulfur.dioxide), qplot(Wine$total.sulfur.dioxide), qplot(Wine$density), qplot(Wine$pH), qplot(Wine$sulphates), qplot(Wine$alcohol), qplot(Wine$quality), ncol = 4) ``` # Univariate Analysis 1. Wine Quality forms a normal distribution. 2. Density and pH are normally distributed with a few outliers. Create new variable for better exploration
  • 16.
```{r,message=FALSE,warning=FALSE}
Wine$rating <- ifelse(Wine$quality < 5, 'bad',
                      ifelse(Wine$quality < 7, 'average', 'good'))
Wine$rating <- ordered(Wine$rating, levels = c('bad', 'average', 'good'))
summary(Wine$rating)
```

Create histograms of the variables on a log scale for further analysis

```{r,message=FALSE,warning=FALSE}
ggplot(Wine, aes(x = fixed.acidity)) +
  geom_histogram(fill = 'red') +
  scale_x_log10(breaks = 4:15) +
  xlab('Fixed Acidity') + ylab('Count') +
  ggtitle('Histogram of Fixed Acidity Values')
require(plotly)
plot_ly(data = Wine, x = ~citric.acid, type = 'histogram')
ggplot(Wine) +
  geom_histogram(aes(x = volatile.acidity), fill = 'blue') +
  scale_x_log10(breaks = seq(0.1, 1, 0.1))
ggplot(Wine) +
  geom_histogram(aes(x = citric.acid), fill = 'green') +
  scale_x_log10()
```

Citric acid was the one feature found not to be normally distributed on a logarithmic scale.

Create a combined variable, TAC.acidity, containing the sum of tartaric, acetic, and citric acid.

```{r,message=FALSE,warning=FALSE}
Wine$TAC.acidity <- Wine$fixed.acidity + Wine$volatile.acidity + Wine$citric.acid
qplot(Wine$TAC.acidity, main = 'Histogram of TAC Acidity (fixed+volatile+Citric)')
```
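One thing worth checking when a variable looks irregular on a log axis is whether it contains zeros: `log10(0)` is `-Inf`, and ggplot2 drops non-finite values from a log-scaled axis (normally with a warning, which `warning=FALSE` suppresses). The following is a minimal base R sketch of that check. The vector `citric_demo` is made up for illustration and is not the real `Wine$citric.acid` column.

```r
# Hypothetical values standing in for Wine$citric.acid; not real data.
citric_demo <- c(0, 0, 0.02, 0.26, 0.49, 1)

log_values <- log10(citric_demo)           # log10(0) evaluates to -Inf
n_dropped  <- sum(!is.finite(log_values))  # values a log axis cannot place

n_dropped
# The same count on the real column would be: sum(Wine$citric.acid <= 0)
```

If the count is non-zero, the log-scale histogram describes only the strictly positive values, which is worth keeping in mind when judging its shape.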
  • 17.
## Boxplots are better suited for visualizing the outliers.

```{r,message=FALSE,warning=FALSE}
get_simple_boxplot <- function(column, ylab) {
  return(qplot(data = Wine, x = 'simple', y = column,
               geom = 'boxplot', xlab = '', ylab = ylab))
}
grid.arrange(get_simple_boxplot(Wine$fixed.acidity, 'fixed acidity'),
             get_simple_boxplot(Wine$volatile.acidity, 'volatile acidity'),
             get_simple_boxplot(Wine$citric.acid, 'citric acid'),
             get_simple_boxplot(Wine$TAC.acidity, 'TAC acidity'),
             get_simple_boxplot(Wine$residual.sugar, 'residual sugar'),
             get_simple_boxplot(Wine$chlorides, 'chlorides'),
             get_simple_boxplot(Wine$free.sulfur.dioxide, 'free sulf. dioxide'),
             get_simple_boxplot(Wine$total.sulfur.dioxide, 'total sulf. dioxide'),
             get_simple_boxplot(Wine$density, 'density'),
             get_simple_boxplot(Wine$pH, 'pH'),
             get_simple_boxplot(Wine$sulphates, 'sulphates'),
             get_simple_boxplot(Wine$alcohol, 'alcohol'),
             ncol = 4)
plot_ly(Wine, y = ~alcohol, type = 'box')
```

# Bivariate Plots Section

```{r echo=FALSE,message=FALSE,warning=FALSE}
get_bivariate_boxplot <- function(x, y, ylab) {
  return(qplot(data = Wine, x = x, y = y, geom = 'boxplot', ylab = ylab))
}
grid.arrange(get_bivariate_boxplot(Wine$quality, Wine$fixed.acidity,
  • 18. 'fixed acidity'),
             get_bivariate_boxplot(Wine$quality, Wine$volatile.acidity,
                                   'volatile acidity'),
             get_bivariate_boxplot(Wine$quality, Wine$citric.acid,
                                   'citric acid'),
             get_bivariate_boxplot(Wine$quality, Wine$TAC.acidity,
                                   'TAC acidity'),
             get_bivariate_boxplot(Wine$quality, log10(Wine$residual.sugar),
                                   'residual sugar'),
             get_bivariate_boxplot(Wine$quality, log10(Wine$chlorides),
                                   'chlorides'),
             get_bivariate_boxplot(Wine$quality, Wine$free.sulfur.dioxide,
                                   'free sulf. dioxide'),
             get_bivariate_boxplot(Wine$quality, Wine$total.sulfur.dioxide,
                                   'total sulf. dioxide'),
             get_bivariate_boxplot(Wine$quality, Wine$density,
                                   'density'),
             get_bivariate_boxplot(Wine$quality, Wine$pH,
                                   'pH'),
             get_bivariate_boxplot(Wine$quality, log10(Wine$sulphates),
                                   'sulphates'),
             get_bivariate_boxplot(Wine$quality, Wine$alcohol,
                                   'alcohol'),
             ncol = 4)
```

Correlation for each of these variables against quality:

```{r,message=FALSE,warning=FALSE}
simple_cor_test <- function(x, y) {
  • 19. return(cor.test(x, as.numeric(y))$estimate)
}
correlations <- c(
  simple_cor_test(Wine$fixed.acidity, Wine$quality),
  simple_cor_test(Wine$volatile.acidity, Wine$quality),
  simple_cor_test(Wine$citric.acid, Wine$quality),
  simple_cor_test(Wine$TAC.acidity, Wine$quality),
  simple_cor_test(log10(Wine$residual.sugar), Wine$quality),
  simple_cor_test(log10(Wine$chlorides), Wine$quality),
  simple_cor_test(Wine$free.sulfur.dioxide, Wine$quality),
  simple_cor_test(Wine$total.sulfur.dioxide, Wine$quality),
  simple_cor_test(Wine$density, Wine$quality),
  simple_cor_test(Wine$pH, Wine$quality),
  simple_cor_test(log10(Wine$sulphates), Wine$quality),
  simple_cor_test(Wine$alcohol, Wine$quality))
correlations
# Label each estimate with the variable it was computed from
names(correlations) <- c('fixed.acidity', 'volatile.acidity', 'citric.acid',
                         'TAC.acidity', 'log10.residual.sugar',
                         'log10.chlorides', 'free.sulfur.dioxide',
                         'total.sulfur.dioxide', 'density', 'pH',
                         'log10.sulphates', 'alcohol')
correlations
```

Top 4:

alcohol
sulphates (log10)
volatile acidity
citric acid

Examining the acidity variables:

```{r,message=FALSE,warning=FALSE}
ggplot(data = Wine, aes(x = fixed.acidity, y = citric.acid)) +
  geom_point(alpha=0.3)
cor.test(Wine$fixed.acidity, Wine$citric.acid)
ggplot(data = Wine, aes(x = volatile.acidity, y = citric.acid)) +
  geom_point(alpha=0.3)
  • 20. cor.test(Wine$volatile.acidity, Wine$citric.acid)
ggplot(data = Wine, aes(x = log10(TAC.acidity), y = pH)) +
  geom_point(alpha=0.3)
cor.test(log10(Wine$TAC.acidity), Wine$pH)
```

The base-10 logarithm of TAC.acidity correlated strongly with pH. Build a linear model that predicts pH from TAC.acidity and capture the percentage difference as a new variable.

```{r,message=FALSE,warning=FALSE}
m <- lm(I(pH) ~ I(log10(TAC.acidity)), data = Wine)
Wine$pH.predictions <- predict(m, Wine)
# Relative error: (predicted - observed) / observed
Wine$pH.error <- (Wine$pH.predictions - Wine$pH)/Wine$pH
```

To check its accuracy, compute the root-mean-square error (RMSE).

```{r,message=FALSE,warning=FALSE}
rmse <- function(error) {
  sqrt(mean(error^2))
}
rmse(m$residuals)

# Now, we train a Support Vector Machine.
require(e1071)
SVM <- svm(I(pH) ~ I(log10(TAC.acidity)), data = Wine)
Wine$pH.Predict.SVM <- predict(SVM, Wine)
Wine$pH.error.SVM <- (Wine$pH.Predict.SVM - Wine$pH)/Wine$pH
rmse(SVM$residuals)
```

The SVM performs slightly better than the linear model.

### Plot 1: Effect of Alcohol on Wine Quality

```{r echo=FALSE,message=FALSE,warning=FALSE}
ggplot(data = Wine, aes(x = quality, y = alcohol,
  • 21. fill = rating)) +
  geom_boxplot(outlier.color = 'red') +
  ggtitle('Alcohol Levels in Different Wine Qualities') +
  xlab('Quality') + ylab('Alcohol (% volume)')
```

### Description 1

These boxplots show the relationship between alcohol content and wine quality. Generally, higher alcohol content correlated with higher wine quality. However, as the outliers and intervals show, alcohol content alone did not guarantee a higher quality.
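
The trend described above could also be summarized numerically. The sketch below assumes a small made-up data frame, `wine_demo`, in place of `Wine`. It computes the median alcohol content per quality score and a Spearman rank correlation. The numbers are illustrative only and are not results from the dataset.

```r
# Hypothetical rows standing in for Wine; not real measurements.
wine_demo <- data.frame(
  quality = c(4, 5, 5, 6, 6, 7, 7, 8),
  alcohol = c(9.4, 9.5, 9.8, 10.5, 10.2, 11.4, 11.0, 12.1)
)

# Median alcohol content within each quality score
median_by_quality <- tapply(wine_demo$alcohol, wine_demo$quality, median)

# A rank correlation is a reasonable choice because quality is an ordinal score
rho <- cor(wine_demo$alcohol, wine_demo$quality, method = "spearman")

median_by_quality
rho
```

Applied to the real data, the same two calls would use `Wine$alcohol` and `Wine$quality`. A rank-based measure avoids treating the gaps between quality scores as equal.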