Assignment - 03
Model Building, Selection, & Prediction
Question 1:
1. Predicting the Output Variable Y – Energy Production
Prediction
a) Importing the data from a CSV file and splitting it into test and
training data:
Using the read.csv() function, we can import the data into R.
INPUT:
OUTPUT:
INPUT:
OUTPUT:
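The original input and output for this step are not reproduced here, so the following is only a minimal sketch of the import-and-split step. The column names, the simulated values, and the 70/30 split ratio are all assumptions made for illustration; they are not taken from the assignment data.

```r
# Hypothetical data: column names and values are illustrative only.
set.seed(42)
n <- 100
energy <- data.frame(
  pressure = rnorm(n, mean = 1013, sd = 8),
  wind     = runif(n, min = 0, max = 20),
  energy   = rnorm(n, mean = 300, sd = 40)
)

# Round-trip through a temporary CSV so that read.csv() is exercised.
csv_path <- tempfile(fileext = ".csv")
write.csv(energy, csv_path, row.names = FALSE)
dat <- read.csv(csv_path)

# Hold out 30% of the rows for testing (the split ratio is an assumption).
train_idx <- sample(seq_len(nrow(dat)), size = floor(0.7 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

dim(train)
dim(test)
```

Calling set.seed() before sample() makes the split reproducible, which matters when models are later compared on the same test rows.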
b) Fitting a Linear Regression Model:
Running the Linear Regression Model with all the Variables
INPUT:
OUTPUT:
The Adjusted R-Squared value is found to be 0.2366.
From the output, it can be seen that only Pressure and Wind are
significant.
So, we run the model with only the Wind and Pressure variables.
Reduced Regression Model (Wind and Pressure Variables only)
INPUT:
OUTPUT:
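As a rough illustration of how the full and reduced fits can be compared, the sketch below fits both models with lm() on simulated data and extracts the coefficient p-values and the Adjusted R-Squared. The predictor names other than pressure and wind, and all of the simulated values, are assumptions; the numbers it prints will not match the results reported in this assignment.

```r
set.seed(1)
n <- 200
sim <- data.frame(
  temperature = rnorm(n),
  humidity    = rnorm(n),
  pressure    = rnorm(n),
  wind        = rnorm(n)
)
# Only pressure and wind drive the simulated response.
sim$energy <- 2 * sim$pressure + 0.5 * sim$wind + rnorm(n)

full_fit    <- lm(energy ~ ., data = sim)
reduced_fit <- lm(energy ~ pressure + wind, data = sim)

# Coefficient table: estimate, standard error, t value, p-value.
coef(summary(full_fit))

# Adjusted R-squared of each fit.
c(full    = summary(full_fit)$adj.r.squared,
  reduced = summary(reduced_fit)$adj.r.squared)
```

The p-value column of the coefficient table is what identifies which predictors are significant in the full model.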
We remove the Wind variable since the Adjusted R-Squared
value is only 0.0229. Now we run the regression using only the
Pressure variable.
Running the Regression model with only the Pressure Variable:
INPUT:
OUTPUT:
The Adjusted R-Squared value is found to be 0.219, which is
lower than that of the all-variable model (0.2366).
An ANOVA test is conducted to compare the all-variable model,
the reduced (Wind and Pressure) model, and the Pressure-only
model.
INPUT:
OUTPUT:
Between the all-variable model and the reduced model, the p-value is
0.2578, so at the 0.05 level we do not reject the null hypothesis
(that the extra variables add no explanatory power) and use the
reduced model.
Between the Pressure model and the reduced model, the p-value
is 0.0768, so we again do not reject the null
hypothesis and use the Pressure model.
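The nested-model comparison described above can be sketched with anova(), which runs an F-test between successive models. The data here is simulated and the variable names are assumptions, so the p-values will differ from the 0.2578 and 0.0768 reported above.

```r
set.seed(2)
n <- 200
sim <- data.frame(pressure = rnorm(n), wind = rnorm(n), humidity = rnorm(n))
# In this simulation only pressure affects the response.
sim$energy <- 2 * sim$pressure + rnorm(n)

pressure_fit <- lm(energy ~ pressure, data = sim)
reduced_fit  <- lm(energy ~ pressure + wind, data = sim)
full_fit     <- lm(energy ~ pressure + wind + humidity, data = sim)

# Models must be nested and listed from smallest to largest.
cmp <- anova(pressure_fit, reduced_fit, full_fit)
cmp

# One p-value per added block of terms; the first row has none.
p_values <- cmp[["Pr(>F)"]]
p_values
```

A large p-value in a row means the larger model in that comparison does not fit significantly better, which is the argument for keeping the smaller one.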
Running Best Subset to find the model:
Best subset selection computes fit statistics for models built from
the candidate variables and prints them for comparison, so that we
can select the appropriate variables.
INPUT:
OUTPUT:
The RSS value decreases as the number of variables increases.
The model with 5 variables has the highest Adjusted R-Squared.
The model with 3 variables has the smallest AIC (or Cp).
The model with 8 variables has the smallest BIC.
Since the best subset approach does not point to a single model, we
check the predicted R-squared and use the model with the highest
R-squared and the lowest RMSE.
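To make the comparison of RSS, Adjusted R-Squared, AIC and BIC concrete, here is a small base-R sketch that enumerates every subset of four simulated predictors and keeps the lowest-RSS model of each size. This is not the function used in the assignment; the predictor names x1 to x4 and the data are assumptions, and exhaustive enumeration like this is only practical for a handful of variables.

```r
set.seed(3)
n <- 150
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 1.5 * sim$x1 - 1 * sim$x2 + rnorm(n)

predictors <- setdiff(names(sim), "y")

# Enumerate every non-empty subset of the predictors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each subset and record its selection criteria.
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = sim)
  data.frame(
    model  = paste(vars, collapse = " + "),
    size   = length(vars),
    rss    = sum(residuals(fit)^2),
    adj_r2 = summary(fit)$adj.r.squared,
    aic    = AIC(fit),
    bic    = BIC(fit)
  )
}))

# Best model of each size by RSS, as best-subset selection reports it.
best_by_size <- do.call(rbind, lapply(split(results, results$size),
                                      function(d) d[which.min(d$rss), ]))
best_by_size
```

RSS can only fall as variables are added, which is why the penalized criteria (Adjusted R-Squared, AIC, BIC) are used to choose between sizes.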
R square and RMSE Prediction:
For all variable considered Model:
INPUT:
OUTPUT:
For the Reduced Model with Pressure and Wind Variables:
INPUT:
OUTPUT:
Single Model with Pressure as the predictor variable:
INPUT:
OUTPUT:
Summary:
From the analysis, we can conclude that the model with Pressure
as the only predictor is better than the other models. Its
Adjusted R-Squared value of 0.31 is the highest, and its RMSE
is also the lowest.
From the Adjusted R-Squared value, we conclude that the
Pressure model is the best and that it explains about 31% of the
variance in energy production.
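The test-set comparison above rests on two statistics, which can be sketched as small helper functions. The helper names, the simulated data and the 70/30 split are assumptions for illustration, and the R-squared shown uses the test-set mean as its baseline, which is one common convention among several.

```r
# Root-mean-square error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Out-of-sample R-squared: 1 - SSE / SST, with SST around the test mean.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

set.seed(4)
n <- 300
sim <- data.frame(pressure = rnorm(n), wind = rnorm(n))
sim$energy <- 3 * sim$pressure + rnorm(n)

train_idx <- sample(seq_len(n), size = 210)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

fits <- list(
  pressure_only = lm(energy ~ pressure, data = train),
  pressure_wind = lm(energy ~ pressure + wind, data = train)
)

# Score every model on the held-out rows only.
scores <- sapply(fits, function(fit) {
  pred <- predict(fit, newdata = test)
  c(rmse = rmse(test$energy, pred), r2 = r_squared(test$energy, pred))
})
scores
```

Scoring on held-out rows rather than on the training data is what makes the R-squared a predicted one.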
c) Backward Selection Approach:
Regression Model using all the variables:
INPUT:
OUTPUT:
Conclusion:
The backward stepAIC function gives a slightly different result
than the models generated above. However, when we fit the
resulting regression model, we see a lower R2 value than that of
our single-variable model. Below, we can compare the 3 models
above with this step model.
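Backward selection by AIC can be sketched with the base-R step() function, used here as a stand-in for the stepAIC function mentioned above. The data is simulated and the predictor names are assumptions, so the selected formula is only illustrative.

```r
set.seed(5)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Only x1 carries signal in this simulation.
sim$y <- 2 * sim$x1 + rnorm(n)

full_fit <- lm(y ~ ., data = sim)

# Backward elimination: drop one term at a time while AIC keeps improving.
step_fit <- step(full_fit, direction = "backward", trace = 0)

formula(step_fit)
summary(step_fit)$adj.r.squared
```

Because the procedure starts from the full model and only removes terms, it can settle on a different model than one chosen by p-values or by best subset.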
Final Project
ALY-6015 Week 6 Project
Intermediate Analytics
Submitted to: Ani Aghababyan
College of Professional Studies
Northeastern University, MA
Submitted by:
Vikrant Kakad
Vikas Warudkar
Sunita Mohapatra
Darshan Shah
Akshay kannan
Academic Term Spring 2018 - Quarter 2
Introduction
Wine making is affected by a series of variables. Several
variables, from alcohol to pH, can affect the final
result. It is crucial to understand how these variables
affect the quality of red wine. The scope of this project is
to understand the effect of the various attributes that influence the
quality of red wine. The data set used for the analysis was
downloaded from the UCI repository. The analysis has additional
focus on the following key parameters:
pH value - The pH value is considered a key parameter in
determining the quality of wine, and hence the analysis focuses
on the impact of pH on the final quality
score.
SO2 values (Free and Total) - SO2 has always been a debated
topic because of the allergic reactions associated with it. The
current analysis tries to determine the impact of SO2 on pH
values and on the final quality values of the wine samples.
Alcohol content - Alcohol content is an important parameter
that buyers consider when purchasing any alcoholic product, and
this analysis tries to unravel the relationship of alcohol content
with parameters like pH and SO2 content, and its impact
on quality.
In this project, we analyzed the Red Wine data to
understand which variables are responsible for the quality of the
wine. First, we got a feel for the variables on their own, and
then we examined the correlation between them and wine
quality, with other factors taken into account. Finally, we created a
linear model to predict the outcome for a test set.
We propose a supervised learning approach to predict human wine
taste preferences that is based on easily available analytical
tests at the certification step. A large dataset (when compared to
other studies in this domain) is considered, with red Vinho
Verde samples from Portugal (CVRVV, 2008). Two regression
techniques were applied, under a computationally efficient
procedure that performs simultaneous variable and model
selection. The support vector machine achieved promising
results, outperforming the multiple regression and neural
network methods. Such a model is useful to support the
oenologist's wine tasting evaluations and improve wine
production. Furthermore, similar techniques can help in target
marketing by modeling consumer tastes from niche markets.
Research Question
By performing this analysis, we seek to answer the following
questions:
1. How is the quality of the wines tasted?
2. What is the minimum set of properties and their values that
defines a high-quality wine?
3. What are considered wine defects?
About dataset
· Name: Red Wine Quality Data Set
· Sources Created by: Paulo Cortez (Univ. Minho), Antonio
Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis
(CVRVV, 2009)
· Input variables:
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
· Output variable: quality (score between 0 and 10)
· Data Set Characteristics: Multivariate
· Number of Observations: 1599
· Number of Attributes: 12
· Missing Values: N/A
Description of attributes:
1. Fixed acidity: Most acids involved with wine are fixed or
nonvolatile (do not evaporate readily)
2. Volatile acidity: The amount of acetic acid in wine, which at
too high of levels can lead to an unpleasant, vinegar taste
3. Citric acid: Found in small quantities, citric acid can add
'freshness' and flavor to wines
4. Residual sugar: The amount of sugar remaining after
fermentation stops; it is rare to find wines with less than 1
gram/liter, and wines with more than 45 grams/liter are
considered sweet
5. Chlorides: The amount of salt in the wine
6. Free sulfur dioxide: The free form of SO2 exists in
equilibrium between molecular SO2 (as a dissolved gas) and
bisulfite ion; it prevents microbial growth and the oxidation of
wine
7. Total sulfur dioxide: Amount of free and bound forms of SO2;
in low concentrations, SO2 is mostly undetectable in wine, but
at free SO2 concentrations over 50 ppm, SO2 becomes evident
in the nose and taste of wine
8. Density: The density of wine is close to that of water,
depending on the percent alcohol and sugar content
9. pH: Describes how acidic or basic a wine is on a scale from 0
(very acidic) to 14 (very basic); most wines are between 3-4 on
the pH scale
10. Sulphates: A wine additive which can contribute to sulfur
dioxide gas (SO2) levels, which acts as an antimicrobial and
antioxidant
11. Alcohol: the percent alcohol content of the wine
12. Quality: output variable (based on sensory data, score
between 0 and 10)
The dataset chosen has the attributes listed above, and it
delivers a better result in detecting quality after testing. The
data types of these attributes are as follows.
As described before, there are 1599 observations (rows) for 12
different variables (columns). Quality is an ordered,
categorical, discrete variable whose values range from 3 to 8.
A statistical description of the dataset gives a
more coherent picture of how the numerical values are
distributed (range, quartiles, central
tendencies, etc.). It is as follows:
The overall summary of the dataset covers all the above
information and presents the data in a concise and lucid way.
It is shown as follows:
Methods chosen:
Univariate Plot Analysis:
A univariate plot shows the data and summarizes its
distribution. A dot plot, also known as a strip plot, shows the
individual observations. A box plot shows the five-
number summary of the data – the minimum, first quartile,
median, third quartile, and maximum.
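As a small illustration of the five-number summary behind a box plot, the sketch below uses fivenum() on a handful of made-up values (not taken from the wine data). Note that fivenum() reports Tukey's hinges, which can differ slightly from the quartiles returned by quantile().

```r
# Illustrative values only; not taken from the wine data.
x <- c(3.2, 3.3, 3.1, 3.5, 3.4, 3.0, 3.9, 3.3, 3.2)

# Minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
five

# The box plot uses the same hinges; points beyond the whiskers go to $out.
bp <- boxplot(x, plot = FALSE)
bp$stats
bp$out
```

The difference between the maximum in the summary and the upper whisker in bp$stats is what a box plot draws as an outlier point.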
The graph analysis is as follows:
Here, it can be observed that density, pH and wine
quality appear to be normally distributed. Fixed and volatile
acidity, the sulphur dioxides, sulphates and alcohol seem to be
long-tailed. Qualitatively, residual sugar and chlorides have
extreme outliers. Citric acid appears to have a large number of
zero values, which might be a case of non-reporting.
Exploratory Data Analysis (EDA) and Data Pre-processing
Histograms show the distribution of the variable
values. As we could clearly see, citric acid was one feature that
was found not to be normally distributed on a logarithmic scale.
Now, a combined variable named “TAC.acidity” is created
as the sum of the tartaric (fixed), acetic (volatile) and citric
acidity. It is as follows:
Boxplots for each of the variables serve as another indicator of
spread.
Observations regarding variables: all variables have
outliers.
· Acidity variables such as citric acid, volatile acidity and fixed
acidity have critical outliers. If these outliers are removed,
the distributions of these attributes can become symmetric.
· Residual sugar shows a positively skewed distribution;
interestingly, even if we ignore the
outliers, this skewness remains.
· Variables such as density and free sulphur
dioxide have significant outliers that lie far
from the rest of the data.
· Most of the outliers lie on the higher side of the data.
· The alcohol content of the red wine shows an irregular
distribution without any major outliers.
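The outlier observations above come from reading the box plots. The same rule can be applied numerically, as in this sketch of the usual 1.5 × IQR box-plot rule. The function name flag_outliers and the sample values are hypothetical and not part of the project's script.

```r
# Flag values beyond k * IQR from the quartiles (the usual box-plot rule).
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Illustrative, right-skewed values; not taken from the wine data.
sugar <- c(1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 9.0, 15.5)

flag_outliers(sugar)
sugar[flag_outliers(sugar)]
```

Applied to a column such as residual sugar, this would show whether the flagged points sit mostly on the high side, as described above.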
Support vector machines (SVMs) are a class of statistical models
first developed in the mid-1960s by Vladimir Vapnik. In later years, the
model evolved considerably into one of the
most flexible and powerful machine learning tools
available. It is a supervised learning algorithm that can be
used to solve both classification and regression problems,
although the description here focuses on classification only. Put
simply, the algorithm searches for a linearly separating
hyperplane, or decision boundary, that separates members of one class
from the other. If such a hyperplane exists, the work is done.
If such a hyperplane does not exist, SVM uses a nonlinear
mapping to transform the training data into a higher
dimension, and then searches for the optimal linear
separating hyperplane there. With an appropriate nonlinear mapping to a
sufficiently high dimension, data from two classes can
always be separated by a hyperplane. The SVM algorithm
finds this hyperplane using support vectors and margins.
As a training algorithm, SVM may not be fast compared
with some other classification methods, but owing to
its ability to model complex nonlinear boundaries, SVM has high
accuracy. SVM is comparatively less prone to overfitting. SVM
has been successfully applied to handwritten digit
recognition, text classification, speaker
identification and so forth. Using this technique helped us
obtain estimates close to the observed values through
regression.
Results and Findings
The correlation of each variable with wine
quality was computed to determine which factors have comparatively
greater influence on the quality of the wine. The top
4 variables that influence wine quality are as follows:
1) alcohol
2) sulphates (log10)
3) volatile acidity
4) citric acid
The following was done to examine the acidity variables.
Of all the factors, the base-10 logarithm of TAC.acidity
correlated very well with pH, and rightfully so, since pH is a
defining measure of acidity.
An interesting question to pose, using basic chemistry
knowledge, is what components other than the
measured acids are affecting pH.
We can quantify this difference by building a linear
model that predicts pH from TAC.acidity and capturing the
% difference as a new variable.
Conclusion
By examining the information above, we found that the
supervised learning technique called the support vector machine
predicted red wine quality and indicated that
higher wine quality is directly proportional to
the alcohol content. Although the other techniques performed
comparably to this one, it helped us find an
approximate result, and we could predict quality from the
amount of alcohol. This analysis can be used to
understand whether, by adjusting the variables during wine making,
it is possible to increase the quality of the wine on the market.
If you can control your variables, then you can predict the quality
of your wine and earn more profit.
We compared the linear model and the support vector machine.
The SVM performed marginally better, and we chose to stay
with it in case we needed to make any more
predictions.
References
CVRVV. 2008. Portuguese Wine — Vinho Verde. Comissão de
Viticultura da Região dos Vinhos Verdes (CVRVV),
http://www.vinhoverde.pt.
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 2009.
Modeling wine preferences by data mining from
physicochemical properties. In Decision Support Systems,
Elsevier, 47(4):547-553.
V. Cherkassky, Y. Ma. 2004. Practical selection of SVM
parameters and noise estimation for SVM regression. Neural
Networks, 17(1):113-126.
Red Wine Quality. 2018. Kaggle,
https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009/data
Appendix A: (R-Script)
Red Wine quality assessment
========================================================
```{r echo=FALSE, message=FALSE, warning=FALSE}
# install.packages("MASS")
# install.packages("gridExtra")
# install.packages("grid")
# install.packages("ggplot2")
# install.packages("lattice")
# install.packages("dplyr")
# install.packages("memisc")
# install.packages("GGally")
# install.packages("reshape2")
# install.packages("kernlab")
# install.packages("plyr")
# install.packages("plotly")
# install.packages("e1071")
require(MASS)
require(gridExtra)
library(grid)
library(ggplot2)
library(lattice)
require(dplyr)
require(memisc)
require(GGally)
require(reshape2)
require(kernlab)
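# update.packages() takes a library path as its first argument, not a
# package name; to update a single package, uncomment the line below:
# update.packages(oldPkgs = "ggplot2", ask = FALSE)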
library(plyr)
library(plotly)
library(skimr)
```
Read csv file and explore statistics
```{r echo=FALSE,message=FALSE,warning=FALSE}
# The URL must be a single unbroken string; the file is semicolon-separated.
Wine <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep = ";")
str(Wine)
summary(Wine)
skim(Wine)
Wine$quality <- as.numeric(Wine$quality)
```
Creates tabular results of categorical variables
```{r,message=FALSE,warning=FALSE}
table(Wine$quality)
```
# Univariate Plots Section
```{r echo=FALSE,message=FALSE,warning=FALSE}
grid.arrange(qplot(Wine$fixed.acidity),
qplot(Wine$volatile.acidity),
qplot(Wine$citric.acid),
qplot(Wine$residual.sugar),
qplot(Wine$chlorides),
qplot(Wine$free.sulfur.dioxide),
qplot(Wine$total.sulfur.dioxide),
qplot(Wine$density),
qplot(Wine$pH),
qplot(Wine$sulphates),
qplot(Wine$alcohol),
qplot(Wine$quality),
ncol = 4)
```
# Univariate Analysis
1. Wine Quality forms a normal distribution.
2. Density and pH are normally distributed with a few outliers.
Create new variable for better exploration
```{r,message=FALSE,warning=FALSE}
Wine$rating <- ifelse(Wine$quality < 5, 'bad', ifelse(
Wine$quality < 7, 'average', 'good'))
Wine$rating <- ordered(Wine$rating,
levels = c('bad', 'average', 'good'))
summary(Wine$rating)
```
Create Histogram of log function of the variables for further
analysis
```{r,message=FALSE,warning=FALSE}
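# Fixed acidity on a log10 x-axis
ggplot(Wine, aes(x = fixed.acidity)) +
  geom_histogram(fill = 'red') +
  scale_x_log10(breaks = 4:15) +
  xlab('Fixed Acidity') + ylab('Count') +
  ggtitle('Histogram of Fixed Acidity Values')
# Interactive histogram of citric acid on the original scale
plot_ly(data = Wine, x = ~citric.acid, type = 'histogram')
# Volatile acidity on a log10 x-axis
ggplot(Wine) +
  geom_histogram(aes(x = volatile.acidity), fill = 'blue') +
  scale_x_log10(breaks = seq(0.1, 1, 0.1))
# Citric acid on a log10 x-axis; zero values cannot be shown on this scale
ggplot(Wine) +
  geom_histogram(aes(x = citric.acid), fill = 'green') +
  scale_x_log10()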
```
Citric acid was one feature that was found to be not
normally distributed on a logarithmic scale.
Create a combined variable,
TAC.acidity, containing the sum of tartaric, acetic, and citric
acid.
```{r,message=FALSE,warning=FALSE}
Wine$TAC.acidity <- Wine$fixed.acidity +
Wine$volatile.acidity +
Wine$citric.acid
qplot(Wine$TAC.acidity,
      main = 'Histogram of TAC Acidity (fixed+volatile+Citric)')
```
## Boxplots are better suited for visualizing the outliers.
```{r,message=FALSE,warning=FALSE}
get_simple_boxplot <- function(column, ylab) {
return(qplot(data = Wine, x = 'simple',
y = column, geom = 'boxplot',
xlab = '',
ylab = ylab))
}
grid.arrange(get_simple_boxplot(Wine$fixed.acidity, 'fixed acidity'),
             get_simple_boxplot(Wine$volatile.acidity, 'volatile acidity'),
             get_simple_boxplot(Wine$citric.acid, 'citric acid'),
             get_simple_boxplot(Wine$TAC.acidity, 'TAC acidity'),
             get_simple_boxplot(Wine$residual.sugar, 'residual sugar'),
             get_simple_boxplot(Wine$chlorides, 'chlorides'),
             get_simple_boxplot(Wine$free.sulfur.dioxide, 'free sulf. dioxide'),
             get_simple_boxplot(Wine$total.sulfur.dioxide, 'total sulf. dioxide'),
             get_simple_boxplot(Wine$density, 'density'),
             get_simple_boxplot(Wine$pH, 'pH'),
             get_simple_boxplot(Wine$sulphates, 'sulphates'),
             get_simple_boxplot(Wine$alcohol, 'alcohol'),
             ncol = 4)
plot_ly(Wine,y=~alcohol,type='box')
```
# Bivariate Plots Section
```{r echo=FALSE,message=FALSE,warning=FALSE}
get_bivariate_boxplot <- function(x, y, ylab) {
  # quality is numeric, so convert it to a factor to get one box per score
  return(qplot(data = Wine, x = factor(x), y = y, geom = 'boxplot',
               xlab = 'quality', ylab = ylab))
}
grid.arrange(get_bivariate_boxplot(Wine$quality,
Wine$fixed.acidity,
'fixed acidity'),
get_bivariate_boxplot(Wine$quality,
Wine$volatile.acidity,
'volatile acidity'),
get_bivariate_boxplot(Wine$quality, Wine$citric.acid,
'citric acid'),
get_bivariate_boxplot(Wine$quality,
Wine$TAC.acidity,
'TAC acidity'),
get_bivariate_boxplot(Wine$quality,
log10(Wine$residual.sugar),
'residual sugar'),
get_bivariate_boxplot(Wine$quality,
log10(Wine$chlorides),
'chlorides'),
get_bivariate_boxplot(Wine$quality,
Wine$free.sulfur.dioxide,
'free sulf. dioxide'),
get_bivariate_boxplot(Wine$quality,
Wine$total.sulfur.dioxide,
'total sulf. dioxide'),
get_bivariate_boxplot(Wine$quality, Wine$density,
'density'),
get_bivariate_boxplot(Wine$quality, Wine$pH,
'pH'),
get_bivariate_boxplot(Wine$quality,
log10(Wine$sulphates),
'sulphates'),
get_bivariate_boxplot(Wine$quality, Wine$alcohol,
'alcohol'),
ncol = 4)
```
Correlation for each of these
variables against quality:
```{r,message=FALSE,warning=FALSE}
simple_cor_test <- function(x, y) {
return(cor.test(x, as.numeric(y))$estimate)
}
correlations <- c(
simple_cor_test(Wine$fixed.acidity, Wine$quality),
simple_cor_test(Wine$volatile.acidity, Wine$quality),
simple_cor_test(Wine$citric.acid, Wine$quality),
simple_cor_test(Wine$TAC.acidity, Wine$quality),
simple_cor_test(log10(Wine$residual.sugar), Wine$quality),
simple_cor_test(log10(Wine$chlorides), Wine$quality),
simple_cor_test(Wine$free.sulfur.dioxide, Wine$quality),
simple_cor_test(Wine$total.sulfur.dioxide, Wine$quality),
simple_cor_test(Wine$density, Wine$quality),
simple_cor_test(Wine$pH, Wine$quality),
simple_cor_test(log10(Wine$sulphates), Wine$quality),
simple_cor_test(Wine$alcohol, Wine$quality))
# Label each correlation before printing
names(correlations) <- c('fixed.acidity', 'volatile.acidity',
                         'citric.acid',
                         'TAC.acidity', 'log10.residual.sugar',
                         'log10.chlorides', 'free.sulfur.dioxide',
                         'total.sulfur.dioxide', 'density', 'pH',
                         'log10.sulphates', 'alcohol')
correlations
```
Top 4:
alcohol
sulphates (log10)
volatile acidity
citric acid
Examining the acidity variables:
```{r,message=FALSE,warning=FALSE}
ggplot(data = Wine, aes(x = fixed.acidity, y = citric.acid)) +
geom_point(alpha=0.3)
cor.test(Wine$fixed.acidity, Wine$citric.acid)
ggplot(data = Wine, aes(x = volatile.acidity, y = citric.acid)) +
geom_point(alpha=0.3)
cor.test(Wine$volatile.acidity, Wine$citric.acid)
ggplot(data = Wine, aes(x = log10(TAC.acidity), y = pH)) +
geom_point(alpha=0.3)
cor.test(log10(Wine$TAC.acidity), Wine$pH)
```
Base 10 logarithm TAC.acidity correlated very well with pH.
Building a predictive linear model,
to predict pH based off of TAC.acidity and
capture the % difference as a new variable.
```{r,message=FALSE,warning=FALSE}
m <- lm(I(pH) ~ I(log10(TAC.acidity)), data = Wine)
Wine$pH.predictions <- predict(m, Wine)
# relative error: (predicted - observed) / observed
Wine$pH.error <- (Wine$pH.predictions - Wine$pH)/Wine$pH
```
To check its accuracy, we compute the root-mean-square (RMS) error.
```{r,message=FALSE,warning=FALSE}
rmse <- function(error)
{
sqrt(mean(error^2))
}
rmse(m$residuals)
#Now, we train a Support Vector Machine.
require(e1071)
SVM <- svm(I(pH) ~ I(log10(TAC.acidity)), data = Wine)
Wine$pH.Predict.SVM <- predict(SVM,Wine)
Wine$pH.error.SVM <- (Wine$pH.Predict.SVM -
Wine$pH)/Wine$pH
rmse(SVM$residuals)
```
The SVM performs slightly better than the linear model.
### Plot 1: Effect of Alcohol on Wine Quality
```{r echo=FALSE,message=FALSE,warning=FALSE}
# factor(quality) gives one box per quality score
ggplot(data = Wine, aes(x = factor(quality), y = alcohol,
                        fill = rating)) +
geom_boxplot(outlier.color = 'red') +
ggtitle('Alcohol Levels in Different Wine Qualities') +
xlab('Quality') +
ylab('Alcohol (% volume)')
```
### Description 1
These boxplots demonstrate the effect of alcohol content on
wine quality.
Generally, higher alcohol content correlated with higher wine
quality.
However, as the outliers and intervals show, alcohol content
alone did not
produce a higher quality.
Assignment (2- to 3-page case study analysis)Scenario 6.docxAssignment (2- to 3-page case study analysis)Scenario 6.docx
Assignment (2- to 3-page case study analysis)Scenario 6.docx
 
Assignment (2–4 pages, excluding Title Page and Reference.docx
Assignment (2–4 pages, excluding Title Page and Reference.docxAssignment (2–4 pages, excluding Title Page and Reference.docx
Assignment (2–4 pages, excluding Title Page and Reference.docx
 
Assignment (2–4 pages, APA format) Your paper should include.docx
Assignment (2–4 pages, APA format) Your paper should include.docxAssignment (2–4 pages, APA format) Your paper should include.docx
Assignment (2–4 pages, APA format) Your paper should include.docx
 
ASSIGNMENT #6POLS 365IDENTIFYING VARIABLES AND PROPOSING HYP.docx
ASSIGNMENT #6POLS 365IDENTIFYING VARIABLES AND PROPOSING HYP.docxASSIGNMENT #6POLS 365IDENTIFYING VARIABLES AND PROPOSING HYP.docx
ASSIGNMENT #6POLS 365IDENTIFYING VARIABLES AND PROPOSING HYP.docx
 
Assignment #5 Community Based Organization Profile Due.docx
Assignment #5 Community Based Organization Profile Due.docxAssignment #5 Community Based Organization Profile Due.docx
Assignment #5 Community Based Organization Profile Due.docx
 
Assignment #5 - Philosophy Figure essayInstructionsSelect a.docx
Assignment #5 - Philosophy Figure essayInstructionsSelect a.docxAssignment #5 - Philosophy Figure essayInstructionsSelect a.docx
Assignment #5 - Philosophy Figure essayInstructionsSelect a.docx
 
Assignment #5 - Philosophy Figure essayInstructionsSele.docx
Assignment #5 - Philosophy Figure essayInstructionsSele.docxAssignment #5 - Philosophy Figure essayInstructionsSele.docx
Assignment #5 - Philosophy Figure essayInstructionsSele.docx
 
Assignment #5 100 points ________________________.docx
Assignment #5            100 points ________________________.docxAssignment #5            100 points ________________________.docx
Assignment #5 100 points ________________________.docx
 
Assignment #4 Parent Communication PaperIt is common for a .docx
Assignment #4 Parent Communication PaperIt is common for a .docxAssignment #4 Parent Communication PaperIt is common for a .docx
Assignment #4 Parent Communication PaperIt is common for a .docx
 
Assignment #4 OD Application Why Teams Are 14 Time Zones Apart” (.docx
Assignment #4 OD Application Why Teams Are 14 Time Zones Apart” (.docxAssignment #4 OD Application Why Teams Are 14 Time Zones Apart” (.docx
Assignment #4 OD Application Why Teams Are 14 Time Zones Apart” (.docx
 
Assignment #3 Grading RubricNameHighly CompetentComp.docx
Assignment #3 Grading RubricNameHighly CompetentComp.docxAssignment #3 Grading RubricNameHighly CompetentComp.docx
Assignment #3 Grading RubricNameHighly CompetentComp.docx
 
Assignment #2Instructional Design Prospectusby .docx
Assignment #2Instructional Design Prospectusby .docxAssignment #2Instructional Design Prospectusby .docx
Assignment #2Instructional Design Prospectusby .docx
 
Assignment #2 Write an evaluation of a campus event focused on .docx
Assignment #2 Write an evaluation of a campus event focused on .docxAssignment #2 Write an evaluation of a campus event focused on .docx
Assignment #2 Write an evaluation of a campus event focused on .docx
 
Assignment #2  Write a 1-2 page paper. Deliverable length does not .docx
Assignment #2  Write a 1-2 page paper. Deliverable length does not .docxAssignment #2  Write a 1-2 page paper. Deliverable length does not .docx
Assignment #2  Write a 1-2 page paper. Deliverable length does not .docx
 
Assignment #2 Internet Field Trip1. Research Research at least s.docx
Assignment #2 Internet Field Trip1. Research Research at least s.docxAssignment #2 Internet Field Trip1. Research Research at least s.docx
Assignment #2 Internet Field Trip1. Research Research at least s.docx
 
Assignment #2 Internet Field TripResearch Research at least six .docx
Assignment #2 Internet Field TripResearch Research at least six .docxAssignment #2 Internet Field TripResearch Research at least six .docx
Assignment #2 Internet Field TripResearch Research at least six .docx
 
Assignment #2 MUS 1030-003012 Instructor Dr. EunHye Grace Choi.docx
Assignment #2 MUS 1030-003012 Instructor Dr. EunHye Grace Choi.docxAssignment #2 MUS 1030-003012 Instructor Dr. EunHye Grace Choi.docx
Assignment #2 MUS 1030-003012 Instructor Dr. EunHye Grace Choi.docx
 
Assignment #2 Internet Field Trip 1.Research Research at lea.docx
Assignment #2 Internet Field Trip 1.Research Research at lea.docxAssignment #2 Internet Field Trip 1.Research Research at lea.docx
Assignment #2 Internet Field Trip 1.Research Research at lea.docx
 
Assignment #2 Assignment Due Date 6219 by .docx
Assignment #2     Assignment Due Date  6219 by .docxAssignment #2     Assignment Due Date  6219 by .docx
Assignment #2 Assignment Due Date 6219 by .docx
 
ASSIGNMENT #2 WRITING NOTES IN EVERNOTE G.docx
ASSIGNMENT #2         WRITING NOTES IN EVERNOTE              G.docxASSIGNMENT #2         WRITING NOTES IN EVERNOTE              G.docx
ASSIGNMENT #2 WRITING NOTES IN EVERNOTE G.docx
 

Recently uploaded

ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 

Recently uploaded (20)

ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 

Assignment - 03 Model Building, Selection, & Prediction.docx

  • 1. Assignment - 03 Model Building, Selection, & Prediction Question 1: 1. Predicting the Output Variable Y – Energy Production Prediction a) Importing the data from a CSV file and splitting it into test and training data: Using the read.csv() function, we can import the data into R. INPUT:
  • 2. OUTPUT: INPUT: OUTPUT: b) Fitting a Linear Regression Model: Running the Linear Regression Model with all the variables INPUT: OUTPUT: The Adjusted R-Squared value is found to be 0.2366. From the output, it can be seen that only Pressure and Wind are significant. So, we run the model with only the Wind and Pressure variables. Reduced Regression Model (Wind and Pressure Variables only) INPUT:
  • 3. OUTPUT: Removing the Wind variable since the Adjusted R-Squared value is only 0.0229. Now we run the regression using only the Pressure variable. Running the Regression Model with only the Pressure Variable: INPUT: OUTPUT: The Adjusted R-Squared value is found to be 0.219, which is less than that of the previous regression models. An ANOVA test is conducted to compare the all-variable model with the reduced models. INPUT: OUTPUT: Between the all-variable model and the reduced model, the p-value is found to be 0.2578, so we fail to reject the null hypothesis and use the reduced model. Between the pressure-only model and the reduced model, the p-value is found to be 0.0768, so we fail to reject the null hypothesis and use the pressure model.
  • 4. Running Best Subset to find the model: Best subset selection computes fit statistics for models of every size and prints them for comparison, which we can use to select the appropriate variables. INPUT: OUTPUT: The RSS value decreases as the number of variables increases. The model with 5 variables has the highest Adjusted R-Squared. The model with 3 variables has the smallest AIC (or Cp). The model with 8 variables has the smallest BIC. Since the best subset approach gives a broad result, we check the predicted R-squared and use the model with the highest R-squared and the lowest RMSE. R-squared and RMSE Prediction: For the all-variable model: INPUT: OUTPUT: For the Reduced Model with Pressure and Wind Variables: INPUT:
  • 5. OUTPUT: Single Model with Pressure as the independent variable: INPUT: OUTPUT: Summary: From the analysis, we can conclude that the model with pressure as the only independent variable is better than the other models. Its Adjusted R-squared value of 0.31 is the highest, and its RMSE is the lowest. From the Adjusted R-squared value, we conclude that the pressure model is the best and explains about 31% of the variation in energy production. c) Backward Selection Approach: Regression Model using all the variables: INPUT: OUTPUT:
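The INPUT and OUTPUT screenshots for these steps did not survive extraction. As a rough sketch only, the workflow described above (nested-model ANOVA, backward selection by AIC, and test-set RMSE and R-squared) could look like the following in base R. The data frame `energy_data`, its columns `pressure`, `wind`, `temp`, and `energy`, and the helper `evaluate()` are hypothetical stand-ins on synthetic data, not the assignment's actual dataset or code.

```r
# Synthetic stand-in data (assumption): the assignment's CSV is not available here.
set.seed(42)
n <- 200
energy_data <- data.frame(
  pressure = rnorm(n, mean = 1010, sd = 8),
  wind     = rnorm(n, mean = 12, sd = 4),
  temp     = rnorm(n, mean = 20, sd = 5)   # predictor unrelated to the response
)
energy_data$energy <- 50 + 0.8 * (energy_data$pressure - 1010) +
  0.3 * energy_data$wind + rnorm(n, sd = 6)

# 70/30 split into training and test sets.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- energy_data[train_idx, ]
test  <- energy_data[-train_idx, ]

# Three nested linear models, from largest to smallest.
full_model     <- lm(energy ~ pressure + wind + temp, data = train)
reduced_model  <- lm(energy ~ pressure + wind, data = train)
pressure_model <- lm(energy ~ pressure, data = train)

# Partial F-tests: a large p-value means the extra terms do not
# significantly improve the fit, so the smaller model is retained.
p_full_vs_reduced     <- anova(reduced_model, full_model)[["Pr(>F)"]][2]
p_reduced_vs_pressure <- anova(pressure_model, reduced_model)[["Pr(>F)"]][2]

# Backward elimination by AIC, starting from the full model.
step_model <- step(full_model, direction = "backward", trace = 0)

# Hypothetical helper: test-set RMSE and R-squared for a fitted model.
evaluate <- function(model, newdata) {
  pred  <- predict(model, newdata = newdata)
  resid <- newdata$energy - pred
  c(rmse = sqrt(mean(resid^2)),
    r2   = 1 - sum(resid^2) / sum((newdata$energy - mean(newdata$energy))^2))
}

results <- rbind(
  full     = evaluate(full_model, test),
  reduced  = evaluate(reduced_model, test),
  pressure = evaluate(pressure_model, test),
  step     = evaluate(step_model, test)
)
print(round(results, 3))
```

Comparing models on held-out RMSE rather than on training-set Adjusted R-squared alone is the design choice here, since it matches the report's stated criterion of highest predicted R-squared and lowest RMSE.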
  • 6. Conclusion: The backward step AIC function gives a slightly different result than the models generated above. However, when we create the regression model, we see a lower R2 value than for our single model. Below, we can compare all 3 models above with this step model. 2 Final Project ALY-6015 Week 6 Project Intermediate Analytics Submitted to: Ani Aghababyan College of Professional Studies Northeastern University, MA Submitted by: Vikrant Kakad Vikas Warudkar Sunita Mohapatra Darshan Shah Akshay kannan Academic Term Spring 2018 - Quarter 2 Introduction Wine making is affected by a series of variables. Several variables, from alcohol to pH, can affect the final
  • 7. results. It is crucial to understand how these variables impact the quality of red wine. The scope of this project is to understand the effect of the various attributes that impact the quality of red wine. The data set used for the analysis was downloaded from the UCI repository. The analysis has additional focus on the following key parameters: pH value - pH is considered a key parameter in determining wine quality, so the analysis focused on determining the impact of pH values on the final quality score. SO2 values (free and total) - SO2 has always been a debated topic because of the allergic reactions associated with it. The current analysis tries to determine the impact of SO2 on pH values and on the final quality values for the wine samples. Alcohol content - Alcohol content is an important parameter that buyers consider when purchasing any alcoholic product, and this analysis tries to unravel the relationship of alcohol content with parameters like pH and SO2 content and its impact on quality. In this project, we analyzed the red wine data to understand which variables are responsible for the quality of the wine. First, we got a feel for the variables on their own, and then we examined the correlations between them and wine quality with other factors included. Finally, we created a linear model to predict the outcome on a test set. We propose a supervised learning approach to predict human wine taste preferences, based on analytical tests that are easily available at the certification step. A large dataset (when compared to other studies in this domain) is considered, with red Vinho Verde samples from Portugal (CVRVV, 2008). Two regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful to support oenologists' wine tasting evaluations and improve wine
  • 8. production. Furthermore, similar techniques can help in target marketing by modeling consumer tastes from niche markets. Research Questions By performing this analysis, we seek to answer the following questions: 1. How is the quality of the wines tasted? 2. What is the minimum set of properties and their values that defines a high-quality wine? 3. What are considered wine defects? About the dataset · Name: Red Wine Quality Data Set · Source: Created by Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV, 2009) · Input variables: 1 - fixed acidity 2 - volatile acidity 3 - citric acid 4 - residual sugar 5 - chlorides 6 - free sulfur dioxide 7 - total sulfur dioxide 8 - density 9 - pH 10 - sulphates 11 - alcohol · Output variable: quality (score between 0 and 10) · Data Set Characteristics: Multivariate · Number of Observations: 1599 · Number of Attributes: 12 · Missing Values: N/A
  • 9. Description of attributes: 1. Fixed acidity: Most acids involved with wine are fixed or nonvolatile (do not evaporate readily) 2. Volatile acidity: The amount of acetic acid in wine, which at levels that are too high can lead to an unpleasant, vinegar taste 3. Citric acid: Found in small quantities, citric acid can add 'freshness' and flavor to wines 4. Residual sugar: The amount of sugar remaining after fermentation stops; it is rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet 5. Chlorides: The amount of salt in the wine 6. Free sulfur dioxide: The free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine 7. Total sulfur dioxide: Amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine 8. Density: The density of wine is close to that of water, depending on the percent alcohol and sugar content 9. pH: Describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale 10. Sulphates: A wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant 11. Alcohol: The percent alcohol content of the wine 12. Quality: Output variable (based on sensory data, score between 0 and 10) The chosen dataset has the attributes listed above, and it delivers a better result in detecting the quality after testing. The datatypes of these attributes are as follows.
  • 10. As described before, there are 1599 observations (rows) for 12 different variables (columns). Quality is an ordered, categorical, discrete variable whose observed values range from 3 to 8. A statistical description of the dataset gives a more coherent picture of how the numerical values are distributed (range, quartiles, central tendencies, etc.). It is as follows: The overall summary of the dataset covers all the above information and presents the data in a concise and lucid way. It is shown as follows: Methods chosen: Univariate Plot Analysis: A univariate plot shows the data and summarizes its distribution. A dot plot, also known as a strip plot, shows the individual observations. A box plot shows the five-number summary of the data – the minimum, first quartile, median, third quartile, and maximum. The graph analysis is as follows: Here, it can be observed that density, pH, and wine quality appear to be normally distributed. Fixed and volatile acidity, the sulphur dioxides, sulphates, and alcohol seem to be long-tailed. Qualitatively, residual sugar and chlorides have extreme outliers. Citric acid appears to have a large number of zero values. This might be a case of non-reporting. Exploratory Data Analysis (EDA) and Data Pre-processing: Histograms show the distribution of the variable values. As we can clearly see, citric acid was one feature found not to be normally distributed on a logarithmic scale.
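The five-number summary that a box plot draws, and the extreme points it flags, can also be computed directly. The sketch below uses a synthetic right-skewed vector `x` as a hypothetical stand-in for a column such as residual sugar; it is not the report's data. Note that `fivenum()` returns Tukey's hinges, which can differ slightly from the quartiles reported by `quantile()`.

```r
# Synthetic right-skewed sample (assumption) with two planted extreme values.
set.seed(1)
x <- c(rlnorm(100, meanlog = 0.8, sdlog = 0.3), 12, 15)

# Minimum, lower hinge, median, upper hinge, maximum.
five_num <- fivenum(x)
names(five_num) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(round(five_num, 3))

# Points lying more than 1.5 times the hinge spread beyond the hinges,
# which is the default rule boxplot() uses to draw points past the whiskers.
stats <- boxplot.stats(x)
print(stats$out)
```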
  • 11. Now, a combined variable named “TAC.acidity” is created as the sum of tartaric, acetic, and citric acid. It is as follows: Boxplots for each of the variables serve as another indicator of spread. Observations regarding variables: All variables have outliers · Acidity variables such as citric acid, volatile acidity, and fixed acidity have critical outliers. If these outliers are removed, the distributions of these attributes can become symmetric. · Residual sugar shows a positively skewed distribution; interestingly, the skewness remains even if we ignore the outliers. · Variables such as density and free sulphur dioxide have significant outliers, but they are very different from the rest. · Most of the outliers lie on the larger side of the data. · Alcohol content shows an irregular distribution without any major outliers. Support vector machines are a class of statistical models first developed in the mid-1960s by Vladimir Vapnik. In later years, the model has evolved considerably into one of the most flexible and effective machine learning tools available. It is a supervised learning algorithm that can be used to solve both classification and regression problems, even though the current focus is on classification only. To put it simply, this algorithm looks for a linearly separable hyperplane, or a decision boundary separating members of one class from the other. If such a hyperplane exists, the work is done! If such a hyperplane does not exist, SVM uses a nonlinear mapping to transform the training data into a higher dimension. It then searches for the linear optimal
  • 12. separating hyperplane. With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM algorithm finds this hyperplane using support vectors and margins. As a training algorithm, SVM may not be fast compared with some other classification methods, but owing to its ability to model complex nonlinear boundaries, SVM has high accuracy. SVM is comparatively less prone to overfitting. SVM has been successfully applied to handwritten digit recognition, text classification, speaker identification, and so on. Using this technique helped us obtain closer estimates of the values through regression. Results and Findings Each variable was correlated against wine quality to determine which factors have comparatively more influence on the quality of the wine. It was found that the top 4 variables that influence wine quality are as follows: 1) alcohol 2) sulphates (log10) 3) volatile acidity 4) citric acid The following was done to examine the acidity variables. Of all the other factors, the base-10 logarithm of TAC.acidity correlated very well with pH, and rightfully so, since pH is a defining measure of acidity. An interesting question to pose, using basic chemistry knowledge, is what components other than the measured acids are affecting pH.
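The ranking of variables by their correlation with quality, described above, can be sketched as follows. The data frame `wine_demo` is a hypothetical, synthetic stand-in whose column names mirror the UCI file; the values and the resulting ordering are illustrative only and do not reproduce the report's findings.

```r
# Synthetic stand-in for the wine data (assumption): values are simulated.
set.seed(7)
n <- 300
wine_demo <- data.frame(
  alcohol          = rnorm(n, mean = 10.4, sd = 1.0),
  sulphates        = rlnorm(n, meanlog = log(0.65), sdlog = 0.2),
  volatile.acidity = rnorm(n, mean = 0.53, sd = 0.15),
  pH               = rnorm(n, mean = 3.3, sd = 0.15)
)
wine_demo$quality <- round(
  5.6 + 0.4 * (wine_demo$alcohol - 10.4) -
    1.5 * (wine_demo$volatile.acidity - 0.53) + rnorm(n, sd = 0.6)
)

# Pearson correlation of each predictor with quality.
predictors <- setdiff(names(wine_demo), "quality")
cors <- sapply(wine_demo[predictors], function(col) cor(col, wine_demo$quality))

# Rank by absolute correlation, strongest first.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))
```

Ranking by absolute value keeps strongly negative relationships, such as the one the report finds for volatile acidity, near the top instead of at the bottom of the list.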
  • 13. We can quantify this difference by building a linear model that predicts pH from TAC.acidity and capturing the % difference as a new variable. Conclusion By examining the above information, we found that the supervised learning method called support vector machine predicted the red wine quality, and the results indicate that higher wine quality is directly related to alcohol content. Although the other methods were comparable to this technique, it helped us find an approximate result, and we could predict quality from the alcohol content. This analysis can help determine whether, by adjusting the variables during wine making, it is possible to increase the quality of the wine on the market. If you can control your variables, then you can predict the quality of your wine and obtain more profit. We compared the linear model and the support vector machine. The SVM performed marginally better, and we chose to stay with it in case we needed to make any more predictions. References CVRVV. 2008. Portuguese Wine - Vinho Verde. Comissão de Viticultura da Região dos Vinhos Verdes (CVRVV), http://www.vinhoverde.pt. P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 2009. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. V. Cherkassky, Y. Ma. 2004. Practical selection of SVM parameters and noise estimation for SVM regression. Neural
  • 14. Networks, 17 (1), pp. 113-126 Red Wine Quality. 2018. Kaggle, https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009/data
Appendix A: (R-Script)
Red Wine quality assessment
========================================================
```{r echo=FALSE, message=FALSE, warning=FALSE}
# Uncomment to install any packages that are missing.
# install.packages("MASS")
# install.packages("gridExtra")
# install.packages("grid")
# install.packages("ggplot2")
# install.packages("lattice")
# install.packages("dplyr")
# install.packages("memisc")
# install.packages("GGally")
# install.packages("reshape2")
# install.packages("kernlab")
# install.packages("plyr")
# install.packages("plotly")
# install.packages("e1071")
# install.packages("skimr")
require(MASS)
require(gridExtra)
library(grid)
library(ggplot2)
library(lattice)
require(dplyr)
require(memisc)
require(GGally)
require(reshape2)
require(kernlab)
# The original script called update.packages("ggplot2") here. That function's
# first argument is a library path, not a package name, so the call is dropped;
# to refresh ggplot2, run install.packages("ggplot2") instead.
library(plyr)
library(plotly)
library(skimr)
```
Read csv file and explore statistics
```{r echo=FALSE,message=FALSE,warning=FALSE}
# The UCI file is semicolon-separated.
Wine <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep = ";")
str(Wine)
summary(Wine)
skim(Wine)
Wine$quality <- as.numeric(Wine$quality)
```
Creates tabular results of categorical variables
```{r,message=FALSE,warning=FALSE}
table(Wine$quality)
```
# Univariate Plots Section
```{r echo=FALSE,message=FALSE,warning=FALSE}
# One histogram per variable, arranged in a grid of 4 columns.
grid.arrange(qplot(Wine$fixed.acidity),
             qplot(Wine$volatile.acidity),
             qplot(Wine$citric.acid),
             qplot(Wine$residual.sugar),
             qplot(Wine$chlorides),
             qplot(Wine$free.sulfur.dioxide),
             qplot(Wine$total.sulfur.dioxide),
             qplot(Wine$density),
             qplot(Wine$pH),
             qplot(Wine$sulphates),
             qplot(Wine$alcohol),
             qplot(Wine$quality),
             ncol = 4)
```
# Univariate Analysis
1. Wine quality appears approximately normally distributed.
2. Density and pH appear normally distributed with a few outliers.

Create new variable for better exploration
```{r,message=FALSE,warning=FALSE}
# Bin quality into three ordered classes: bad (< 5), average (5-6), good (>= 7)
Wine$rating <- ifelse(Wine$quality < 5, 'bad',
                      ifelse(Wine$quality < 7, 'average', 'good'))
Wine$rating <- ordered(Wine$rating, levels = c('bad', 'average', 'good'))
summary(Wine$rating)
```

Create histograms of the variables on a log scale for further analysis

```{r,message=FALSE,warning=FALSE}
ggplot(Wine, aes(x = fixed.acidity)) +
  geom_histogram(fill = 'red') +
  scale_x_log10(breaks = 4:15) +
  xlab('Fixed Acidity') +
  ylab('Count') +
  ggtitle('Histogram of Fixed Acidity Values')

require(plotly)
plot_ly(data = Wine, x = ~citric.acid, type = 'histogram')

ggplot(Wine) +
  geom_histogram(aes(x = volatile.acidity), fill = 'blue') +
  scale_x_log10(breaks = seq(0.1, 1, 0.1))

ggplot(Wine) +
  geom_histogram(aes(x = citric.acid), fill = 'green') +
  scale_x_log10()
```

Citric acid was one feature that was found not to be normally distributed on a logarithmic scale.

Create a combined variable, TAC.acidity, containing the sum of tartaric, acetic, and citric acid.

```{r,message=FALSE,warning=FALSE}
Wine$TAC.acidity <- Wine$fixed.acidity + Wine$volatile.acidity + Wine$citric.acid
qplot(Wine$TAC.acidity,
      main = 'Histogram of TAC Acidity (fixed+volatile+Citric)')
```
## Boxplots are better suited for visualizing outliers.

```{r,message=FALSE,warning=FALSE}
# Draw a single boxplot of one column, labelled with ylab
get_simple_boxplot <- function(column, ylab) {
  return(qplot(data = Wine, x = 'simple', y = column,
               geom = 'boxplot', xlab = '', ylab = ylab))
}

grid.arrange(get_simple_boxplot(Wine$fixed.acidity, 'fixed acidity'),
             get_simple_boxplot(Wine$volatile.acidity, 'volatile acidity'),
             get_simple_boxplot(Wine$citric.acid, 'citric acid'),
             get_simple_boxplot(Wine$TAC.acidity, 'TAC acidity'),
             get_simple_boxplot(Wine$residual.sugar, 'residual sugar'),
             get_simple_boxplot(Wine$chlorides, 'chlorides'),
             get_simple_boxplot(Wine$free.sulfur.dioxide, 'free sulf. dioxide'),
             get_simple_boxplot(Wine$total.sulfur.dioxide, 'total sulf. dioxide'),
             get_simple_boxplot(Wine$density, 'density'),
             get_simple_boxplot(Wine$pH, 'pH'),
             get_simple_boxplot(Wine$sulphates, 'sulphates'),
             get_simple_boxplot(Wine$alcohol, 'alcohol'),
             ncol = 4)

plot_ly(Wine, y = ~alcohol, type = 'box')
```

# Bivariate Plots Section

```{r echo=FALSE,message=FALSE,warning=FALSE}
# x (quality) is numeric, so convert it to a factor to get one box per score
get_bivariate_boxplot <- function(x, y, ylab) {
  return(qplot(data = Wine, x = factor(x), y = y,
               geom = 'boxplot', ylab = ylab))
}

grid.arrange(get_bivariate_boxplot(Wine$quality, Wine$fixed.acidity,
                                   'fixed acidity'),
             get_bivariate_boxplot(Wine$quality, Wine$volatile.acidity,
                                   'volatile acidity'),
             get_bivariate_boxplot(Wine$quality, Wine$citric.acid,
                                   'citric acid'),
             get_bivariate_boxplot(Wine$quality, Wine$TAC.acidity,
                                   'TAC acidity'),
             get_bivariate_boxplot(Wine$quality, log10(Wine$residual.sugar),
                                   'residual sugar'),
             get_bivariate_boxplot(Wine$quality, log10(Wine$chlorides),
                                   'chlorides'),
             get_bivariate_boxplot(Wine$quality, Wine$free.sulfur.dioxide,
                                   'free sulf. dioxide'),
             get_bivariate_boxplot(Wine$quality, Wine$total.sulfur.dioxide,
                                   'total sulf. dioxide'),
             get_bivariate_boxplot(Wine$quality, Wine$density,
                                   'density'),
             get_bivariate_boxplot(Wine$quality, Wine$pH,
                                   'pH'),
             get_bivariate_boxplot(Wine$quality, log10(Wine$sulphates),
                                   'sulphates'),
             get_bivariate_boxplot(Wine$quality, Wine$alcohol,
                                   'alcohol'),
             ncol = 4)
```

Correlation of each of these variables with quality:

```{r,message=FALSE,warning=FALSE}
simple_cor_test <- function(x, y) {
  return(cor.test(x, as.numeric(y))$estimate)
}

correlations <- c(
  simple_cor_test(Wine$fixed.acidity, Wine$quality),
  simple_cor_test(Wine$volatile.acidity, Wine$quality),
  simple_cor_test(Wine$citric.acid, Wine$quality),
  simple_cor_test(Wine$TAC.acidity, Wine$quality),
  simple_cor_test(log10(Wine$residual.sugar), Wine$quality),
  simple_cor_test(log10(Wine$chlorides), Wine$quality),
  simple_cor_test(Wine$free.sulfur.dioxide, Wine$quality),
  simple_cor_test(Wine$total.sulfur.dioxide, Wine$quality),
  simple_cor_test(Wine$density, Wine$quality),
  simple_cor_test(Wine$pH, Wine$quality),
  simple_cor_test(log10(Wine$sulphates), Wine$quality),
  simple_cor_test(Wine$alcohol, Wine$quality))
correlations

# Label each estimate with the variable it was computed from
names(correlations) <- c('fixed.acidity', 'volatile.acidity', 'citric.acid',
                         'TAC.acidity', 'log10.residual.sugar',
                         'log10.chlorides', 'free.sulfur.dioxide',
                         'total.sulfur.dioxide', 'density', 'pH',
                         'log10.sulphates', 'alcohol')
correlations
```

Top 4: alcohol, sulphates (log10), volatile acidity, citric acid.

Examining the acidity variables:

```{r,message=FALSE,warning=FALSE}
ggplot(data = Wine, aes(x = fixed.acidity, y = citric.acid)) +
  geom_point(alpha = 0.3)
cor.test(Wine$fixed.acidity, Wine$citric.acid)

ggplot(data = Wine, aes(x = volatile.acidity, y = citric.acid)) +
  geom_point(alpha = 0.3)
cor.test(Wine$volatile.acidity, Wine$citric.acid)

ggplot(data = Wine, aes(x = log10(TAC.acidity), y = pH)) +
  geom_point(alpha = 0.3)
cor.test(log10(Wine$TAC.acidity), Wine$pH)
```

The base 10 logarithm of TAC.acidity correlated very well with pH. We build a linear model to predict pH from log10(TAC.acidity) and capture the relative difference as a new variable.

```{r,message=FALSE,warning=FALSE}
m <- lm(I(pH) ~ I(log10(TAC.acidity)), data = Wine)
Wine$pH.predictions <- predict(m, Wine)
# relative error: (predicted - observed) / observed
Wine$pH.error <- (Wine$pH.predictions - Wine$pH) / Wine$pH
```

To check its accuracy, we compute the root mean square (RMS) error.

```{r,message=FALSE,warning=FALSE}
# Root mean square of a vector of residuals
rmse <- function(error) {
  sqrt(mean(error^2))
}
rmse(m$residuals)

# Now, we train a Support Vector Machine on the same formula.
require(e1071)
SVM <- svm(I(pH) ~ I(log10(TAC.acidity)), data = Wine)
Wine$pH.Predict.SVM <- predict(SVM, Wine)
Wine$pH.error.SVM <- (Wine$pH.Predict.SVM - Wine$pH) / Wine$pH
rmse(SVM$residuals)
```

The SVM performs slightly better than the linear model.

### Plot 1: Effect of Alcohol on Wine Quality

```{r echo=FALSE,message=FALSE,warning=FALSE}
# quality is numeric, so convert it to a factor to get one box per score
ggplot(data = Wine, aes(x = factor(quality), y = alcohol,
                        fill = rating)) +
  geom_boxplot(outlier.color = 'red') +
  ggtitle('Alcohol Levels in Different Wine Qualities') +
  xlab('Quality') +
  ylab('Alcohol (% volume)')
```

### Description 1

These boxplots demonstrate the effect of alcohol content on wine quality. Generally, higher alcohol content correlated with higher wine quality. However, as the outliers and intervals show, alcohol content alone did not produce a higher quality.
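
The relationship described above can also be checked numerically. The sketch below is illustrative only and is not part of the original script: it uses a small simulated data frame (`sim`, a hypothetical stand-in for `Wine`, with an assumed alcohol range and assumed noise level) rather than the real data. It reports a Spearman rank correlation and the R-squared of a one-predictor linear model. A positive correlation together with a modest R-squared would be consistent with the reading that alcohol is related to quality but does not determine it alone.

```r
# Illustrative sketch on simulated data; `sim` is a hypothetical stand-in for Wine.
set.seed(42)
n <- 500
alcohol <- runif(n, min = 8.5, max = 14)           # assumed alcohol range (% volume)
latent  <- 0.4 * alcohol + rnorm(n, sd = 0.9)      # alcohol effect plus unexplained variation
quality <- pmin(8, pmax(3, round(latent + 1.2)))   # integer scores clipped to 3..8
sim <- data.frame(alcohol = alcohol, quality = quality)

# Spearman rank correlation suits an ordinal score such as quality.
# Ties trigger a warning about exact p-values, which is suppressed here.
rho <- suppressWarnings(
  cor.test(sim$alcohol, sim$quality, method = "spearman")
)$estimate

# Share of the variance in quality explained by alcohol alone.
fit <- lm(quality ~ alcohol, data = sim)
r_squared <- summary(fit)$r.squared

cat("Spearman rho:", round(rho, 3), "\n")
cat("R-squared:   ", round(r_squared, 3), "\n")
```

To run the same check on the real data, replace `sim` with `Wine` after the data have been loaded; the simulated numbers say nothing about the actual wines.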