“Statistical Learning Model using R”
SRES's Sanjivani College of Engineering, Kopargaon [IT], 2018-2019
CHAPTER 1
INTRODUCTION
Statistical learning theory is a framework for machine learning drawing from the
fields of statistics and functional analysis. Statistical learning theory deals with the
problem of finding a predictive function based on data. Statistical learning theory has
led to successful applications in fields such as computer vision, speech
recognition, bioinformatics and baseball.
The goals of learning are understanding and prediction. Learning falls into many
categories, including supervised learning, unsupervised learning, online learning,
and reinforcement learning. From the perspective of statistical learning theory, supervised
learning is best understood. Supervised learning involves learning from a training set of
data. Every point in the training set is an input-output pair, where the input maps to an
output. The learning problem consists of inferring the function that maps between the
input and the output, such that the learned function can be used to predict output from
future input.
Depending on the type of output, supervised learning problems are either problems
of regression or problems of classification. If the output takes a continuous range of
values, it is a regression problem. Using Ohm's Law as an example, a regression could be
performed with voltage as input and current as output. The regression would find the
functional relationship between voltage and current.
Classification problems are those for which the output will be an element from a
discrete set of labels. Classification is very common for machine learning applications.
In facial recognition, for instance, a picture of a person's face would be the input, and the
output label would be that person's name. The input would be represented by a large
multidimensional vector whose elements represent pixels in the picture.
After learning a function based on the training set data, that function is validated on
a test set of data, data that did not appear in the training set.
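As a minimal sketch of this train/test workflow in R (mydata is a hypothetical data frame), the data can be split at random before any model is fit:

set.seed(1)                           # make the random split reproducible
n <- nrow(mydata)                     # mydata is a hypothetical data frame
train <- sample(n, round(0.7 * n))    # indices of a 70% training sample
train.df <- mydata[train, ]           # training set, used to learn the function
test.df  <- mydata[-train, ]          # test set, kept unseen during fitting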
1.1 Supervised Versus Unsupervised Learning:
Most statistical learning problems fall into one of two categories: supervised or
unsupervised. The examples that we have discussed so far in this chapter all fall into the
supervised learning domain. For each observation of the predictor measurement(s) xi,
i = 1, . . . , n, there is an associated response measurement yi. We wish to fit a model that
relates the response to the predictors, with the aim of accurately predicting the response
for future observations (prediction) or better understanding the relationship between the
response and the predictors (inference). Many classical statistical learning methods, such
as linear regression and logistic regression, as well as more modern approaches such as
GAMs, boosting, and support vector machines, operate in the supervised learning
domain. The vast majority
of this book is devoted to this setting.
In contrast, unsupervised learning describes the somewhat more challenging situation
in which for every observation i = 1, … n, we observe a vector of measurements xi but no
associated response yi. It is not possible to fit a linear regression model, since there is no
response variable to predict. In this setting, we are in some sense working blind; the
situation is referred to as unsupervised because we lack a response variable that can
supervise our analysis.
1.2 Issues in Statistical Learning:
This section surveys a special journal issue on the fundamental modeling and learning
issues of newly emerging approaches and empirical applications in speech and language
processing. Another focus
of this special issue is on the cross-fertilization of learning approaches to speech and
language processing problems. Many problems in speech and language processing share
similarities (despite some conspicuous differences), and techniques in these two fields
can be successfully cross-pollinated.
Our additional goal is to bring together a diverse but complementary set of
contributions on emerging learning methods for speech processing, language processing,
as well as unifying approaches to problems cross cutting these two fields. Discriminative
learning has become a major theme in most areas of speech and language processing.
One of the recent advances in discriminative learning is the integration of the large
margin idea, which is the classical training standard in machine learning, into the
conventional discriminative training criteria for string recognition.
One contribution shows how typical training criteria, such as minimum phone error and
maximum mutual information, can be extended to incorporate the margin concept. In this
work, a new margin-based formalism is proposed for various conventional training criteria.
Experimental results show that the new criteria help the performance across a wide
variety of string recognition scenarios including speech recognition, concept tagging, and
handwriting recognition. In another paper, Cheng et al. explore online learning and
acoustic feature adaptation in large margin hidden Markov models (HMMs), which lead
to a better optimization method for large-margin HMM training. Moving beyond
acoustics, language modeling is one of the essential problems in speech and language
fields. Zhou et al. introduce a novel pseudo-conventional N-gram language model with
discriminative training, and also carry out an empirical study of the robustness of
discriminatively trained LMs. Experimental results show that cumulative performance
improvements can be achieved via this method.
Sequential pattern classification is at the core of many speech and language
processing problems. Conditional random field (CRF) is a widely adopted approach to
supervised sequential labeling.
However, the computational load and model complexity grow dramatically when taking
complex structure into account. Here, Sokolovska et al. address this issue through
efficient feature selection
based on imposing sparsity through an L1 regularization for CRF. The results show that,
without performance degradation, the L1 regularized CRF results in significantly faster
training and labeling speed, and hence makes it possible to scale up systems to handle
very large dimensional models. Meanwhile, Yu et al. improve the CRF model from
another perspective.
They propose a multi-layer sequence classification algorithm where each layer is a
CRF, and each higher layer’s input consists of both the previous layer’s observation
sequence and the resulting frame-level marginal probabilities. Compared with the
conventional CRF, the deep-structured CRF achieves superior labeling accuracy on
common tagging tasks. Using the kernel method to improve the performance of
sequential pattern classifiers is also an important direction. Kubo et al. describe a novel
sequential pattern classifier based on kernel methods.
Unlike conventional approaches, they use kernel methods to estimate the emission
probability of HMM, with the extra benefit due to the powerful nonlinear classification
capability of kernel methods. On the other hand, unlike conventional CRF/HMM-based
methods, Bellegarda attacks this problem from a novel angle based on latent semantic
mapping and obtains insightful results.
CHAPTER 2
GETTING STARTED WITH R PROGRAMMING
2.1 Introduction to the R-Studio
R is a free, open-source software environment and programming language developed in
1995 at the University of Auckland for statistical computing and graphics (Ihaka and
Gentleman, 1996). Since then R has become one of the dominant software environments
for data analysis and is used by a variety of scientific disciplines, including soil science,
ecology, and geoinformatics (Environmetrics CRAN Task View; Spatial CRAN Task
View). R is particularly popular for its graphical capabilities, but it is also prized for its
GIS capabilities, which make it relatively easy to generate raster-based models. More
recently, R has also gained several packages which are designed specifically for
analyzing soil data.
2.2 User Interface:
R is a dialect of the S language. It is a case-sensitive, interpreted language. You
can enter commands one at a time at the command prompt (>) or run a set of commands
from a source file. There is a wide variety of data types, including vectors (numerical,
character, logical), matrices, data frames, and lists. Most functionality is provided
through built-in and user-created functions and all data objects are kept in memory during
an interactive session. Basic functions are available by default. Other functions are
contained in packages that can be attached to a current session as needed.
This section describes working with the R interface. A key skill to using R
effectively is learning how to use the built-in help system. Other sections describe the
working environment, inputting programs and outputting results, installing new
functionality through packages, GUIs that have been developed for R, customizing the
environment, producing high quality output, and running programs in batch. A
fundamental design feature of R is that the output from most functions can be used as
input to other functions. This is described in reusing results.
2.3 Basic Commands:
 Input and Display:
read.table(filename, header=TRUE)            # read a tab- or space-delimited file with labels in the first row
read.table(filename, header=TRUE, sep=',')   # read csv files
x <- c(1,2,4,8,16)          # create a data vector with specified elements
y <- c(1:10)                # create a data vector with elements 1-10
n <- 10
x1 <- rnorm(n)              # create an n-item vector of random normal deviates
y1 <- runif(n) + n          # create another n-item vector with n added to each random uniform draw
z <- rbinom(n, size, prob)  # create n binomial samples of size "size" with success probability prob
vect <- c(x, y)             # combine them into one vector of length 2n
mat <- cbind(x, y)          # combine them into an n x 2 matrix
mat[4,2]                    # display the element in the 4th row and 2nd column
mat[3,]                     # display the 3rd row
mat[,2]                     # display the 2nd column
subset(dataset, logical)                    # those rows meeting a logical criterion
subset(data.df, logical, select=variables)  # rows of a data frame that meet a criterion, keeping selected variables
data.df[logical, ]          # yet another way to take a subset
x[order(x$B), ]             # sort a data frame by the order of the elements in B
x[rev(order(x$B)), ]        # sort the data frame in reverse order
 Moving around
ls()                      # list the variables in the workspace
rm(x)                     # remove x from the workspace
rm(list = ls())           # remove all the variables from the workspace
attach(mat)               # make the names of the variables in the matrix or data frame available in the workspace
detach(mat)               # release the names (remember to do this each time you attach something)
with(mat, ...)            # a preferred alternative to attach ... detach
new <- old[,-n]           # drop the nth column
new <- old[-n,]           # drop the nth row
new <- old[,-c(i,j)]      # drop the ith and jth columns
new <- subset(old, logical)   # select those cases that meet the logical condition
complete <- subset(data.df, complete.cases(data.df))   # find those cases with no missing values
new <- old[n1:n2, n3:n4]  # select rows n1 through n2 and columns n3 through n4
 Distributions
beta(a, b)                # beta function
gamma(x)                  # gamma function
choose(n, k)              # binomial coefficients
factorial(x)              # factorial
dnorm(x, mean=0, sd=1, log=FALSE)                     # normal density
pnorm(q, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)  # normal cumulative distribution function
qnorm(p, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)  # normal quantile function
rnorm(n, mean=0, sd=1)                                # normal random deviates
dunif(x, min=0, max=1, log=FALSE)                     # uniform density
punif(q, min=0, max=1, lower.tail=TRUE, log.p=FALSE)  # uniform cumulative distribution function
qunif(p, min=0, max=1, lower.tail=TRUE, log.p=FALSE)  # uniform quantile function
runif(n, min=0, max=1)                                # uniform random deviates
 Data manipulation
replace(x, list, values)  # remember to assign the result to some object,
                          # e.g. x <- replace(x, x == -9, NA),
                          # similar to the operation x[x == -9] <- NA
scrub(x, where, min, max, isvalue, newvalue)   # a convenient way to change particular values (in the psych package)
cut(x, breaks, labels=NULL, include.lowest=FALSE, right=TRUE, dig.lab=3, ...)   # cut a numeric vector into intervals
x.df <- data.frame(x1, x2, x3)   # combine different kinds of data into a data frame
as.data.frame(x)          # coerce an object to a data frame
is.data.frame(x)          # test whether x is a data frame
x <- as.matrix(x.df)      # coerce to a matrix
scale(x)                  # convert a data frame to standardized scores
round(x, n)               # round the values of x to n decimal places
ceiling(x)                # smallest integers not less than x
floor(x)                  # largest integers not greater than x
as.integer(x)             # truncate real x to integer (compare to round(x, 0))
as.integer(x < cutpoint)  # vector of 0 where x is less than cutpoint, 1 where greater
factor(ifelse(a < cutpoint, "Neg", "Pos"))   # another way to dichotomize and make a factor for analysis
transform(data.df, variable = some_operation)   # can be part of setting up a data set
x %in% y                  # test each element of x for membership in y
y %in% x                  # test each element of y for membership in x
all(x %in% y)             # TRUE if every element of x appears in y (x is a subset of y)
all(x)                    # for a vector of logical values, are they all TRUE?
any(x)                    # for a vector of logical values, is at least one TRUE?
2.4 Data Structures in R:
R programming supports several basic data structures, namely vector, matrix, list,
data frame, and factor; character strings are covered as well. This chapter will discuss
these data structures and the way to write them in R programming.
1. Vector – This data structure contains elements of a single type, i.e., integer, double,
logical, complex, etc. A vector can be created in R with the c() function or the : operator.
For example,
> x <- 1:7; x
[1] 1 2 3 4 5 6 7
> y <- 2:-2; y
[1] 2 1 0 -1 -2
2. Matrix – A matrix is a two-dimensional data structure and can be created using the
matrix() function. The numbers of rows and columns can be set with the nrow and
ncol arguments. Providing both is not required, as the other dimension is inferred
from the length of the data.
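For example (a minimal sketch), a 2 x 3 matrix can be created by supplying the data and nrow alone; the matrix is filled column-wise by default:
> m <- matrix(1:6, nrow = 2)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6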
3. List – This data structure can include data of different types. It is similar to a vector,
but where a vector contains elements of one type, a list can contain mixed data. A list
is created using list().
For example,
> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
> str(x)
List of 3
 $ a: num 2.5
 $ b: logi TRUE
 $ c: int [1:3] 1 2 3
4. Dataframe – This data structure is a special case of a list where each component has
the same length. A data frame is created using the data.frame() function.
For example,
> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))
> str(x) # structure of x
'data.frame': 2 obs. of 3 variables:
 $ SN  : int 1 2
 $ Age : num 21 15
 $ Name: Factor w/ 2 levels "Dora","John": 2 1
5. Factor – Factors are used to store categorical data with predefined levels. A factor
can be created using the factor() function.
For example,
> x <- factor(c("single", "married", "married", "single"))
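Printing the factor then shows both the data and its levels:
> x
[1] single  married married single
Levels: married single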
6. String – Any value written inside single or double quotes is referred to as a string.
For example,
x <- "This is a valid 'string'"
print(x)
y <- 'this is still valid as the " double quote is used inside single quotes'
print(y)
Output:
[1] "This is a valid 'string'"
[1] "this is still valid as the \" double quote is used inside single quotes"
2.5 Graphics:
The plot() function is the primary way to plot data in R. For instance, plot(x, y)
produces a scatterplot of the numbers in x versus the numbers in y. There are many
additional options that can be passed to the plot() function. For example, passing in the
argument xlab will result in a label on the x-axis. To find out more information about the
plot() function, type ?plot.
> x = rnorm(100)
> y = rnorm(100)
> plot(x, y)
> plot(x, y, xlab="this is the x-axis", ylab="this is the y-axis", main="Plot of X vs Y")
We will often want to save the output of an R plot. The command that we use to do this
will depend on the file type that we would like to create. For instance, to create a pdf, we
use the pdf() function, and to create a jpeg, we use the jpeg() function.
> pdf("Figure.pdf")
> plot(x, y, col="green")
> dev.off()
null device
The function dev.off() indicates to R that we are done creating the plot.
Alternatively, we can simply copy the plot window and paste it into an appropriate file
type, such as a Word document. The function seq() can be used to create a sequence of
numbers. For instance, seq(a,b) makes a vector of integers between a and b. There
are many other options: for instance, seq(0,1,length=10) makes a sequence of 10 numbers
that are equally spaced between 0 and 1. Typing 3:11 is a shorthand for seq(3,11) for
integer arguments.
> x = seq(1, 10)
> x
[1]  1  2  3  4  5  6  7  8  9 10
> x = 1:10
> x
[1]  1  2  3  4  5  6  7  8  9 10
> x = seq(-pi, pi, length = 50)
We will now create some more sophisticated plots. The contour() function produces a
contour plot in order to represent three-dimensional data; it is like a topographical map.
It takes three arguments:
1. A vector of the x values (the first dimension),
2. A vector of the y values (the second dimension), and
3. A matrix whose elements correspond to the z value (the third dimension) for each pair
of (x,y) coordinates.
As with the plot() function, there are many other inputs that can be used to fine-tune
the output of the contour() function. To learn more about these, take a look at the help
file by typing ?contour.
> y = x
> f = outer(x, y, function(x, y) cos(y) / (1 + x^2))
> contour(x, y, f)
> contour(x, y, f, nlevels = 45, add = TRUE)
> fa = (f - t(f)) / 2
> contour(x, y, fa, nlevels = 15)
The image() function works the same way as contour(), except that it produces a
color-coded plot whose colors depend on the z value. This is known as a heatmap, and is
sometimes used to plot temperature in weather forecasts. Alternatively, persp() can be
used to produce a three-dimensional plot. The arguments theta and phi control the
angles at which the plot is viewed.
> image(x, y, fa)
> persp(x, y, fa)
> persp(x, y, fa, theta = 30)
> persp(x, y, fa, theta = 30, phi = 20)
> persp(x, y, fa, theta = 30, phi = 70)
> persp(x, y, fa, theta = 30, phi = 40)
2.6 Reading data into R:
Usually we will be using data already in a file that we need to read into R in order
to work on it. R can read data from a variety of file formats—for example, files created as
text, or in Excel, SPSS or Stata. We will mainly be reading files in text format .txt or .csv
(comma-separated, usually created in Excel).
To read an entire data frame directly, the external file will normally have a special form:
 The first line of the file should have a name for each variable in the data frame.
 Each additional line of the file has as its first item a row label and the values for
each variable.
Here we use the example dataset called airquality.csv and airquality.txt
Input file form with names and row labels:

   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
...
By default numeric items (except row labels) are read as numeric variables. This can be
changed if necessary.
The function read.table() can then be used to read the data frame directly:
> airqual <- read.table("C:/Desktop/airquality.txt")
Similarly, to read .csv files the read.csv() function can be used to read in the data
frame directly
[Note: I have noticed that occasionally you'll need to do a double slash in your path //.
This seems to depend on the machine.]
> airqual <- read.csv("C:/Desktop/airquality.csv")
In addition, you can read in files using the file.choose() function in R. After typing in
this command in R, you can manually select the directory and file where your dataset is
located.
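For instance:
> airqual <- read.csv(file.choose())   # opens a file dialog; pick airquality.csv manually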
CHAPTER 3
LINEAR REGRESSION MODELS
3.1 Linear Regression:
This chapter is about linear regression, a very simple approach for supervised
learning. In particular, linear regression is a useful tool for predicting a quantitative
response. Linear regression has been around for a long time and is the topic of
innumerable textbooks. Though it may seem somewhat dull compared to some of the
more modern statistical learning approaches described in later chapters of this book,
linear regression is still a useful and widely used statistical learning method. Moreover, it
serves as a good jumping-off point for newer approaches: as we will see in later chapters,
many fancy statistical learning approaches can be seen as generalizations or extensions of
linear regression. Consequently, the importance of having a good understanding of linear
regression before studying more complex learning methods cannot be overstated. In this
chapter, we review some of the key ideas underlying the linear regression model, as well
as the least squares approach that is most commonly used to fit this model. Recall the
Advertising data, which records sales (in thousands of units) for a particular product as a
function of advertising budgets (in thousands of dollars) for TV, radio, and newspaper
media. Suppose that in our role as statistical consultants we are asked to suggest, on the
basis of these data, a marketing plan for next year that will result in high product sales.
What information would be useful in order to provide such a recommendation? A few
important questions we might seek to address include: Is there a relationship between
advertising budget and sales? How strong is that relationship? And which media
contribute to sales?
Simple Linear Regression
Simple linear regression lives up to its name: it is a very straightforward approach for
predicting a quantitative response Y on the basis of a single predictor variable X. It
assumes that there is approximately a linear relationship between X and Y. Mathematically,
we can write this linear relationship as

Y ≈ β0 + β1X.

You might read "≈" as "is approximately modeled as". We will sometimes describe this
by saying that we are regressing Y on X (or Y onto X). For example, X may
represent TV advertising and Y may represent sales. Then we can regress sales onto TV
by fitting the model sales ≈ β0 + β1 × TV.
In this equation, β0 and β1 are two unknown constants that represent the intercept and
slope terms in the linear model. Together, β0 and β1 are known as the model coefficients
or parameters. Once we have used our training data to produce estimates ˆβ0 and ˆβ1 for
the model coefficients, we can predict future sales on the basis of a particular value of
TV advertising by computing

ˆy = ˆβ0 + ˆβ1x,

where ˆy indicates a prediction of Y on the basis of X = x. Here we use a hat symbol, ˆ,
to denote the estimated value for an unknown parameter or coefficient, or to denote the
predicted value of the response.
Estimating the Coefficients
In practice, β0 and β1 are unknown. So before we can use the model to make predictions,
we must use data to estimate the coefficients. Let (x1, y1), (x2, y2), . . . , (xn, yn) represent
n observation pairs, each of which consists of a measurement of X and a measurement of
Y. In the Advertising example, this data set consists of the TV advertising budget and
product sales in n = 200 different markets. Our goal is to obtain coefficient estimates
ˆβ0 and ˆβ1 such that the linear model fits the available data well; that is, so that
yi ≈ ˆβ0 + ˆβ1xi for i = 1, . . . , n. In other words, we want to find an intercept ˆβ0 and a
slope ˆβ1 such that the resulting line is as close as possible to the n = 200 data points.
There are a number of ways of measuring closeness. However, by far the most common
approach involves minimizing the least squares criterion, and we take that approach in
this chapter.
For the Advertising data, the least squares fit for the regression of sales onto TV is
found by minimizing the sum of squared errors. In the corresponding plot, each grey line
segment represents an error, and the fit makes a compromise by averaging their squares.
In this case a linear fit captures the essence of the relationship, although it is somewhat
deficient in the left of the plot. Let ˆyi = ˆβ0 + ˆβ1xi be the prediction for Y based on the
ith value of X. Then ei = yi − ˆyi represents the ith residual; this is the difference between
the ith observed response value and the ith response value that is predicted by our linear
model. We define the residual sum of squares (RSS) as

RSS = e1² + e2² + · · · + en²,

or equivalently as

RSS = (y1 − ˆβ0 − ˆβ1x1)² + (y2 − ˆβ0 − ˆβ1x2)² + · · · + (yn − ˆβ0 − ˆβ1xn)².

The least squares approach chooses ˆβ0 and ˆβ1 to minimize the RSS.
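As a minimal sketch of least squares fitting in R (assuming the Advertising data is available as a CSV file with columns TV and sales; the file name here is an assumption):

ads <- read.csv("Advertising.csv")            # hypothetical file with TV and sales columns
fit <- lm(sales ~ TV, data = ads)             # least squares fit of sales onto TV
coef(fit)                                     # estimated intercept and slope
summary(fit)                                  # coefficients, standard errors, R-squared
predict(fit, newdata = data.frame(TV = 100))  # predicted sales for a TV budget of 100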
CHAPTER 4
CLASSIFICATION
The linear regression model discussed in Chapter 3 assumes that the response
variable Y is quantitative. But in many situations, the response variable is instead
qualitative. For example, eye color is qualitative, taking on values blue, brown, or green.
Often qualitative variables are referred to as categorical; we will use these terms
interchangeably.
In this chapter, we study approaches for predicting qualitative responses, a process
that is known as classification. Predicting a qualitative response for an observation can be
referred to as classifying that observation, since it involves assigning the observation to a
category, or class. On the other hand, often the methods used for classification first
predict the probability of each of the categories of a qualitative variable, as the basis for
making the classification. In this sense they also behave like regression methods. There
are many possible classification techniques, or classifiers, that one might use to predict a
qualitative response.
We touched on some of these in Sections 2.1.5 and 2.2.3. In this chapter we discuss
three of the most widely-used classifiers: logistic regression, linear discriminant analysis,
and K-nearest neighbors.
4.1 An Overview of Classification:
Classification problems occur often, perhaps even more so than regression
problems. Some examples include:
1. A person arrives at the emergency room with a set of symptoms that could possibly be
attributed to one of three medical conditions. Which of the three conditions does the
individual have?
2. An online banking service must be able to determine whether or not a transaction being
performed on the site is fraudulent, on the basis of the user’s IP address, past transaction
history, and so forth.
3. On the basis of DNA sequence data for a number of patients with and without a given
disease, a biologist would like to figure out which DNA mutations are deleterious
(disease-causing) and which are not.
Just as in the regression setting, in the classification setting we have a set of training
observations (x1, y1), . . . , (xn, yn) that we can use to build a classifier. We want our
classifier to perform well not only on the training data, but also on test observations that
were not used to train the classifier. In this chapter, we will illustrate the concept of
classification using the simulated Default data set. We are interested in predicting
whether an individual will default on his or her credit card payment, on the basis of
annual income and monthly credit card balance. The data set is displayed in Figure 4.1.
We have plotted annual income and monthly credit card balance for a subset of 10,000
individuals.
The left-hand panel of Figure 4.1 displays individuals who defaulted in a given month in
orange, and those who did not in blue. (The overall default rate is about 3%, so we have
plotted only a fraction of the individuals who did not default.) It appears that individuals
who defaulted tended to have higher credit card balances than those who did not. In the
right-hand panel of Figure 4.1, two pairs of boxplots are shown. The first shows the
distribution of balance split by the binary default variable; the second is a similar plot for
income. In this chapter, we learn how to build a model to predict default (Y ) for any
given value of balance (X1) and income (X2). Since Y is not quantitative, the simple
linear regression model of Chapter 3 is not appropriate.
It is worth noting that Figure 4.1 displays a very pronounced relationship between the
predictor balance and the response default. In most real applications, the relationship
between the predictor and the response will not be nearly so strong. However, for the
sake of illustrating the classification procedures discussed in this chapter, we use an
example in which the relationship between the predictor and the response is somewhat
exaggerated.
FIGURE 4.1. The Default data set. Left: The annual incomes and monthly credit card
balances of a number of individuals. The individuals who defaulted on their credit card
payments are shown in orange, and those who did not are shown in blue. Center:
Boxplots of balance as a function of default status. Right: Boxplots of income as a
function of default status.
4.2 Why Not Linear Regression?
We have stated that linear regression is not appropriate in the case of a qualitative
response. Why not?
Suppose that we are trying to predict the medical condition of a patient in the emergency
room on the basis of her symptoms. In this simplified example, there are three possible
diagnoses: stroke, drug overdose, and epileptic seizure. We could consider encoding
these values as a quantitative response variable, Y , as follows:
Y ={ 1 if stroke;
2 if drug overdose;
3 if epileptic seizure.}
Using this coding, least squares could be used to fit a linear regression model to predict Y
on the basis of a set of predictors X1, . . .,Xp. Unfortunately, this coding implies an
ordering on the outcomes, putting drug overdose in between stroke and epileptic seizure,
and insisting that the difference between stroke and drug overdose is the same as the
difference between drug overdose and epileptic seizure. In practice there is no particular
reason that this needs to be the case. For instance, one could choose an equally
reasonable coding,
Y ={1 if epileptic seizure;
2 if stroke;
3 if drug overdose.}
which would imply a totally different relationship among the three conditions. Each of
these codings would produce fundamentally different linear models that would ultimately
lead to different sets of predictions on test observations. If the response variable’s values
did take on a natural ordering, such as mild, moderate, and severe, and we felt the gap
between mild and moderate was similar to the gap between moderate and severe, then a
1, 2, 3 coding would be reasonable. Unfortunately, in general there is no natural way to
convert a qualitative response variable with more than two levels into a quantitative
response that is ready for linear regression. For a binary (two level) qualitative response,
the situation is better. For instance, perhaps there are only two possibilities for the
patient’s medical condition: stroke and drug overdose. We could then potentially use the
dummy variable approach from Section 3.3.1 to code the response as follows:
Y ={0 if stroke;
1 if drug overdose.}
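As a quick illustration (the condition vector below is hypothetical), such a dummy variable is easy to construct in R:
> cond <- c("stroke", "drug overdose", "drug overdose", "stroke")   # hypothetical observations
> y <- ifelse(cond == "stroke", 0, 1)   # 0 if stroke, 1 if drug overdose
> y
[1] 0 1 1 0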
4.3 Logistic Regression:
FIGURE 4.2. Classification using the Default data. Left: Estimated probability of default
using linear regression. Some estimated probabilities are negative! The orange ticks
indicate the 0/1 values coded for default (No or Yes). Right: Predicted probabilities of
default using logistic regression. All probabilities lie between 0 and 1.
For the Default data, logistic regression models the probability of default. For example,
the probability of default given balance can be written as
Pr(default = Yes|balance).
The values of Pr(default = Yes|balance), which we abbreviate p(balance), will range
between 0 and 1. Then for any given value of balance, a prediction can be made for
default. For example, one might predict default = Yes for any individual for whom
p(balance) > 0.5. Alternatively, if a company wishes to be conservative in predicting
individuals who are at risk for default, then they may choose to use a lower threshold,
such as p(balance) > 0.1.
4.3.1 The Logistic Model
How should we model the relationship between p(X) = Pr(Y = 1|X) and X? (For
convenience we are using the generic 0/1 coding for the response). In Section 4.2 we
talked of using a linear regression model to represent these probabilities:
p(X) = β0 + β1X.    (4.1)
If we use this approach to predict default=Yes using balance, then we obtain the model
shown in the left-hand panel of Figure 4.2. Here we see the problem with this approach:
for balances close to zero we predict a negative probability of default; if we were to
predict for very large balances, we would get values bigger than 1. These predictions are
not sensible, since of course the true probability of default, regardless of credit card
balance, must fall between 0 and 1. This problem is not unique to the credit default data.
Any time a straight line is fit to a binary response that is coded as 0 or 1, in principle we
can always predict p(X) < 0 for some values of X and p(X) > 1 for others (unless the
range of X is limited).
To avoid this problem, we must model p(X) using a function that gives outputs between 0
and 1 for all values of X. Many functions meet this description. In logistic regression, we
use the logistic function,

p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)).    (4.2)
To fit the model (4.2), we use a method called maximum likelihood, which we discuss in
the next section. The right-hand panel of Figure 4.2 illustrates the fit of the logistic
regression model to the Default data. Notice that for low balances we now predict the
probability of default as close to, but never below, zero. Likewise, for high balances we
predict a default probability close to, but never above, one. The logistic function will
always produce an S-shaped curve of this form, and so regardless of the value of X, we
will obtain a sensible prediction. We also see that the logistic model is better able to
capture the range of probabilities than is the linear regression model in the left-hand plot.
The average fitted probability in both cases is 0.0333 (averaged over the training data),
which is the same as the overall proportion of defaulters in the data set.
4.3.2 Estimating the Regression Coefficients
The coefficients β0 and β1 in (4.2) are unknown, and must be estimated based on the
available training data. In Chapter 3, we used the least squares approach to estimate the
unknown linear regression coefficients. Although we could use (non-linear) least squares
to fit the model (4.2), the more general method of maximum likelihood is preferred, since
it has better statistical properties. The basic intuition behind using maximum likelihood to
fit a logistic regression model is as follows: we seek estimates for β0 and β1 such that the
predicted probability ˆp(xi) of default for each individual, using (4.2), corresponds as
closely as possible to the individual's observed default status. In other words, we try to
find ˆβ0 and ˆβ1 such that plugging these estimates into the model for p(X), given in
(4.2), yields a number close to one for all individuals who defaulted, and a number close
to zero for all individuals who did not. This intuition can be formalized using a
mathematical equation called a likelihood function:

ℓ(β0, β1) = ∏_{i: yi = 1} p(xi) × ∏_{i′: yi′ = 0} (1 − p(xi′)).
The estimates ˆ β0 and ˆβ1 are chosen to maximize this likelihood function. Maximum
likelihood is a very general approach that is used to fit many of the non-linear models that
we examine throughout this book. In the linear regression setting, the least squares
approach is in fact a special case of maximum likelihood. The mathematical details of
maximum likelihood are beyond the scope of this book. However, in general, logistic
regression and other models can be easily fit using a statistical software package such as
R, and so we do not need to concern ourselves with the details of the maximum
likelihood fitting procedure.
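In R, the glm() function fits logistic regression by maximum likelihood. A minimal sketch, assuming the Default data from the ISLR package:

library(ISLR)                               # provides the Default data set
fit <- glm(default ~ balance, data = Default, family = binomial)
summary(fit)                                # ˆβ0, ˆβ1 and their standard errors
probs <- predict(fit, type = "response")    # fitted probabilities p(balance)
pred <- ifelse(probs > 0.5, "Yes", "No")    # classify using a 0.5 threshold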
4.4 Linear Discriminant Analysis
Logistic regression involves directly modeling Pr(Y = k|X = x) using the logistic
function, given by (4.7) for the case of two response classes. In statistical jargon, we
model the conditional distribution of the response Y , given the predictor(s) X. We now
consider an alternative and less direct approach to estimating these probabilities. In this
alternative approach, we model the distribution of the predictors X separately in each of
the response classes (i.e. given Y ), and then use Bayes’ theorem to flip these around into
estimates for Pr(Y = k|X = x). When these distributions are assumed to be normal, it turns
out that the model is very similar in form to logistic regression. Why do we need another
method, when we have logistic regression? There are several reasons:
 When the classes are well-separated, the parameter estimates for the logistic
regression model are surprisingly unstable. Linear discriminant analysis does not
suffer from this problem.
 If n is small and the distribution of the predictors X is approximately normal in
each of the classes, the linear discriminant model is again more stable than the
logistic regression model.
 As mentioned in Section 4.3.5, linear discriminant analysis is popular when we
have more than two response classes.
4.4.1 Using Bayes’ Theorem for Classification
Suppose that we wish to classify an observation into one of K classes, where K ≥ 2. In
other words, the qualitative response variable Y can take on K possible distinct and
unordered values. Let πk represent the overall or prior probability that a randomly chosen
observation comes from the kth class; this is the probability that a given observation is
associated with the kth category of the response variable Y. Let fk(X) ≡ Pr(X = x|Y = k)
denote the density function of X for an observation that comes from the kth class. In
other words, fk(x) is relatively large if there is a high probability that an observation in
the kth class has X ≈ x, and fk(x) is small if it is very unlikely that an observation in the
kth class has X ≈ x. Then Bayes' theorem states that

Pr(Y = k|X = x) = πk fk(x) / Σ_{l=1}^{K} πl fl(x).    (4.10)

In accordance with our earlier notation, we will use the abbreviation pk(X) =
Pr(Y = k|X). This suggests that instead of directly computing pk(X) as in Section 4.3.1,
we can simply plug in estimates of πk and fk(X) into (4.10). In general, estimating πk is
easy if we have a random sample of Y s from the population: we simply compute the
fraction of the training observations that belong to the kth class.
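A minimal sketch of linear discriminant analysis in R, assuming the lda() function from the MASS package and the Default data from ISLR:

library(MASS)                          # provides lda()
library(ISLR)                          # provides the Default data set
fit <- lda(default ~ balance + income, data = Default)
fit$prior                              # estimated prior probabilities πk
pred <- predict(fit, Default)
table(pred$class, Default$default)     # confusion matrix on the training data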
CHAPTER 5
TREE BASED METHODS
In this chapter, we describe tree-based methods for regression and classification.
These involve stratifying or segmenting the predictor space into a number of simple
regions. In order to make a prediction for a given observation, we typically use the mean
or the mode of the training observations in the region to which it belongs. Since the set of
splitting rules used to segment the predictor space can be summarized in a tree, these
types of approaches are known as decision tree methods.
Tree-based methods are simple and useful for interpretation. However, they typically are
not competitive with the best supervised learning approaches, such as those seen in
Chapters 6 and 7, in terms of prediction accuracy. Hence in this chapter we also introduce
bagging, random forests, and boosting. Each of these approaches involves producing
multiple trees which are then combined to yield a single consensus prediction. We will
see that combining a large number of trees can often result in dramatic improvements in
prediction accuracy, at the expense of some loss in interpretation.
5.1 The Basics of Decision Trees:
Decision trees can be applied to both regression and classification problems. We first
consider regression problems, and then move on to classification. For the Hitters data,
consider a regression tree for predicting the log salary of a baseball player, based on the
number of years that he has played in the major leagues and the number of hits that he
made in the previous year. At a given internal node, the label (of the form Xj < tk)
indicates the left-hand branch emanating from that split, and the right-hand branch
corresponds to Xj ≥ tk. For instance, the split at the top of the tree results in two large
branches. The left-hand branch corresponds to Years < 4.5, and the right-hand branch
corresponds to Years ≥ 4.5.
The tree has two internal nodes and three terminal nodes, or leaves. The number in each
leaf is the mean of the response for the observations that fall there.
5.1.1 Regression Trees:
In order to motivate regression trees, we begin with a simple example: predicting
baseball players' salaries using regression trees. We use the Hitters data set to predict a
baseball player's Salary based on Years (the number of years that he has played in the
major leagues) and Hits (the number of hits that he made in the previous year). We first
remove observations that are missing Salary values, and log-transform Salary so that its
distribution has more of a typical bell shape. (Recall that Salary is measured in
thousands of dollars.) The fitted regression tree consists of a series of splitting rules,
starting at the top of the tree. The top split assigns observations having Years < 4.5 to
the left branch.
Algorithm 5.1 Building a Regression Tree.
1. Use recursive binary splitting to grow a large tree on the training data, stopping only
when each terminal node has fewer than some minimum number of observations.
2. Apply cost complexity pruning to the large tree in order to obtain a sequence of best
subtrees, as a function of α.
3. Use K-fold cross-validation to choose α. That is, divide the training observations into
K folds. For each k =1,...,K:
(a) Repeat Steps 1 and 2 on all but the kth fold of the training data.
(b) Evaluate the mean squared prediction error on the data in the left-out kth fold, as a
function of α. Average the results for each value of α, and pick α to minimize the average
error.
4. Return the subtree from Step 2 that corresponds to the chosen value of α.
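A minimal sketch of Algorithm 5.1 in R, assuming the Hitters data from the ISLR package and the tree package (rpart is a common alternative):

library(ISLR)                          # Hitters data
library(tree)                          # tree(), cv.tree(), prune.tree()
Hitters <- na.omit(Hitters)            # remove observations missing Salary
fit <- tree(log(Salary) ~ Years + Hits, data = Hitters)   # grow a large tree
cv  <- cv.tree(fit)                    # cross-validation over the pruning parameter
best <- cv$size[which.min(cv$dev)]     # tree size with the smallest CV error
pruned <- prune.tree(fit, best = best) # cost-complexity pruning to that size
plot(pruned); text(pruned)             # display the pruned tree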
5.1.2 Advantages and Disadvantages of Trees:
Decision trees for regression and classification have a number of advantages over the
more classical approaches seen in Chapters 3 and 4:
▲ Trees are very easy to explain to people. In fact, they are even easier to explain than
linear regression!
▲ Some people believe that decision trees more closely mirror human decision-making
than do the regression and classification approaches seen in previous chapters.
▲ Trees can be displayed graphically, and are easily interpreted even by a non-expert
(especially if they are small).
▲ Trees can easily handle qualitative predictors without the need to create dummy
variables.
5.2 Bagging, Random Forests, Boosting:
Bagging, random forests, and boosting use trees as building blocks to construct
more powerful prediction models.
5.2.1 Bagging:
The bootstrap is an extremely powerful idea. It is used
in many situations in which it is hard or even impossible to directly compute the standard
deviation of a quantity of interest. We see here that the bootstrap can be used in a
completely different context, in order to improve statistical learning methods such as
decision trees. The decision trees discussed in Section 5.1 suffer from high variance. This
means that if we split the training data into two parts at random, and fit a decision tree to
both halves, the results that we get could be quite different. In contrast, a procedure with
low variance will yield similar results if applied repeatedly to distinct data sets; linear
regression tends to have low variance, if the ratio of n to p is moderately large. Bootstrap
aggregation, or bagging, is a general-purpose procedure for reducing the variance of a
statistical learning method.
It turns out that there is a very straightforward way to estimate the test error of a
bagged model, without the need to perform cross-validation or the validation set
approach. Recall that the key to bagging is that trees are repeatedly fit to bootstrapped
subsets of the observations.
One can show that on average, each bagged tree makes use of around two-thirds
of the observations. The remaining one-third of the observations not used to fit a given
bagged tree are referred to as the out-of-bag (OOB) observations. We can predict the
response for the ith observation using each of the trees in which that observation was
OOB. This will yield around B/3 predictions for the ith observation. In order to obtain a
single prediction for the ith observation, we can average these predicted responses (if
regression is the goal) or can take a majority vote (if classification is the goal). This leads
to a single OOB prediction for the ith observation. An OOB prediction can be obtained in
this way for each of the n observations, from which the overall OOB MSE (for a
regression problem) or classification error (for a classification problem) can be computed.
The resulting OOB error is a valid estimate of the test error for the bagged model, since
the response for each observation is predicted using only the trees that were not fit using
that observation. Figure 8.8 displays the OOB error on the Heart data. It can be shown
that with B sufficiently large, OOB error is virtually equivalent to leave-one-out cross-
validation error.
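A minimal sketch of bagging with its OOB error estimate in R, assuming the randomForest package (bagging is a random forest with mtry set to the number of predictors):

library(randomForest)                      # implements bagging and random forests
library(ISLR)
Hitters <- na.omit(Hitters)                # drop rows with missing Salary
bag <- randomForest(log(Salary) ~ Years + Hits, data = Hitters,
                    mtry = 2, ntree = 500) # mtry = 2 uses both predictors: bagging
bag                                        # prints the OOB estimate of the MSE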
CHAPTER 6
CONCLUSION
 To get familiar with statistical learning / machine learning, which has become a very
hot field with the explosion of “Big Data” problems.
 To learn statistical learning and modeling skills, which are in high demand, and to
cover basic concepts of statistical learning / modeling methods that have
widespread use in business and scientific research.
 To get hands-on experience with the applications and the underlying statistical /
mathematical concepts that are relevant to modeling techniques. The report is
designed to familiarize students with implementing the statistical learning methods
using the highly popular statistical software package R.
CHAPTER 7
REFERENCES
1) James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert. An Introduction
to Statistical Learning with Applications in R. 6th edition. Springer.
2) Saffran, Jenny R. (2003). "Statistical language learning: mechanisms and
constraints". Current Directions in Psychological Science. 12 (4): 110–114.
doi:10.1111/1467-8721.01243.
3) Brent, Michael R.; Cartwright, Timothy A. (1996). "Distributional regularity and
phonotactic constraints are useful for segmentation". Cognition. 61 (1–2): 93–125.
doi:10.1016/S0010-0277(96)00719-6.
4) Saffran, J. R.; Aslin, R. N.; Newport, E. L. (1996). "Statistical Learning by
8-Month-Old Infants". Science. 274 (5294): 1926–1928.
doi:10.1126/science.274.5294.1926. PMID 8943209.
5) Saffran, Jenny R.; Newport, Elissa L.; Aslin, Richard N. (1996). "Word
Segmentation: The Role of Distributional Cues". Journal of Memory and Language.
35 (4): 606–621. doi:10.1006/jmla.1996.0032.
6) Aslin, R. N.; Saffran, J. R.; Newport, E. L. (1998). "Computation of Conditional
Probability Statistics by 8-Month-Old Infants". Psychological Science. 9 (4):
321–324. doi:10.1111/1467-9280.00063.
7) Saffran, Jenny R. (2001). "Words in a sea of sounds: the output of infant statistical
learning". Cognition. 81 (2): 149–169. doi:10.1016/S0010-0277(01)00132-9.
8) Saffran, Jenny R.; Wilson, Diana P. (2003). "From Syllables to Syntax: Multilevel
Statistical Learning by 12-Month-Old Infants". Infancy. 4 (2): 273–284.
doi:10.1207/S15327078IN0402_07.
9) Mattys, Sven L.; Jusczyk, Peter W.; Luce, Paul A.; Morgan, James L. (1999).
"Phonotactic and Prosodic Effects on Word Segmentation in Infants". Cognitive
Psychology. 38 (4): 465–494.
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answeringAli Kabbadj
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
 
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...IRJET Journal
 
LLM Paradigm Adaptations in Recommender Systems.pdf
LLM Paradigm Adaptations in Recommender Systems.pdfLLM Paradigm Adaptations in Recommender Systems.pdf
LLM Paradigm Adaptations in Recommender Systems.pdfNagaBathula1
 
Efficient Feature Selection for Fault Diagnosis of Aerospace System Using Syn...
Efficient Feature Selection for Fault Diagnosis of Aerospace System Using Syn...Efficient Feature Selection for Fault Diagnosis of Aerospace System Using Syn...
Efficient Feature Selection for Fault Diagnosis of Aerospace System Using Syn...IRJET Journal
 
New Fuzzy Model For quality evaluation of e-Training of CNC Operators
New Fuzzy Model For quality evaluation of e-Training of CNC OperatorsNew Fuzzy Model For quality evaluation of e-Training of CNC Operators
New Fuzzy Model For quality evaluation of e-Training of CNC Operatorsinventionjournals
 
IRJET- Modeling Student’s Vocabulary Knowledge with Natural
IRJET-  	  Modeling Student’s Vocabulary Knowledge with NaturalIRJET-  	  Modeling Student’s Vocabulary Knowledge with Natural
IRJET- Modeling Student’s Vocabulary Knowledge with NaturalIRJET Journal
 
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
 
Proceedings of the 2015 Industrial and Systems Engineering Res.docx
Proceedings of the 2015 Industrial and Systems Engineering Res.docxProceedings of the 2015 Industrial and Systems Engineering Res.docx
Proceedings of the 2015 Industrial and Systems Engineering Res.docxwkyra78
 
ENSEMBLE REGRESSION MODELS FOR SOFTWARE DEVELOPMENT EFFORT ESTIMATION: A COMP...
ENSEMBLE REGRESSION MODELS FOR SOFTWARE DEVELOPMENT EFFORT ESTIMATION: A COMP...ENSEMBLE REGRESSION MODELS FOR SOFTWARE DEVELOPMENT EFFORT ESTIMATION: A COMP...
ENSEMBLE REGRESSION MODELS FOR SOFTWARE DEVELOPMENT EFFORT ESTIMATION: A COMP...ijseajournal
 
A Flowchart-Based Multi-Agent System for Assisting Novice Programmers with Pr...
A Flowchart-Based Multi-Agent System for Assisting Novice Programmers with Pr...A Flowchart-Based Multi-Agent System for Assisting Novice Programmers with Pr...
A Flowchart-Based Multi-Agent System for Assisting Novice Programmers with Pr...Erin Taylor
 

Similar to Audit report[rollno 49] (20)

A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzer
 
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic Analysis
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
Re2018 Semios for Requirements
Re2018 Semios for RequirementsRe2018 Semios for Requirements
Re2018 Semios for Requirements
 
Graph embedding approach to analyze sentiments on cryptocurrency
Graph embedding approach to analyze sentiments on cryptocurrencyGraph embedding approach to analyze sentiments on cryptocurrency
Graph embedding approach to analyze sentiments on cryptocurrency
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answering
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...
IRJET- Aspect based Sentiment Analysis on Financial Data using Transferred Le...
 
LLM Paradigm Adaptations in Recommender Systems.pdf
LLM Paradigm Adaptations in Recommender Systems.pdfLLM Paradigm Adaptations in Recommender Systems.pdf
LLM Paradigm Adaptations in Recommender Systems.pdf
 
Efficient Feature Selection for Fault Diagnosis of Aerospace System Using Syn...
Efficient Feature Selection for Fault Diagnosis of Aerospace System Using Syn...Efficient Feature Selection for Fault Diagnosis of Aerospace System Using Syn...
Efficient Feature Selection for Fault Diagnosis of Aerospace System Using Syn...
 
New Fuzzy Model For quality evaluation of e-Training of CNC Operators
New Fuzzy Model For quality evaluation of e-Training of CNC OperatorsNew Fuzzy Model For quality evaluation of e-Training of CNC Operators
New Fuzzy Model For quality evaluation of e-Training of CNC Operators
 
IRJET- Modeling Student’s Vocabulary Knowledge with Natural
IRJET-  	  Modeling Student’s Vocabulary Knowledge with NaturalIRJET-  	  Modeling Student’s Vocabulary Knowledge with Natural
IRJET- Modeling Student’s Vocabulary Knowledge with Natural
 
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
 
Mangai
MangaiMangai
Mangai
 
Mangai
MangaiMangai
Mangai
 
Proceedings of the 2015 Industrial and Systems Engineering Res.docx
Proceedings of the 2015 Industrial and Systems Engineering Res.docxProceedings of the 2015 Industrial and Systems Engineering Res.docx
Proceedings of the 2015 Industrial and Systems Engineering Res.docx
 
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATIONONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
 
ENSEMBLE REGRESSION MODELS FOR SOFTWARE DEVELOPMENT EFFORT ESTIMATION: A COMP...
ENSEMBLE REGRESSION MODELS FOR SOFTWARE DEVELOPMENT EFFORT ESTIMATION: A COMP...ENSEMBLE REGRESSION MODELS FOR SOFTWARE DEVELOPMENT EFFORT ESTIMATION: A COMP...
ENSEMBLE REGRESSION MODELS FOR SOFTWARE DEVELOPMENT EFFORT ESTIMATION: A COMP...
 
A Flowchart-Based Multi-Agent System for Assisting Novice Programmers with Pr...
A Flowchart-Based Multi-Agent System for Assisting Novice Programmers with Pr...A Flowchart-Based Multi-Agent System for Assisting Novice Programmers with Pr...
A Flowchart-Based Multi-Agent System for Assisting Novice Programmers with Pr...
 

Recently uploaded

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 

Audit report[rollno 49]

supervise our analysis.
1.2 Issues on Statistical Learning:

The main goal of the work surveyed here is to address the fundamental modeling and learning issues of new emerging approaches, and their empirical applications, in speech and language processing. A second focus is the cross-fertilization of learning approaches between speech and language processing problems. Many problems in the two fields share similarities (despite some conspicuous differences), and techniques in one field can be successfully cross-pollinated into the other. An additional goal is to bring together a diverse but complementary set of contributions on emerging learning methods for speech processing, for language processing, and on unifying approaches to problems cutting across the two fields.

Discriminative learning has become a major theme in most areas of speech and language processing. One recent advance in discriminative learning is the integration of the large-margin idea, a classical training standard in machine learning, into conventional discriminative training criteria for string recognition. A central question is how typical training criteria, such as minimum phone error and maximum mutual information, can be extended to incorporate the margin concept. In this line of work, a new margin-based formalism is proposed for various conventional training criteria. Experimental results show that the new criteria improve performance across a wide variety of string recognition scenarios, including speech recognition, concept tagging, and handwriting recognition. In another paper, Cheng et al. explore online learning and acoustic feature adaptation in large-margin hidden Markov models (HMMs), which lead to a better optimization method for large-margin HMM training.

Moving beyond acoustics, language modeling is one of the essential problems in the speech and language fields. Zhou et al. introduce a novel pseudo-conventional N-gram language model with discriminative training, and also carry out an empirical study of the robustness of discriminatively trained LMs. Experimental results show that cumulative performance improvements can be achieved via this method.

Sequential pattern classification is at the core of many speech and language processing problems. The conditional random field (CRF) is a widely adopted approach to supervised sequential labeling.
However, the computational load and model complexity grow dramatically when taking complex structure into account. Here, Sokolovska et al. address this issue through efficient feature selection, imposing sparsity on the CRF via an L1 regularization. The results show that, without performance degradation, the L1-regularized CRF trains and labels significantly faster, and hence makes it possible to scale systems up to very large dimensional models. Meanwhile, Yu et al. improve the CRF model from another perspective. They propose a multi-layer sequence classification algorithm in which each layer is a CRF, and each higher layer's input consists of both the previous layer's observation sequence and the resulting frame-level marginal probabilities. Compared with the conventional CRF, this deep-structured CRF achieves superior labeling accuracy on common tagging tasks.

Using kernel methods to improve the performance of sequential pattern classifiers is also an important direction. Kubo et al. describe a novel sequential pattern classifier based on kernel methods. Unlike conventional approaches, they use kernel methods to estimate the emission probability of an HMM, gaining the benefit of the powerful nonlinear classification capability of kernel methods. By contrast with conventional CRF/HMM-based methods, Bellegarda attacks this problem from a novel angle based on latent semantic mapping and obtains insightful results.
CHAPTER 2
GETTING STARTED WITH R PROGRAMMING

2.1 Introduction to R-Studio:

R is a free, open-source software environment and programming language developed in 1995 at the University of Auckland for statistical computing and graphics (Ihaka and Gentleman, 1996). Since then R has become one of the dominant software environments for data analysis and is used by a variety of scientific disciplines, including soil science, ecology, and geoinformatics (Envirometrics CRAN Task View; Spatial CRAN Task View). R is particularly popular for its graphical capabilities, but it is also prized for its GIS capabilities, which make it relatively easy to generate raster-based models. More recently, R has also gained several packages designed specifically for analyzing soil data.

2.2 User-interface:

R is a dialect of the S language. It is a case-sensitive, interpreted language. You can enter commands one at a time at the command prompt (>) or run a set of commands from a source file. There is a wide variety of data types, including vectors (numerical, character, logical), matrices, data frames, and lists. Most functionality is provided through built-in and user-created functions, and all data objects are kept in memory during an interactive session. Basic functions are available by default; other functions are contained in packages that can be attached to the current session as needed.

A key skill for using R effectively is learning how to use the built-in help system. Other important skills include managing the working environment, inputting programs and outputting results, installing new functionality through packages, customizing the environment, producing high-quality output, and running programs in batch. A fundamental design feature of R is that the output from most functions can be used as input to other functions.
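As a minimal illustration of this workflow, a first interactive session might look like the sketch below (the package name is just an example; any installed package is attached the same way):

library(MASS)        # attach a contributed package to the current session
?mean                # query the built-in help system about a function
x <- c(2, 4, 6)      # create an object; it stays in memory for the session
ls()                 # list the objects currently in the workspace
round(mean(x), 1)    # output of one function (mean) used as input to another (round)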
2.3 Basic commands:

• Input and Display:

read.table(filename, header = TRUE)             # read files with labels in the first row (tab- or space-delimited)
read.table(filename, header = TRUE, sep = ",")  # read csv files
x <- c(1, 2, 4, 8, 16)      # create a data vector with specified elements
y <- c(1:10)                # create a data vector with elements 1-10
n <- 10
x1 <- c(rnorm(n))           # create an n-item vector of random normal deviates
y1 <- c(runif(n)) + n       # create another n-item vector with n added to each random uniform draw
z <- rbinom(n, size, prob)  # create n binomial samples of size "size" with probability prob
vect <- c(x, y)             # combine them into one vector of length 2n
mat <- cbind(x, y)          # combine them into an n x 2 matrix
mat[4, 2]                   # display the 4th row, 2nd column
mat[3, ]                    # display the 3rd row
mat[, 2]                    # display the 2nd column
subset(dataset, logical)                      # those objects meeting a logical criterion
subset(data.df, select = variables, logical)  # objects from a data frame that meet a criterion
data.df[logical, ]                            # yet another way to get a subset, by row indexing
x[order(x$B), ]             # sort a data frame by the order of the elements in B
x[rev(order(x$B)), ]        # sort the data frame in reverse order

• Moving around:

ls()               # list the variables in the workspace
rm(x)              # remove x from the workspace
rm(list = ls())    # remove all the variables from the workspace
attach(mat)        # make the names of the variables in the matrix or data frame available in the workspace
detach(mat)        # release the names (remember to do this each time you attach something)
with(mat, ...)     # a preferred alternative to the attach ... detach pattern
new <- old[, -n]             # drop the nth column
new <- old[-n, ]             # drop the nth row
new <- old[, -c(i, j)]       # drop the ith and jth columns
new <- subset(old, logical)  # select those cases that meet the logical condition
complete <- subset(data.df, complete.cases(data.df))  # find those cases with no missing values
new <- old[n1:n2, n3:n4]     # select rows n1 through n2 of variables n3 through n4

• Distributions:

beta(a, b)         # beta function
gamma(x)           # gamma function
choose(n, k)       # number of combinations
factorial(x)       # factorial
dnorm(x, mean = 0, sd = 1, log = FALSE)   # normal distribution: density
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)   # cumulative probability
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)   # quantiles
rnorm(n, mean = 0, sd = 1)                # random deviates
dunif(x, min = 0, max = 1, log = FALSE)   # uniform distribution
punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
runif(n, min = 0, max = 1)

• Data manipulation:

replace(x, list, values)   # remember to assign this to some object, i.e., x <- replace(x, x == -9, NA)
# the replacement above is similar to the operation x[x == -9] <- NA
scrub(x, where, min, max, isvalue, newvalue)  # a convenient way to change particular values (in the psych package)
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ...)  # bin a numeric vector
x.df <- data.frame(x1, x2, x3, ...)  # combine different kinds of data into a data frame
as.data.frame()
is.data.frame()
x <- as.matrix()
scale()              # converts a data frame to standardized scores
round(x, n)          # rounds the values of x to n decimal places
ceiling(x)           # vector of the smallest integers >= x
floor(x)             # vector of the largest integers <= x
as.integer(x)        # truncates real x to integers (compare to round(x, 0))
as.integer(x < cutpoint)  # vector of 0 if less than cutpoint, 1 otherwise
factor(ifelse(a < cutpoint, "Neg", "Pos"))  # another way to dichotomize and make a factor for analysis
transform(data.df, variable = some operation)  # can be part of the setup for a data set
x %in% y             # tests each element of x for membership in y
y %in% x             # tests each element of y for membership in x
all(x %in% y)        # true if x is a proper subset of y
all(x)               # for a vector of logical values, are they all true?
any(x)               # for a vector of logical values, is at least one true?
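A short worked example tying a few of these commands together (the small data frame here is invented purely for illustration):

data.df <- data.frame(A = c(3, 1, 2, -9), B = c(10, 30, 20, 40))
data.df$A <- replace(data.df$A, data.df$A == -9, NA)   # recode -9 as missing
subset(data.df, B > 15)                                # rows meeting a logical criterion
data.df[order(data.df$B), ]                            # sort by the values in B
complete <- subset(data.df, complete.cases(data.df))   # keep only the complete cases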
2.4 Data Structures in R:

R supports five basic data structures, namely vector, matrix, list, data frame, and factor; character strings are covered briefly as well. This section discusses these structures and the way to write them in R.

1. Vector – This data structure contains elements of a single type, i.e., integer, double, logical, complex, etc. To create a vector in R, the c() function is used. For example:

> x <- 1:7; x
[1] 1 2 3 4 5 6 7
> y <- 2:-2; y
[1]  2  1  0 -1 -2

2. Matrix – A matrix is a two-dimensional data structure and can be created using the matrix() function. The numbers of rows and columns can be set with the nrow and ncol arguments. Providing both is not required, as the other dimension is inferred automatically from the length of the data.

3. List – This data structure can hold data of different types. It is similar to a vector, but where a vector contains elements of one type, a list can contain mixed data. A list is created using list(). For example:

> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
> str(x)
List of 3
 $ a: num 2.5
 $ b: logi TRUE
 $ c: int [1:3] 1 2 3

4. Data frame – This data structure is a special case of a list where each component has the same length. A data frame is created using the data.frame() function. For example:

> x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"))
> str(x)   # structure of x
'data.frame': 2 obs. of 3 variables:
 $ SN  : int 1 2
 $ Age : num 21 15
 $ Name: Factor w/ 2 levels "Dora","John": 2 1

5. Factor – Factors are used to store predefined, categorical data. A factor can be created using the factor() function. For example:

> x <- factor(c("single", "married", "married", "single"))

6. String – Any value written inside single or double quotes is a string. For example:

> x <- "This is a valid proper ' string"
> print(x)
[1] "This is a valid proper ' string"
> y <- 'this is still valid as this one" double quote is used inside single quotes'
> print(y)
[1] "this is still valid as this one\" double quote is used inside single quotes"

2.5 Graphics:

The plot() function is the primary way to plot data in R. For instance, plot(x, y) produces a scatterplot of the numbers in x versus the numbers in y. There are many additional options that can be passed in to the plot() function. For example, passing in the argument xlab will result in a label on the x-axis. To find out more information about the plot() function, type ?plot.

> x <- rnorm(100)
> y <- rnorm(100)
> plot(x, y)
> plot(x, y, xlab = "this is the x-axis", ylab = "this is the y-axis", main = "Plot of X vs Y")
We will often want to save the output of an R plot. The command that we use to do this depends on the file type that we would like to create. For instance, to create a pdf, we use the pdf() function, and to create a jpeg, we use the jpeg() function.

> pdf("Figure.pdf")
> plot(x, y, col = "green")
> dev.off()
null device

The function dev.off() indicates to R that we are done creating the plot. Alternatively, we can simply copy the plot window and paste it into an appropriate file type, such as a Word document.

The function seq() can be used to create a sequence of numbers. For instance, seq(a, b) makes a vector of integers between a and b. There are many other options: for instance, seq(0, 1, length = 10) makes a sequence of 10 numbers that are equally spaced between 0 and 1. Typing 3:11 is shorthand for seq(3, 11) for integer arguments.

> x <- seq(1, 10)
> x
[1]  1  2  3  4  5  6  7  8  9 10
> x <- 1:10
> x
[1]  1  2  3  4  5  6  7  8  9 10
> x <- seq(-pi, pi, length = 50)

We will now create some more sophisticated plots. The contour() function produces a contour plot in order to represent three-dimensional data; it is like a topographical map. It takes three arguments:
1. A vector of the x values (the first dimension),
2. A vector of the y values (the second dimension), and
3. A matrix whose elements correspond to the z value (the third dimension) for each pair of (x, y) coordinates.

As with the plot() function, there are many other inputs that can be used to fine-tune the output of the contour() function. To learn more about these, take a look at the help file by typing ?contour.

> y <- x
> f <- outer(x, y, function(x, y) cos(y) / (1 + x^2))
> contour(x, y, f)
> contour(x, y, f, nlevels = 45, add = TRUE)
> fa <- (f - t(f)) / 2
> contour(x, y, fa, nlevels = 15)

The image() function works the same way as contour(), except that it produces a color-coded plot whose colors depend on the z value. This is known as a heatmap, and is sometimes used to plot temperature in weather forecasts. Alternatively, persp() can be used to produce a three-dimensional plot. The arguments theta and phi control the angles at which the plot is viewed.

> image(x, y, fa)
> persp(x, y, fa)
> persp(x, y, fa, theta = 30)
> persp(x, y, fa, theta = 30, phi = 20)
> persp(x, y, fa, theta = 30, phi = 70)
> persp(x, y, fa, theta = 30, phi = 40)

2.6 Reading data into R:

Usually we will be using data already in a file that we need to read into R in order to work on it. R can read data from a variety of file formats—for example, files created as text, or in Excel, SPSS or Stata. We will mainly be reading files in text format (.txt) or .csv (comma-separated, usually created in Excel). To read an entire data frame directly, the external file will normally have a special form:
• The first line of the file should have a name for each variable in the data frame.
• Each additional line of the file has as its first item a row label and then the values for each variable.

Here we use the example dataset called airquality.csv / airquality.txt. Input file form, with names and row labels:

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
...

By default, numeric items (except row labels) are read as numeric variables. This can be changed if necessary. The function read.table() can then be used to read the data frame directly:

> airqual <- read.table("C:/Desktop/airquality.txt")

Similarly, to read .csv files, the read.csv() function can be used to read in the data frame directly. (Note: occasionally you'll need to use a double slash in your path, //; this seems to depend on the machine.)

> airqual <- read.csv("C:/Desktop/airquality.csv")

In addition, you can read in files using the file.choose() function. After typing this command in R, you can manually select the directory and file where your dataset is located.
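Once a file has been read in, it is worth checking that R parsed it as expected; a minimal sketch (the path is a placeholder to adjust for your machine):

airqual <- read.csv("C:/Desktop/airquality.csv")  # hypothetical location of the example file
head(airqual)     # first six rows, to confirm the columns were split correctly
str(airqual)      # variable names and types
summary(airqual)  # basic summaries; NA counts reveal missing values such as those above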
CHAPTER 3
LINEAR REGRESSION MODELS

3.1 Linear Regression:

This chapter is about linear regression, a very simple approach for supervised learning. In particular, linear regression is a useful tool for predicting a quantitative response. Linear regression has been around for a long time and is the topic of innumerable textbooks. Though it may seem somewhat dull compared to some of the more modern statistical learning approaches described in later chapters, linear regression is still a useful and widely used statistical learning method. Moreover, it serves as a good jumping-off point for newer approaches: as we will see in later chapters, many fancy statistical learning approaches can be seen as generalizations or extensions of linear regression. Consequently, the importance of having a good understanding of linear regression before studying more complex learning methods cannot be overstated.

In this chapter, we review some of the key ideas underlying the linear regression model, as well as the least squares approach that is most commonly used to fit this model. Recall the Advertising data from Chapter 2, which records sales (in thousands of units) for a particular product as a function of advertising budgets (in thousands of dollars) for TV, radio, and newspaper media. Suppose that in our role as statistical consultants we are asked to suggest, on the basis of this data, a marketing plan for next year that will result in high product sales. What information would be useful in order to provide such a recommendation? For instance: Is there a relationship between advertising budget and sales? How strong is that relationship? Which media contribute to sales?

Simple Linear Regression

Simple linear regression lives up to its name: it is a very straightforward approach for predicting a quantitative response Y on the basis of a single predictor variable X. It assumes that there is approximately a linear relationship between X and Y. Mathematically, we can write this linear relationship as

Y ≈ β0 + β1X.

You might read "≈" as "is approximately modeled as". We will sometimes describe this by saying that we are regressing Y on X (or Y onto X). For example, X may
represent TV advertising and Y may represent sales. Then we can regress sales onto TV by fitting the model

sales ≈ β0 + β1 × TV.

In this equation, β0 and β1 are two unknown constants that represent the intercept and slope terms in the linear model. Together, β0 and β1 are known as the model coefficients or parameters. Once we have used our training data to produce estimates β̂0 and β̂1 for the model coefficients, we can predict future sales on the basis of a particular value of TV advertising by computing

ŷ = β̂0 + β̂1x,

where ŷ indicates a prediction of Y on the basis of X = x. Here we use a hat symbol, ˆ, to denote the estimated value for an unknown parameter or coefficient, or to denote the predicted value of the response.

Estimating the Coefficients

In practice, β0 and β1 are unknown. So before we can use the model to make predictions, we must use data to estimate the coefficients. Let (x1, y1), (x2, y2), . . . , (xn, yn) represent n observation pairs, each of which consists of a measurement of X and a measurement of Y. In the Advertising example, this data set consists of the TV advertising budget and product sales in n = 200 different markets. Our goal is to obtain coefficient estimates β̂0 and β̂1 such that the linear model fits the available data well—that is, so that yi ≈ β̂0 + β̂1xi for i = 1, . . . , n. In other words, we want to find an intercept β̂0 and a slope β̂1 such that the resulting line is as close as possible to the n = 200 data points. There are a number of ways of measuring closeness. However, by far the most common approach involves minimizing the least squares criterion, and we take that approach in this chapter.
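As a minimal sketch of this fit in R (the advertising data below is simulated with a known slope, since the real Advertising data set is not bundled with base R):

set.seed(1)
TV <- runif(200, 0, 300)                      # simulated TV budgets for n = 200 markets
sales <- 7 + 0.05 * TV + rnorm(200, sd = 2)   # simulated response with known coefficients
fit <- lm(sales ~ TV)                         # least squares fit of sales onto TV
coef(fit)                                     # the estimates beta0-hat and beta1-hat
plot(TV, sales); abline(fit, col = "red")     # the data with the fitted line overlaid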
For the Advertising data, the least squares fit for the regression of sales onto TV is shown in the accompanying figure. The fit is found by minimizing the sum of squared errors. Each grey line segment represents an error, and the fit makes a compromise by averaging their squares. In this case a linear fit captures the essence of the relationship, although it is somewhat deficient in the left of the plot.

Let ŷi = β̂0 + β̂1xi be the prediction for Y based on the ith value of X. Then ei = yi − ŷi represents the ith residual—this is the difference between the ith observed response value and the ith response value that is predicted by our linear model. We define the residual sum of squares (RSS) as

RSS = e1² + e2² + · · · + en²,

or equivalently as

RSS = (y1 − β̂0 − β̂1x1)² + (y2 − β̂0 − β̂1x2)² + · · · + (yn − β̂0 − β̂1xn)².
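Continuing the simulated example above, the least squares criterion can be checked directly (a sketch, not part of the original text):

e <- resid(fit)                     # residuals e_i = y_i - yhat_i
RSS <- sum(e^2)                     # residual sum of squares for the fitted line
RSS
# any other line through the data gives a larger RSS, e.g. a slightly perturbed slope:
yhat_alt <- coef(fit)[1] + (coef(fit)[2] + 0.01) * TV
sum((sales - yhat_alt)^2) > RSS     # TRUE: least squares minimizes the RSS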
CHAPTER 4
CLASSIFICATION

The linear regression model discussed in Chapter 3 assumes that the response variable Y is quantitative. But in many situations, the response variable is instead qualitative. For example, eye color is qualitative, taking on values blue, brown, or green. Qualitative variables are often referred to as categorical; we will use these terms interchangeably. In this chapter, we study approaches for predicting qualitative responses, a process that is known as classification. Predicting a qualitative response for an observation can be referred to as classifying that observation, since it involves assigning the observation to a category, or class. On the other hand, often the methods used for classification first predict the probability of each of the categories of a qualitative variable, as the basis for making the classification. In this sense they also behave like regression methods.

There are many possible classification techniques, or classifiers, that one might use to predict a qualitative response. We touched on some of these in Sections 2.1.5 and 2.2.3. In this chapter we discuss three of the most widely used classifiers: logistic regression, linear discriminant analysis, and K-nearest neighbors.

4.1 An Overview of Classification:

Classification problems occur often, perhaps even more often than regression problems. Some examples include:

1. A person arrives at the emergency room with a set of symptoms that could possibly be attributed to one of three medical conditions. Which of the three conditions does the individual have?

2. An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent, on the basis of the user's IP address, past transaction history, and so forth.
3. On the basis of DNA sequence data for a number of patients with and without a given disease, a biologist would like to figure out which DNA mutations are deleterious (disease-causing) and which are not.

Just as in the regression setting, in the classification setting we have a set of training observations (x1, y1), . . . , (xn, yn) that we can use to build a classifier. We want our classifier to perform well not only on the training data, but also on test observations that were not used to train the classifier.

In this chapter, we will illustrate the concept of classification using the simulated Default data set. We are interested in predicting whether an individual will default on his or her credit card payment, on the basis of annual income and monthly credit card balance. The data set is displayed in Figure 4.1, which plots annual income and monthly credit card balance for a subset of 10,000 individuals. The left-hand panel of Figure 4.1 displays individuals who defaulted in a given month in orange, and those who did not in blue. (The overall default rate is about 3%, so we have plotted only a fraction of the individuals who did not default.) It appears that individuals who defaulted tended to have higher credit card balances than those who did not. In the right-hand panel of Figure 4.1, two pairs of boxplots are shown. The first shows the distribution of balance split by the binary default variable; the second is a similar plot for income. In this chapter, we learn how to build a model to predict default (Y) for any given value of balance (X1) and income (X2). Since Y is not quantitative, the simple linear regression model of Chapter 3 is not appropriate.

It is worth noting that Figure 4.1 displays a very pronounced relationship between the predictor balance and the response default. In most real applications, the relationship between the predictor and the response will not be nearly so strong. However, for the sake of illustrating the classification procedures discussed in this chapter, we use an example in which the relationship between the predictor and the response is somewhat exaggerated.
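The Default data ships with the ISLR package, so plots like those in Figure 4.1 can be reproduced along these lines (a sketch, assuming ISLR is installed):

library(ISLR)                          # provides the simulated Default data set
str(Default)                           # variables: default, student, balance, income
mean(Default$default == "Yes")         # overall default rate, roughly 3%
boxplot(balance ~ default, data = Default,
        xlab = "default", ylab = "balance")   # balance split by default status
boxplot(income ~ default, data = Default,
        xlab = "default", ylab = "income")    # income split by default status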
FIGURE 4.1. The Default data set. Left: The annual incomes and monthly credit card balances of a number of individuals. The individuals who defaulted on their credit card payments are shown in orange, and those who did not are shown in blue. Center: Boxplots of balance as a function of default status. Right: Boxplots of income as a function of default status.

4.2 Why Not Linear Regression?

We have stated that linear regression is not appropriate in the case of a qualitative response. Why not? Suppose that we are trying to predict the medical condition of a patient in the emergency room on the basis of her symptoms. In this simplified example, there are three possible diagnoses: stroke, drug overdose, and epileptic seizure. We could consider encoding these values as a quantitative response variable, Y, as follows:

Y = 1 if stroke;
    2 if drug overdose;
    3 if epileptic seizure.

Using this coding, least squares could be used to fit a linear regression model to predict Y on the basis of a set of predictors X1, . . . , Xp. Unfortunately, this coding implies an ordering on the outcomes, putting drug overdose in between stroke and epileptic seizure, and insisting that the difference between stroke and drug overdose is the same as the
difference between drug overdose and epileptic seizure. In practice there is no particular reason that this needs to be the case. For instance, one could choose an equally reasonable coding,

Y = 1 if epileptic seizure;
    2 if stroke;
    3 if drug overdose,

which would imply a totally different relationship among the three conditions. Each of these codings would produce fundamentally different linear models that would ultimately lead to different sets of predictions on test observations.

If the response variable's values did take on a natural ordering, such as mild, moderate, and severe, and we felt the gap between mild and moderate was similar to the gap between moderate and severe, then a 1, 2, 3 coding would be reasonable. Unfortunately, in general there is no natural way to convert a qualitative response variable with more than two levels into a quantitative response that is ready for linear regression.

For a binary (two-level) qualitative response, the situation is better. For instance, perhaps there are only two possibilities for the patient's medical condition: stroke and drug overdose. We could then potentially use the dummy variable approach from Section 3.3.1 to code the response as follows:

Y = 0 if stroke;
    1 if drug overdose.
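In R, this kind of coding is usually handled through factors rather than hand-built numeric codes; a small sketch (the condition vector is invented for illustration):

condition <- c("stroke", "drug overdose", "stroke", "drug overdose")
y <- factor(condition)             # R stores the levels, not an arbitrary 1/2/3 ordering
as.integer(y == "drug overdose")   # an explicit 0/1 dummy variable for a binary response
# with more than two unordered levels, no single numeric coding is appropriate,
# which is exactly the problem described above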
4.3 Logistic Regression:

FIGURE 4.2. Classification using the Default data. Left: Estimated probability of default using linear regression. Some estimated probabilities are negative! The orange ticks indicate the 0/1 values coded for default (No or Yes). Right: Predicted probabilities of default using logistic regression. All probabilities lie between 0 and 1.

For the Default data, logistic regression models the probability of default. For example, the probability of default given balance can be written as

Pr(default = Yes | balance).

The values of Pr(default = Yes | balance), which we abbreviate p(balance), will range between 0 and 1. Then for any given value of balance, a prediction can be made for default. For example, one might predict default = Yes for any individual for whom p(balance) > 0.5. Alternatively, if a company wishes to be conservative in predicting individuals who are at risk of default, then it may choose to use a lower threshold, such as p(balance) > 0.1.

4.3.1 The Logistic Model

How should we model the relationship between p(X) = Pr(Y = 1|X) and X? (For convenience we are using the generic 0/1 coding for the response.) In Section 4.2 we talked of using a linear regression model to represent these probabilities:

p(X) = β0 + β1X.     (4.1)

If we use this approach to predict default = Yes using balance, then we obtain the model shown in the left-hand panel of Figure 4.2. Here we see the problem with this approach: for balances close to zero we predict a negative probability of default; if we were to predict for very large balances, we would get values bigger than 1. These predictions are not sensible, since of course the true probability of default, regardless of credit card balance, must fall between 0 and 1. This problem is not unique to the credit default data. Any time a straight line is fit to a binary response that is coded as 0 or 1, in principle we can always predict p(X) < 0 for some values of X and p(X) > 1 for others (unless the range of X is limited).
To avoid this problem, we must model p(X) using a function that gives outputs between 0 and 1 for all values of X. Many functions meet this description. In logistic regression, we use the logistic function,

p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)).     (4.2)

To fit the model (4.2), we use a method called maximum likelihood, which we discuss in the next section. The right-hand panel of Figure 4.2 illustrates the fit of the logistic regression model to the Default data. Notice that for low balances we now predict the probability of default as close to, but never below, zero. Likewise, for high balances we predict a default probability close to, but never above, one. The logistic function will always produce an S-shaped curve of this form, and so regardless of the value of X, we will obtain a sensible prediction. We also see that the logistic model is better able to capture the range of probabilities than is the linear regression model in the left-hand plot. The average fitted probability in both cases is 0.0333 (averaged over the training data), which is the same as the overall proportion of defaulters in the data set.

4.3.2 Estimating the Regression Coefficients

The coefficients β0 and β1 in (4.2) are unknown, and must be estimated based on the available training data. In Chapter 3, we used the least squares approach to estimate the unknown linear regression coefficients. Although we could use (non-linear) least squares to fit the model (4.2), the more general method of maximum likelihood is preferred, since it has better statistical properties. The basic intuition behind using maximum likelihood to fit a logistic regression model is as follows: we seek estimates for β0 and β1 such that the predicted probability p̂(xi) of default for each individual, using (4.2), corresponds as closely as possible to the individual's observed default status. In other words, we try to find β̂0 and β̂1 such that plugging these estimates into the model for p(X), given in (4.2), yields a number close to one for all individuals who defaulted, and a number close to zero for all individuals who did not. This intuition can be formalized using a mathematical equation called a likelihood function:

ℓ(β0, β1) = ∏(i: yi = 1) p(xi) × ∏(i′: yi′ = 0) (1 − p(xi′)).
As the sketch above suggests, the estimates β̂0 and β̂1 are chosen to maximize this likelihood function. Maximum likelihood is a very general approach that is used to fit many non-linear models. In the linear regression setting, the least squares approach is in fact a special case of maximum likelihood. The mathematical details of maximum likelihood are beyond the scope of this report; in practice, logistic regression and other models can be fit easily using a statistical software package such as R, so we need not concern ourselves with the details of the fitting procedure.

4.4 Linear Discriminant Analysis

Logistic regression involves directly modeling Pr(Y = k | X = x) using the logistic function, given by (4.2) in the case of two response classes. In statistical jargon, we model the conditional distribution of the response Y given the predictor(s) X. We now consider an alternative, less direct approach to estimating these probabilities. In this approach, we model the distribution of the predictors X separately in each of the response classes (i.e. given Y), and then use Bayes' theorem to flip these around into estimates of Pr(Y = k | X = x). When these distributions are assumed to be normal, the resulting model turns out to be very similar in form to logistic regression.

Why do we need another method when we already have logistic regression? There are several reasons:

• When the classes are well separated, the parameter estimates for the logistic regression model are surprisingly unstable. Linear discriminant analysis does not suffer from this problem.
• If n is small and the distribution of the predictors X is approximately normal in each of the classes, the linear discriminant model is again more stable than the logistic regression model.
• Linear discriminant analysis is also popular when we have more than two response classes.

4.4.1 Using Bayes' Theorem for Classification

Suppose that we wish to classify an observation into one of K classes, where K ≥ 2. In other words, the qualitative response variable Y can take on K possible distinct and unordered values. Let πk represent the overall, or prior, probability that a randomly chosen observation comes from the kth class; this is the probability that a given observation is
associated with the kth category of the response variable Y. Let fk(x) ≡ Pr(X = x | Y = k) denote the density function of X for an observation that comes from the kth class. In other words, fk(x) is relatively large if there is a high probability that an observation in the kth class has X ≈ x, and fk(x) is small if it is very unlikely that an observation in the kth class has X ≈ x. Then Bayes' theorem states that

Pr(Y = k | X = x) = πk fk(x) / ( π1 f1(x) + · · · + πK fK(x) ). (4.10)

In accordance with our earlier notation, we will use the abbreviation pk(X) = Pr(Y = k | X). This suggests that instead of directly computing pk(X) as in Section 4.3.1, we can simply plug estimates of πk and fk(x) into (4.10). In general, estimating πk is easy if we have a random sample of Y s from the population: we simply compute the fraction of the training observations that belong to the kth class.
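In R, linear discriminant analysis is available through the lda() function in the MASS package. The following is a minimal sketch, again assuming the Default data from the ISLR package; note that the fitted prior probabilities are exactly the class fractions described above:

library(MASS)                      # provides lda()
library(ISLR)                      # provides the Default data

lda_fit <- lda(default ~ balance, data = Default)
lda_fit$prior                      # estimated priors pi_k = class fractions
# equivalently: table(Default$default) / nrow(Default)

lda_pred <- predict(lda_fit)
head(lda_pred$posterior)           # estimates of Pr(Y = k | X = x), as in (4.10)
head(lda_pred$class)               # predicted class labels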
CHAPTER 5
TREE-BASED METHODS

In this chapter, we describe tree-based methods for regression and classification. These involve stratifying or segmenting the predictor space into a number of simple regions. To make a prediction for a given observation, we typically use the mean or the mode of the training observations in the region to which it belongs. Since the set of splitting rules used to segment the predictor space can be summarized in a tree, these approaches are known as decision tree methods. Tree-based methods are simple and useful for interpretation. However, they are typically not competitive with the best supervised learning approaches in terms of prediction accuracy. Hence in this chapter we also introduce bagging, random forests, and boosting. Each of these approaches produces multiple trees, which are then combined to yield a single consensus prediction. We will see that combining a large number of trees can often result in dramatic improvements in prediction accuracy, at the expense of some loss in interpretability.

5.1 The Basics of Decision Trees:

Decision trees can be applied to both regression and classification problems. We first consider regression problems, and then move on to classification. For the Hitters data, a regression tree can be built for predicting the log salary of a baseball player, based on the number of years that he has played in the major leagues (Years) and the number of hits that he made in the previous year (Hits). At a given internal node, the label (of the form Xj < tk) indicates the left-hand branch emanating from that split, and the right-hand branch corresponds to Xj ≥ tk. For instance, the split at the top of the tree results in two large branches: the left-hand branch corresponds to Years < 4.5, and the right-hand branch corresponds to Years >= 4.5. The tree has two internal nodes and three terminal nodes, or leaves. The number in each leaf is the mean of the response for the observations that fall there.
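This tree can be reproduced in R. The following is a minimal sketch, assuming the tree package and the Hitters data from the ISLR package; the pruning step at the end anticipates the cost-complexity procedure formalized in Algorithm 5.1 in the next section:

library(ISLR)                      # Hitters data
library(tree)                      # tree(), cv.tree(), prune.tree()

# Drop rows with missing Salary and log-transform the response
Hitters2 <- na.omit(Hitters)
Hitters2$Salary <- log(Hitters2$Salary)

# Grow the regression tree on Years and Hits
fit <- tree(Salary ~ Years + Hits, data = Hitters2)
plot(fit); text(fit)               # draw the tree with its split labels

# Cost-complexity pruning, with the subtree size chosen by cross-validation
cv_out <- cv.tree(fit)
best_size <- cv_out$size[which.min(cv_out$dev)]
pruned <- prune.tree(fit, best = best_size)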
5.1.1 Regression Trees:

To motivate regression trees, we begin with a simple example: predicting baseball players' salaries using regression trees. We use the Hitters data set to predict a baseball player's Salary based on Years (the number of years that he has played in the major leagues) and Hits (the number of hits that he made in the previous year). We first remove observations that are missing Salary values, and log-transform Salary so that its distribution has more of a typical bell shape. (Recall that Salary is measured in thousands of dollars.) The fitted regression tree consists of a series of splitting rules, starting at the top of the tree; the top split assigns observations having Years < 4.5 to the left branch.

Algorithm 5.1 Building a Regression Tree.
1. Use recursive binary splitting to grow a large tree on the training data, stopping only when each terminal node has fewer than some minimum number of observations.
2. Apply cost complexity pruning to the large tree in order to obtain a sequence of best subtrees, as a function of α.
3. Use K-fold cross-validation to choose α. That is, divide the training observations into K folds. For each k = 1, ..., K:
(a) Repeat Steps 1 and 2 on all but the kth fold of the training data.
(b) Evaluate the mean squared prediction error on the data in the left-out kth fold, as a function of α.
Average the results for each value of α, and pick α to minimize the average error.
4. Return the subtree from Step 2 that corresponds to the chosen value of α.

5.1.2 Advantages and Disadvantages of Trees:

Decision trees for regression and classification have a number of advantages over the more classical approaches seen in Chapters 3 and 4:
▲ Trees are very easy to explain to people. In fact, they are even easier to explain than linear regression!
▲ Some people believe that decision trees more closely mirror human decision-making than do the regression and classification approaches seen in previous chapters.
▲ Trees can be displayed graphically, and are easily interpreted even by a non-expert (especially if they are small).
▲ Trees can easily handle qualitative predictors without the need to create dummy variables.

5.2 Bagging, Random Forests, Boosting:

Bagging, random forests, and boosting use trees as building blocks to construct more powerful prediction models.

5.2.1 Bagging:

The bootstrap is an extremely powerful idea. It is used in many situations in which it is hard, or even impossible, to directly compute the standard deviation of a quantity of interest. Here we see that the bootstrap can also be used in a completely different context: to improve statistical learning methods such as decision trees. The decision trees discussed in Section 5.1 suffer from high variance. This means that if we split the training data into two parts at random and fit a decision tree to each half, the results could be quite different. In contrast, a procedure with low variance will yield similar results when applied repeatedly to distinct data sets; linear regression tends to have low variance if the ratio of n to p is moderately large. Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the variance of a statistical learning method.
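In R, bagging can be carried out with the randomForest package: bagging is simply a random forest in which all p predictors are considered at every split, i.e. mtry = p. A minimal sketch, reusing the log-transformed Hitters2 data frame from the sketch in Section 5.1:

library(randomForest)
set.seed(1)                                   # bagging is random, so fix the seed

p <- ncol(Hitters2) - 1                       # number of predictors
bag <- randomForest(Salary ~ ., data = Hitters2,
                    mtry = p,                 # use all p predictors: bagging
                    ntree = 500)              # B = 500 bootstrapped trees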
It turns out that there is a very straightforward way to estimate the test error of a bagged model, without the need to perform cross-validation or the validation set approach. Recall that the key to bagging is that trees are repeatedly fit to bootstrapped subsets of the observations. One can show that, on average, each bagged tree makes use of around two-thirds of the observations. The remaining one-third of the observations not used to fit a given bagged tree are referred to as the out-of-bag (OOB) observations. We can predict the response for the ith observation using each of the trees in which that observation was OOB. This yields around B/3 predictions for the ith observation. To obtain a single prediction for the ith observation, we can average these predicted responses (if regression is the goal) or take a majority vote (if classification is the goal). This leads to a single OOB prediction for the ith observation. An OOB prediction can be obtained in this way for each of the n observations, from which the overall OOB MSE (for a regression problem) or classification error (for a classification problem) can be computed. The resulting OOB error is a valid estimate of the test error for the bagged model, since the response for each observation is predicted using only the trees that were not fit using that observation. It can be shown that with B sufficiently large, the OOB error is virtually equivalent to the leave-one-out cross-validation error.
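The randomForest object from the sketch above already tracks this quantity, so no extra work is needed; calling predict() with no newdata argument returns the OOB predictions:

oob_pred <- predict(bag)                      # no newdata => out-of-bag predictions
mean((oob_pred - Hitters2$Salary)^2)          # overall OOB MSE

bag$mse                                       # OOB MSE after 1, 2, ..., B trees
bag$mse[bag$ntree]                            # final OOB MSE (same quantity as above)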
CHAPTER 6
CONCLUSION

• To get familiar with statistical learning, which, with the explosion of "Big Data" problems, has become a very hot field.
• To learn statistical learning and modeling skills that are in high demand, and to cover basic concepts of statistical learning / modeling methods that have widespread use in business and scientific research.
• To get hands-on experience with the applications and the underlying statistical / mathematical concepts relevant to these modeling techniques. The course is designed to familiarize students with implementing statistical learning methods using the highly popular statistical software package R.
CHAPTER 7
REFERENCES

1) James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert. An Introduction to Statistical Learning with Applications in R. Springer.
2) Saffran, Jenny R. (2003). "Statistical language learning: mechanisms and constraints". Current Directions in Psychological Science. 12 (4): 110–114. doi:10.1111/1467-8721.01243.
3) Brent, Michael R.; Cartwright, Timothy A. (1996). "Distributional regularity and phonotactic constraints are useful for segmentation". Cognition. 61 (1–2): 93–125. doi:10.1016/S0010-0277(96)00719-6.
4) Saffran, J. R.; Aslin, R. N.; Newport, E. L. (1996). "Statistical Learning by 8-Month-Old Infants". Science. 274 (5294): 1926–1928. doi:10.1126/science.274.5294.1926. PMID 8943209.
5) Saffran, Jenny R.; Newport, Elissa L.; Aslin, Richard N. (1996). "Word Segmentation: The Role of Distributional Cues". Journal of Memory and Language. 35 (4): 606–621. doi:10.1006/jmla.1996.0032.
6) Aslin, R. N.; Saffran, J. R.; Newport, E. L. (1998). "Computation of Conditional Probability Statistics by 8-Month-Old Infants". Psychological Science. 9 (4): 321–324. doi:10.1111/1467-9280.00063.
7) Saffran, Jenny R. (2001). "Words in a sea of sounds: the output of infant statistical learning". Cognition. 81 (2): 149–169. doi:10.1016/S0010-0277(01)00132-9.
8) Saffran, Jenny R.; Wilson, Diana P. (2003). "From Syllables to Syntax: Multilevel Statistical Learning by 12-Month-Old Infants". Infancy. 4 (2): 273–284. doi:10.1207/S15327078IN0402_07.
9) Mattys, Sven L.; Jusczyk, Peter W.; Luce, Paul A.; Morgan, James L. (1999). "Phonotactic and Prosodic Effects on Word Segmentation in Infants". Cognitive Psychology. 38 (4): 465–494.