SlideShare a Scribd company logo
1 of 17
For any Assignment related queries, call us at : - +1 678 648 4277,
visit : - https://www.statisticsassignmenthelp.com/, or
Email : - support@statisticsassignmenthelp.com
1. Which among the following techniques can be used to aid decision making when
those decisions depend upon some available data?
(a) descriptive statistics
(b) inferential statistics
(c) predictive analytics
(d) prescriptive analytics
Sol. (a), (b), (c), (d)
The techniques listed above allow us to visualise and understand data, infer properties
of data sets and compare data sets, as well as build models of the processes generating
the data in order to make predictions about unseen data. In different scenarios, each of
these processes can be used to help in decision making.
2. In a popular classification algorithm it is assumed that the entire input space can be
divided into axis-parallel rectangles with each such rectangle being assigned a class
label. This assumption is an example of
(a) language bias
(b) search bias
Sol. (a)
The mentioned constraints have to do with what forms the classification models can take
rather than the search process to find the specific model given some data. Hence, the
above assumption is an example of language bias.
3. Suppose we trained a supervised learning algorithm on some training data and observed
that the resultant model gave no error on the training data. Which among the following
conclusions can you draw in this scenario?
(a) the learned model has overfit the data
(b) it is possible that the learned model will generalise well to unseen data
(c) it is possible that the learned model will not generalise well to unseen data
(d) the learned model will definitely perform well on unseen data
(e) the learned model will definitely not perform well on unseen data
Sol. (b), (c)
Consider the (rare) situation where the given data is absolutely pristine, and our choice of
algorithm and parameter selection allow us to come up with a model which exactly
matches the process generating the data. In such a situation we can expect 100% accuracy
on the training data as well as good performance on unseen data.
where we create some synthetic data having a linear relationship between the inputs and
the output and then apply linear regression to model the data.
The more plausible situation is of course that that we used a very complex model which
ended up overfitting the training data. As we have seen, such models may achieve high
accuracy on the training data but generally perform poorly on unseen examples.
4. You are faced with a five class classification problem, with one class being the class
of interest, i.e., you care more about correctly classifying data points belonging to that
one class than the others. You are given a data set to use as training data. You analyse
the data and observe the following properties. Which among these properties do you
think are unfavourable from the point of view of applying general machine learning
techniques?
(a) all data points come from the same distribution
(b) there is class imbalance with much more data points belonging to the class of
interest than the other classes
(c) the given data set has some missing values
(d) each data point in the data set is independent of the other data points
Sol. (c)
Option (a) is favourable, since it is an implicit assumption we make when we try applying
supervised learning techniques. Option (b) indicates class imbalance which is usually an
issue when it comes to classification. However, note that the imbalance is in favour of the
class of interest. This means that there are a large number of examples for the class which
we are interested in, which should allow us to model this class well. Option (c) is of
course unfavourable as it would require us to handle missing data, and given the extent of
the data that is missing, could severely affect the performance of any classifier we come
up with. Finally, option (d) is favourable since, once again, it is an assumption about the
data that we implicitly make.
5. You are given the following four training instances:
• x1 = −1, y1 = 0.0319
• x2 = 0, y2 = 0.8692
• x3 = 1, y3 = 1.9566
• x4 = 2, y = 3.0343
Modelling this data using the ordinary least squares regression form of y = f(x) = b0 +
b1x, which of the following parameters (b0, b1) would you use to best model this data?
(a) (1,1)
(b) (1,2)
(c) (2,1)
Let us consider the model with parameters (1,1). We can find the sum of squared errors
using this parameter setting as
Sol. (a)
(0 − 0.319)2 + (1 − 0.8692)2 + (2 − 1.9566)2 + (3 − 3.0343)2 = 0.12
Repeating the same calculation for the other 3 options, we find that the minimum
squared error is obtained when the parameters are (1,1). Thus, this is the most suitable
parameter setting for modelling the given data.
6. You are given the following five training instances:
• x1 = 1, y1 = 3.4
• x2 = 1.5, y2 = 4.7
• x3 = 2, y3 = 6.15
• x4 = 2.25, y4 = 6.4
• x5 = 4, y5 = 10.9
Using the derivation results for the parameters in ordinary least squares, calculate the
values of b0 and b1. (Note that in the expression for b1, the last term in the denominator
is
(a) b0 = 7.54, b1 = 0.57
(b) b0 = 2.484, b1 = 0.969
(c) b0 = 0.969, b1 = 2.484
(d) b0 = 1, b1 = 2.5
Sol. (c)
According to the derivations we saw in the lectures, we have
and
Using the supplied data, we have:
N = 5
Using the above calculations, we have
Using this value of b1, we have
7. Recall the regression output obtained when using Microsoft Excel. In the third table,
there are columns for t Stat and P-value. Is the probability value shown here for a one-
sided test or a two-sided test?
(a) one-sided test
(b) two-sided test
Sol. (b)
Recall that the null hypothesis used here is H0: bi = 0, where bi is the coefficient
corresponding to the ith variable (i > 0), and b0 is the intercept.
8. In building a linear regression model for a particular data set, you observe the
coefficient of one of the features having a relatively high negative value. This suggests
that
(a) this feature has a strong effect on the model (should be retained)
(b) this feature does not have a strong effect on the model (should be ignored)
(c) it is not possible to comment on the importance of this feature without additional
information
Sol. (c)
A high magnitude suggests that the feature is important. However, it may be the case
that another feature is highly correlated with this feature and it’s coefficient also has a
high magnitude with the opposite sign, in effect cancelling out the effect of the former.
Thus, we cannot really remark on the importance of a feature just because it’s
coefficient has a relatively large magnitude.
9. Assuming that for a specific problem, both linear regression and regression using the K-
NN approach give same levels of performance, which technique would you prefer if
response time, i.e., the time taken to make a prediction given an input data point, is a
major consideration?
(a) linear regression
(b) K-NN
(c) both are equally suitable
Sol. (a)
As we have seen, in K-NN, for each input data point for which we want to make a
prediction, we have to search the entire training data set for the neighbours of that point.
Depending upon the dimensionality of the data as well as the size of the training set, this
process can take some time. On the other hand in linear regression, once the model has
been learned we need to perform a single simple calculation to make a prediction for a
given data point.
10. You are given the following five training instances:
• x1 = 2, x2 = 1, y = 4
• x1 = 6, x2 = 3, y = 2
• x1 = 2, x2 = 5, y = 2
• x1 = 6, x2 = 7, y = 3
• x1 = 10, x2 = 7, y = 3
Using the K-nearest neighbour technique for performing regression, what will be the
predicted y value corresponding to the input point (x1 = 3, x2 = 6), for K = 2 and for K =
3?
(a) K = 2, y = 3; K = 3, y = 2.33
(b) K = 2, y = 3; K = 3, y = 2.66
(c) K = 2, y = 2.5; K = 3, y = 2.33
(d) K = 2, y = 2.5; K = 3, y = 2.66
Sol. (c)
When K = 2, the nearest points are x1 = 2, x2 = 5 and x1 = 6, x2 = 7. Taking the average
of the outputs of these two points, we have y = (2 + 3)/2 = 2.5.
Similarly, when K = 3, we additionally consider the point x1 = 6, x2 = 3 to get output, y
= (2 + 3 + 2)/3 = 2.33.
Weka-based programming assignment questions
The following questions are based on using Weka. Go through the tutorial on Weka
before attempting these questions. You can download the data sets used in this assignment
here.
Data set 1
This is a synthetic data set to get you started with Weka. This data set contains 100 data
points. The input is 3-dimensional (x1, x2, x3) with one output variable (y). This data is
in the arff format which can directly be used in Weka.
Tasks
For this data set, you will need to apply linear regression with regularisation, attribute
selection and collinear attribute elimination disabled (other parameters to be left to their
default values). Use 10-fold cross validation for evaluation.
Data set 2
This is a modified version of the prostate cancer data set from the ESL text book in the
arff format. It contains nine numeric attributes with the attribute lpsa being treated as the
target attribute. This data set is provided in two files. Use the test file provided for
evaluation (rather than the cross-validation method).
Tasks
For this data set, you will need to apply linear regression. First apply linear regression
with regularisation, attribute selection and collinear attribute elimination disabled. Next
enable regularisation and try out different values of the parameter and observe whether
better performance can be obtained with suitable values of the regularisation parameter.
Note that to apply regularisation you should normalise the data (except the target
variable). First normalise the test set and save the normalised data. Next open the training
data, normalise it and then run the regression algorithm (with the normalised test set
supplied for evaluation purposes).
Data set 3
This is the Parkinsons Telemonitoring data set taken from the UCI machine learning
repository. Given are two files, one for training and one for testing. The files are in csv
format and need to be converted into the arff format before algorithms are applied to it.
The last variable, PPE, is the target variable.
Tasks
For this data set, first apply linear regression with regularisation, attribute selection and
collinear attribute elimination disabled. Note the performance and compare with the
performance obtained by applying K-nearest neighbour regression. To run K-NN, select
the function IBk under the lazy folder. Leave all parameters set to their default values,
except for the K parameter (KNN in the interface).
11. What is the best linear fit for data set 1?
(a) y = 5*x1 + 6*x2 + 4*x3 + 12
(b) y = 3*x1 + 7*x2 - 2.5*x3 - 16
(c) y = 3*x1 + 7*x2 + 2.5*x3 - 16
(d) y = 2.5*x1 + 7*x2 + 4*x3 + 16
Sol. (b)
The data set used here is generated using the function specified by option (b) and with no
noise added, hence running linear regression on this data set should allow us to recover
the exact parameters and observe 100% accuracy.
12. Which of the following ridge regression parameter values leads to the lowest root
mean squared error for the prostate cancer data (data set 2) on the supplied test set?
(a) 0
(b) 2
(c) 4
(d) 8
(e) 16
(f) 32
Sol. (e)
Among the given choices of the ridge regression parameter, we observe best
performance (lowest error) at the value of 16.
13. If a curve is plotted with the error on the y-axis and the ridge regression parameter on
the x-axis, then based on your observations in the previous question, which of the
following most closely resembles the curve?
(a) straight line passing through origin
(b) concave downwards function
(c) concave upwards function
(d) line parallel to x-axis
Sol. (c)
In general, as the ridge regression parameter is increased, the error begins to decrease,
but after a certain point, further increase in the parameter drives up the error.
14. Considering data set 3, is the performance of K-nearest neighbour regression (where
the value of the parameter K is varied between 1 and 25) comparable to the performance
of linear regression (without regularisation)?
(a) no
(b) yes
Sol. (b)
At K = 5, the performance is even better.

More Related Content

Similar to MyStataLab Assignment Help

Application of combined support vector machines in process fault diagnosis
Application of combined support vector machines in process fault diagnosisApplication of combined support vector machines in process fault diagnosis
Application of combined support vector machines in process fault diagnosisDr.Pooja Jain
 
Kaggle digits analysis_final_fc
Kaggle digits analysis_final_fcKaggle digits analysis_final_fc
Kaggle digits analysis_final_fcZachary Combs
 
Huong dan cu the svm
Huong dan cu the svmHuong dan cu the svm
Huong dan cu the svmtaikhoan262
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Yao Yao
 
SupportVectorRegression
SupportVectorRegressionSupportVectorRegression
SupportVectorRegressionDaniel K
 
Exploring Support Vector Regression - Signals and Systems Project
Exploring Support Vector Regression - Signals and Systems ProjectExploring Support Vector Regression - Signals and Systems Project
Exploring Support Vector Regression - Signals and Systems ProjectSurya Chandra
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...Pooyan Jamshidi
 
Paper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityPaper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityGon-soo Moon
 
Continuous Architecting of Stream-Based Systems
Continuous Architecting of Stream-Based SystemsContinuous Architecting of Stream-Based Systems
Continuous Architecting of Stream-Based SystemsCHOOSE
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)Masahiro Suzuki
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...ijcseit
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...IJCSES Journal
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesSơn Còm Nhom
 
20MEMECH Part 3- Classification.pdf
20MEMECH Part 3- Classification.pdf20MEMECH Part 3- Classification.pdf
20MEMECH Part 3- Classification.pdfMariaKhan905189
 
Data mining maximumlikelihood
Data mining maximumlikelihoodData mining maximumlikelihood
Data mining maximumlikelihoodHarry Potter
 
Data mining maximumlikelihood
Data mining maximumlikelihoodData mining maximumlikelihood
Data mining maximumlikelihoodJames Wong
 
Data mining maximumlikelihood
Data mining maximumlikelihoodData mining maximumlikelihood
Data mining maximumlikelihoodHoang Nguyen
 

Similar to MyStataLab Assignment Help (20)

Application of combined support vector machines in process fault diagnosis
Application of combined support vector machines in process fault diagnosisApplication of combined support vector machines in process fault diagnosis
Application of combined support vector machines in process fault diagnosis
 
Kaggle digits analysis_final_fc
Kaggle digits analysis_final_fcKaggle digits analysis_final_fc
Kaggle digits analysis_final_fc
 
Guide
GuideGuide
Guide
 
Huong dan cu the svm
Huong dan cu the svmHuong dan cu the svm
Huong dan cu the svm
 
Data clustering
Data clustering Data clustering
Data clustering
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
SupportVectorRegression
SupportVectorRegressionSupportVectorRegression
SupportVectorRegression
 
Exploring Support Vector Regression - Signals and Systems Project
Exploring Support Vector Regression - Signals and Systems ProjectExploring Support Vector Regression - Signals and Systems Project
Exploring Support Vector Regression - Signals and Systems Project
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
 
Paper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityPaper-Allstate-Claim-Severity
Paper-Allstate-Claim-Severity
 
Continuous Architecting of Stream-Based Systems
Continuous Architecting of Stream-Based SystemsContinuous Architecting of Stream-Based Systems
Continuous Architecting of Stream-Based Systems
 
Guide
GuideGuide
Guide
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
20MEMECH Part 3- Classification.pdf
20MEMECH Part 3- Classification.pdf20MEMECH Part 3- Classification.pdf
20MEMECH Part 3- Classification.pdf
 
Data mining maximumlikelihood
Data mining maximumlikelihoodData mining maximumlikelihood
Data mining maximumlikelihood
 
Data mining maximumlikelihood
Data mining maximumlikelihoodData mining maximumlikelihood
Data mining maximumlikelihood
 
Data mining maximumlikelihood
Data mining maximumlikelihoodData mining maximumlikelihood
Data mining maximumlikelihood
 

More from Statistics Assignment Help

Get Accurate and Reliable Statistics Assignment Help - Boost Your Grades!
Get Accurate and Reliable Statistics Assignment Help - Boost Your Grades!Get Accurate and Reliable Statistics Assignment Help - Boost Your Grades!
Get Accurate and Reliable Statistics Assignment Help - Boost Your Grades!Statistics Assignment Help
 
The Data of an Observational Study Designed to Compare the Effectiveness of a...
The Data of an Observational Study Designed to Compare the Effectiveness of a...The Data of an Observational Study Designed to Compare the Effectiveness of a...
The Data of an Observational Study Designed to Compare the Effectiveness of a...Statistics Assignment Help
 
Linear Regression Analysis assignment help.ppt
Linear Regression Analysis assignment help.pptLinear Regression Analysis assignment help.ppt
Linear Regression Analysis assignment help.pptStatistics Assignment Help
 
Probabilistic Systems Analysis Assignment Help
Probabilistic Systems Analysis Assignment HelpProbabilistic Systems Analysis Assignment Help
Probabilistic Systems Analysis Assignment HelpStatistics Assignment Help
 

More from Statistics Assignment Help (20)

Get Accurate and Reliable Statistics Assignment Help - Boost Your Grades!
Get Accurate and Reliable Statistics Assignment Help - Boost Your Grades!Get Accurate and Reliable Statistics Assignment Help - Boost Your Grades!
Get Accurate and Reliable Statistics Assignment Help - Boost Your Grades!
 
Statistics Assignment Help
Statistics Assignment HelpStatistics Assignment Help
Statistics Assignment Help
 
Pay For Statistics Assignment
Pay For Statistics AssignmentPay For Statistics Assignment
Pay For Statistics Assignment
 
Probability Assignment Help
Probability Assignment HelpProbability Assignment Help
Probability Assignment Help
 
Data Analysis Assignment Help
Data Analysis Assignment HelpData Analysis Assignment Help
Data Analysis Assignment Help
 
R Programming Assignment Help
R Programming Assignment HelpR Programming Assignment Help
R Programming Assignment Help
 
Hypothesis Assignment Help
Hypothesis Assignment HelpHypothesis Assignment Help
Hypothesis Assignment Help
 
The Data of an Observational Study Designed to Compare the Effectiveness of a...
The Data of an Observational Study Designed to Compare the Effectiveness of a...The Data of an Observational Study Designed to Compare the Effectiveness of a...
The Data of an Observational Study Designed to Compare the Effectiveness of a...
 
T- Test and ANOVA using SPSS Assignment Help
T- Test and ANOVA using SPSS Assignment HelpT- Test and ANOVA using SPSS Assignment Help
T- Test and ANOVA using SPSS Assignment Help
 
Linear Regression Analysis assignment help.ppt
Linear Regression Analysis assignment help.pptLinear Regression Analysis assignment help.ppt
Linear Regression Analysis assignment help.ppt
 
Stata Assignment Help
Stata Assignment HelpStata Assignment Help
Stata Assignment Help
 
Probability and Statistics Assignment Help
Probability and Statistics Assignment HelpProbability and Statistics Assignment Help
Probability and Statistics Assignment Help
 
Mathematical Statistics Assignment Help
Mathematical Statistics Assignment HelpMathematical Statistics Assignment Help
Mathematical Statistics Assignment Help
 
Statistics Assignment Help
Statistics Assignment HelpStatistics Assignment Help
Statistics Assignment Help
 
Statistics Coursework Assignment Help
Statistics Coursework Assignment HelpStatistics Coursework Assignment Help
Statistics Coursework Assignment Help
 
Advanced Statistics Assignment help
Advanced Statistics Assignment helpAdvanced Statistics Assignment help
Advanced Statistics Assignment help
 
Statistics Coursework Help
Statistics Coursework HelpStatistics Coursework Help
Statistics Coursework Help
 
Probabilistic systems assignment help
Probabilistic systems assignment helpProbabilistic systems assignment help
Probabilistic systems assignment help
 
Probabilistic Systems Analysis Assignment Help
Probabilistic Systems Analysis Assignment HelpProbabilistic Systems Analysis Assignment Help
Probabilistic Systems Analysis Assignment Help
 
Stochastic Process Assignment Help
Stochastic Process Assignment HelpStochastic Process Assignment Help
Stochastic Process Assignment Help
 

Recently uploaded

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 

Recently uploaded (20)

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 

MyStataLab Assignment Help

  • 1. For any Assignment related queries, call us at : - +1 678 648 4277, visit : - https://www.statisticsassignmenthelp.com/, or Email : - support@statisticsassignmenthelp.com
  • 2. 1. Which among the following techniques can be used to aid decision making when those decisions depend upon some available data? (a) descriptive statistics (b) inferential statistics (c) predictive analytics (d) prescriptive analytics Sol. (a), (b), (c), (d) The techniques listed above allow us to visualise and understand data, infer properties of data sets and compare data sets, as well as build models of the processes generating the data in order to make predictions about unseen data. In different scenarios, each of these processes can be used to help in decision making. 2. In a popular classification algorithm it is assumed that the entire input space can be divided into axis-parallel rectangles with each such rectangle being assigned a class label. This assumption is an example of (a) language bias (b) search bias
  • 3. Sol. (a) The mentioned constraints have to do with what forms the classification models can take rather than the search process to find the specific model given some data. Hence, the above assumption is an example of language bias. 3. Suppose we trained a supervised learning algorithm on some training data and observed that the resultant model gave no error on the training data. Which among the following conclusions can you draw in this scenario? (a) the learned model has overfit the data (b) it is possible that the learned model will generalise well to unseen data (c) it is possible that the learned model will not generalise well to unseen data (d) the learned model will definitely perform well on unseen data (e) the learned model will definitely not perform well on unseen data Sol. (b), (c) Consider the (rare) situation where the given data is absolutely pristine, and our choice of algorithm and parameter selection allow us to come up with a model which exactly matches the process generating the data. In such a situation we can expect 100% accuracy on the training data as well as good performance on unseen data.
  • 4. where we create some synthetic data having a linear relationship between the inputs and the output and then apply linear regression to model the data. The more plausible situation is of course that that we used a very complex model which ended up overfitting the training data. As we have seen, such models may achieve high accuracy on the training data but generally perform poorly on unseen examples. 4. You are faced with a five class classification problem, with one class being the class of interest, i.e., you care more about correctly classifying data points belonging to that one class than the others. You are given a data set to use as training data. You analyse the data and observe the following properties. Which among these properties do you think are unfavourable from the point of view of applying general machine learning techniques? (a) all data points come from the same distribution (b) there is class imbalance with much more data points belonging to the class of interest than the other classes (c) the given data set has some missing values (d) each data point in the data set is independent of the other data points
  • 5. Sol. (c) Option (a) is favourable, since it is an implicit assumption we make when we try applying supervised learning techniques. Option (b) indicates class imbalance which is usually an issue when it comes to classification. However, note that the imbalance is in favour of the class of interest. This means that there are a large number of examples for the class which we are interested in, which should allow us to model this class well. Option (c) is of course unfavourable as it would require us to handle missing data, and given the extent of the data that is missing, could severely affect the performance of any classifier we come up with. Finally, option (d) is favourable since, once again, it is an assumption about the data that we implicitly make. 5. You are given the following four training instances: • x1 = −1, y1 = 0.0319 • x2 = 0, y2 = 0.8692 • x3 = 1, y3 = 1.9566 • x4 = 2, y = 3.0343 Modelling this data using the ordinary least squares regression form of y = f(x) = b0 + b1x, which of the following parameters (b0, b1) would you use to best model this data?
  • 6. (a) (1,1) (b) (1,2) (c) (2,1) Let us consider the model with parameters (1,1). We can find the sum of squared errors using this parameter setting as Sol. (a) (0 − 0.319)2 + (1 − 0.8692)2 + (2 − 1.9566)2 + (3 − 3.0343)2 = 0.12 Repeating the same calculation for the other 3 options, we find that the minimum squared error is obtained when the parameters are (1,1). Thus, this is the most suitable parameter setting for modelling the given data. 6. You are given the following five training instances: • x1 = 1, y1 = 3.4 • x2 = 1.5, y2 = 4.7 • x3 = 2, y3 = 6.15 • x4 = 2.25, y4 = 6.4
  • 7. • x5 = 4, y5 = 10.9 Using the derivation results for the parameters in ordinary least squares, calculate the values of b0 and b1. (Note that in the expression for b1, the last term in the denominator is (a) b0 = 7.54, b1 = 0.57 (b) b0 = 2.484, b1 = 0.969 (c) b0 = 0.969, b1 = 2.484 (d) b0 = 1, b1 = 2.5 Sol. (c) According to the derivations we saw in the lectures, we have and
  • 8. Using the supplied data, we have: N = 5 Using the above calculations, we have
  • 9. Using this value of b1, we have 7. Recall the regression output obtained when using Microsoft Excel. In the third table, there are columns for t Stat and P-value. Is the probability value shown here for a one- sided test or a two-sided test? (a) one-sided test (b) two-sided test Sol. (b)
  • 10. Recall that the null hypothesis used here is H0: bi = 0, where bi is the coefficient corresponding to the ith variable (i > 0), and b0 is the intercept. 8. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively high negative value. This suggests that (a) this feature has a strong effect on the model (should be retained) (b) this feature does not have a strong effect on the model (should be ignored) (c) it is not possible to comment on the importance of this feature without additional information Sol. (c) A high magnitude suggests that the feature is important. However, it may be the case that another feature is highly correlated with this feature and it’s coefficient also has a high magnitude with the opposite sign, in effect cancelling out the effect of the former. Thus, we cannot really remark on the importance of a feature just because it’s coefficient has a relatively large magnitude.
  • 11. 9. Assuming that for a specific problem, both linear regression and regression using the K- NN approach give same levels of performance, which technique would you prefer if response time, i.e., the time taken to make a prediction given an input data point, is a major consideration? (a) linear regression (b) K-NN (c) both are equally suitable Sol. (a) As we have seen, in K-NN, for each input data point for which we want to make a prediction, we have to search the entire training data set for the neighbours of that point. Depending upon the dimensionality of the data as well as the size of the training set, this process can take some time. On the other hand in linear regression, once the model has been learned we need to perform a single simple calculation to make a prediction for a given data point. 10. You are given the following five training instances:
  • 12. • x1 = 2, x2 = 1, y = 4 • x1 = 6, x2 = 3, y = 2 • x1 = 2, x2 = 5, y = 2 • x1 = 6, x2 = 7, y = 3 • x1 = 10, x2 = 7, y = 3 Using the K-nearest neighbour technique for performing regression, what will be the predicted y value corresponding to the input point (x1 = 3, x2 = 6), for K = 2 and for K = 3? (a) K = 2, y = 3; K = 3, y = 2.33 (b) K = 2, y = 3; K = 3, y = 2.66 (c) K = 2, y = 2.5; K = 3, y = 2.33 (d) K = 2, y = 2.5; K = 3, y = 2.66 Sol. (c) When K = 2, the nearest points are x1 = 2, x2 = 5 and x1 = 6, x2 = 7. Taking the average of the outputs of these two points, we have y = (2 + 3)/2 = 2.5.
  • 13. Similarly, when K = 3, we additionally consider the point x1 = 6, x2 = 3 to get output, y = (2 + 3 + 2)/3 = 2.33. Weka-based programming assignment questions The following questions are based on using Weka. Go through the tutorial on Weka before attempting these questions. You can download the data sets used in this assignment here. Data set 1 This is a synthetic data set to get you started with Weka. This data set contains 100 data points. The input is 3-dimensional (x1, x2, x3) with one output variable (y). This data is in the arff format which can directly be used in Weka. Tasks For this data set, you will need to apply linear regression with regularisation, attribute selection and collinear attribute elimination disabled (other parameters to be left to their default values). Use 10-fold cross validation for evaluation.
  • 14. Data set 2 This is a modified version of the prostate cancer data set from the ESL text book in the arff format. It contains nine numeric attributes with the attribute lpsa being treated as the target attribute. This data set is provided in two files. Use the test file provided for evaluation (rather than the cross-validation method). Tasks For this data set, you will need to apply linear regression. First apply linear regression with regularisation, attribute selection and collinear attribute elimination disabled. Next enable regularisation and try out different values of the parameter and observe whether better performance can be obtained with suitable values of the regularisation parameter. Note that to apply regularisation you should normalise the data (except the target variable). First normalise the test set and save the normalised data. Next open the training data, normalise it and then run the regression algorithm (with the normalised test set supplied for evaluation purposes). Data set 3
  • 15. This is the Parkinsons Telemonitoring data set taken from the UCI machine learning repository. Given are two files, one for training and one for testing. The files are in csv format and need to be converted into the arff format before algorithms are applied to it. The last variable, PPE, is the target variable. Tasks For this data set, first apply linear regression with regularisation, attribute selection and collinear attribute elimination disabled. Note the performance and compare with the performance obtained by applying K-nearest neighbour regression. To run K-NN, select the function IBk under the lazy folder. Leave all parameters set to their default values, except for the K parameter (KNN in the interface). 11. What is the best linear fit for data set 1? (a) y = 5*x1 + 6*x2 + 4*x3 + 12 (b) y = 3*x1 + 7*x2 - 2.5*x3 - 16 (c) y = 3*x1 + 7*x2 + 2.5*x3 - 16 (d) y = 2.5*x1 + 7*x2 + 4*x3 + 16
  • 16. Sol. (b) The data set used here is generated using the function specified by option (b) and with no noise added, hence running linear regression on this data set should allow us to recover the exact parameters and observe 100% accuracy. 12. Which of the following ridge regression parameter values leads to the lowest root mean squared error for the prostate cancer data (data set 2) on the supplied test set? (a) 0 (b) 2 (c) 4 (d) 8 (e) 16 (f) 32 Sol. (e) Among the given choices of the ridge regression parameter, we observe best performance (lowest error) at the value of 16.
  • 17. 13. If a curve is plotted with the error on the y-axis and the ridge regression parameter on the x-axis, then based on your observations in the previous question, which of the following most closely resembles the curve? (a) straight line passing through origin (b) concave downwards function (c) concave upwards function (d) line parallel to x-axis Sol. (c) In general, as the ridge regression parameter is increased, the error begins to decrease, but after a certain point, further increase in the parameter drives up the error. 14. Considering data set 3, is the performance of K-nearest neighbour regression (where the value of the parameter K is varied between 1 and 25) comparable to the performance of linear regression (without regularisation)? (a) no (b) yes Sol. (b) At K = 5, the performance is even better.