Assignment 2
Linear Regression
Predicting Car MPG
The goal of this assignment is to help you understand the concepts of regression through hands-on experience with training and applying regression models.
You are given a dataset of car attributes and their gas consumption in MPG (Miles Per Gallon). Your task is to build a regression model that can predict a car's MPG from its attributes.
Car MPG dataset:
The dataset consists of 393 car models, their attributes, and their MPG. The columns in the dataset are as follows:
1. Car Model Name
2. MPG - Miles Per Gallon. This is the value that we want to predict.
3. Number of cylinders
4. Engine Displacement
5. Engine Horse Power
6. Car Weight
7. Acceleration (time needed to reach a speed of 60 miles/hour)
8. Model Year
9. Origin
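The column list maps directly onto the Features matrix and Labels vector used in tasks 1 to 4. Below is a minimal sketch of that mapping, assuming pandas and a CSV header that uses the names below. Only model_name and mpg appear in this handout; the other header names, the file name cars.csv, and the helper load_features_labels are hypothetical. Two made-up rows stand in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Only "model_name" and "mpg" come from the handout; the other names are assumed.
FEATURE_COLUMNS = ["cylinders", "displacement", "horsepower",
                   "weight", "acceleration", "model_year", "origin"]
LABEL_COLUMN = "mpg"


def load_features_labels(source):
    """Load the CSV, preview it, and split it into features X and labels y."""
    df = pd.read_csv(source)   # task 1: load the data with pandas
    print(df.head(10))         # task 2: preview the top 10 rows
    X = df[FEATURE_COLUMNS]    # task 3: columns 3-9 (no model_name, no mpg)
    y = df[LABEL_COLUMN]       # task 4: the mpg label vector
    return X, y


# Two made-up rows standing in for the real file,
# e.g. X, y = load_features_labels("cars.csv") in the notebook.
demo_csv = io.StringIO(
    "model_name,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin\n"
    "car a,20.0,4,120.0,90.0,2500,15.0,75,1\n"
    "car b,30.0,4,98.0,70.0,2000,17.0,80,3\n"
)
X, y = load_features_labels(demo_csv)
```

Selecting columns by name rather than by position keeps the code correct even if the CSV's column order differs from the list above.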
Tasks:
Perform the following in Python:
1. Load the data from the csv file using Pandas
2. Preview/print the top 10 rows of the data
3. Create the Features matrix (columns 3-9 above, i.e., exclude the model_name and mpg columns)
4. Create the Labels vector (the mpg column)
5. Plot the relationship between each feature and the label (mpg) on a scatter chart, for a total of 7 charts.
6. Normalize the features using the StandardScaler class of the sklearn.preprocessing package
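7. Split the data into training and test subsets using the train_test_split function of sklearn.model_selection (older scikit-learn releases provided it in the now-removed sklearn.cross_validation module)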
8. Train a regression model on the training subset using the SGDRegressor class of the sklearn.linear_model package. Set the number of iterations of the learner to 500 (the max_iter parameter). Perform the training one feature at a time: first train a model using the cylinders feature only, then train a model using the displacement feature only, and so on, for a total of 7 single-feature models.
9. For each of the models trained in step 8, apply the model to the test subset and compute the r2_score, the mean_squared_error, and the mean_absolute_error of its predictions.
10. Train a model using all features for 500 iterations with the regularization type (penalty) set to ‘l1’ instead of the default ‘l2’. Apply the model to the test data and compute the evaluation metrics as in step 9.
11. Train a model using all features for 500 iterations with ‘l2’ regularization and an initial learning rate (eta0) of 10.0. Apply the model to the test data and compute the evaluation metrics as in step 9.
What to submit
1. Submit the Jupyter Notebook that shows all your work exactly as described above. Your notebook should include section headers and descriptive text that explains what you are doing at each step (follow the style of the notebooks we develop in class).
Submit your Jupyter notebook in both *.ipynb format and HTML format. To produce the HTML format: File > Download as > HTML (.html).
2. Submit a document in PDF format that shows the results of the experiments you ran in steps 8 to 11 above. The results should be shown in one table similar to the following:
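Features Used | Non-default params       | R2 score | Mean Squared Error | Mean Absolute Error
--------------|--------------------------|----------|--------------------|--------------------
Cylinders     | Iter = 500               |          |                    |
Displacement  | Iter = 500               |          |                    |
Horsepower    | Iter = 500               |          |                    |
Weight        | Iter = 500               |          |                    |
Acceleration  | Iter = 500               |          |                    |
Year          | Iter = 500               |          |                    |
Origin        | Iter = 500               |          |                    |
All Features  | Iter = 500               |          |                    |
All Features  | Iter = 500, penalty = l1 |          |                    |
All Features  | Iter = 500, eta0 = 10    |          |                    |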
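The table can be assembled in the notebook before it is copied into the PDF. A minimal sketch, assuming pandas and scikit-learn and assuming each run's test labels and predictions are kept. The helper results_table is hypothetical, and the toy numbers below only illustrate the layout; they are not results from the car data.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

COLUMNS = ["Features Used", "Non-default params", "R2 score",
           "Mean Squared Error", "Mean Absolute Error"]


def results_table(runs):
    """Build the report table from (features, params, y_true, y_pred) tuples."""
    rows = []
    for features, params, y_true, y_pred in runs:
        rows.append([features, params,
                     r2_score(y_true, y_pred),
                     mean_squared_error(y_true, y_pred),
                     mean_absolute_error(y_true, y_pred)])
    return pd.DataFrame(rows, columns=COLUMNS)


# Toy predictions, only to show the layout (not results from the car data).
table = results_table([("example", "Iter = 500", [1.0, 2.0, 3.0], [1.0, 2.0, 4.0])])
print(table.to_string(index=False))
```

For the toy row, one prediction is off by 1, so the mean squared error and mean absolute error are both 1/3 and the R2 score is 0.5.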