12/4/22, 7:44 PM Assignment_3
localhost:8888/nbconvert/html/Assignment_3/Assignment_3.ipynb?download=false
INSTRUCTIONS:
* Add your code as indicated in each cell.
* Besides adding your code, do not alter this file.
* Do not delete or change test cases. Once you are done with a question, you can run the test cases to see if you programmed the question correctly.
* If you get a question wrong, do not give up. Keep trying until you pass the test cases.
* Rename the file as firstname_lastname_assignmentid.ipynb (e.g., marina_johnson_assignment1.ipynb)
* Only submit .ipynb files (no .py files)
#
Question 1
1. Read the employee_attrition dataset and save it as df. Recall that the target variable in this dataset is named 'Attrition'.
2. Check if the dataset is imbalanced by counting the number of Noes and Yeses in the target variable Attrition.
Hints:
Imbalanced data refers to a situation where the number of observations is not the same for all the classes in a dataset. For example, if the number of churned employees is 4000 while the number of unchurned employees is 40000, the dataset is imbalanced.
You need to access the target variable Attrition and count how many Yes and No values it contains. If the number of Yeses is equal to the number of Noes, then the dataset is balanced. Otherwise, it is not balanced.
In [138… # Do not delete this cell
import numpy as np
score = dict()
np.random.seed(333)
Check Module 5g: Encoding Categorical Variables to learn more
about data
imbalance problems. Particularly, check 2.5: Balancing datasets
in Module 5.
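The counting step described in the hints can be sketched as follows. This is a minimal illustration on a small made-up DataFrame; the name `toy` and its rows are assumptions standing in for the real employee_attrition file, which is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: 2 "Yes" rows and 4 "No" rows.
toy = pd.DataFrame({"Attrition": ["Yes", "No", "No", "Yes", "No", "No"]})

# value_counts() returns one count per level of the target variable.
counts = toy["Attrition"].value_counts()
number_of_yes = int(counts["Yes"])
number_of_no = int(counts["No"])

# The data is balanced only when both classes have the same count.
is_balanced = number_of_yes == number_of_no
print(number_of_yes, number_of_no, is_balanced)
```

On the real data, the same two lookups would fill `number_of_yes` and `number_of_no` once `df` has been read.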
Do not alter the below cell. It is a test case for
Question 1
{'question 1': 'pass'}
#
Question 2
1. Identify the names of the numerical input variables and save them as a LIST
2. Identify the names of the categorical input variables and save them as a LIST
Hints:
Remember Attrition is the target (output) variable, so exclude Attrition from both LISTS containing the numerical and categorical input variables.
Check Modules 5b: Dropping Variables and Module 3e: Helpful Functions
(check after minute 4)
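One way to separate the two kinds of input variables is to select columns by dtype after setting the target aside. The sketch below uses a made-up three-row DataFrame (`toy` and its values are assumptions). It spells the second list `categorical_variables`, whereas the notebook's test cell looks for `categorical_varables`.

```python
import pandas as pd

# Hypothetical miniature of the dataset, including the target column.
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "MonthlyIncome": [5993, 5130, 2090],
    "Department": ["Sales", "R&D", "R&D"],
    "Attrition": ["Yes", "No", "Yes"],
})

# Exclude the target so it appears in neither list.
inputs = toy.drop(columns=["Attrition"])

# Numeric dtypes go to one list, everything else to the other.
numerical_variables = inputs.select_dtypes(include="number").columns.tolist()
categorical_variables = inputs.select_dtypes(exclude="number").columns.tolist()
print(numerical_variables, categorical_variables)
```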
Do not alter the below cell. It is a test case for
Question 2
In [139… import pandas as pd
df = # your code to read the dataset goes in here
number_of_yes = # your code to find the number of yeses in the Attrition variable goes in here
number_of_no = # your code to find the number of noes in the Attrition variable goes in here
In [140… try:
    if (number_of_yes == 237 and number_of_no == 1233):
        score['question 1'] = 'pass'
    else:
        score['question 1'] = 'fail'
except:
    score['question 1'] = 'fail'
score
Out[140]:
In [141… numerical_variables = # Your code to identify numerical variables goes in here
categorical_varables = # Your code to identify categorical variables goes in here
In [142… try:
{'question 1': 'pass', 'question 2': 'pass'}
#
Question 3
1. Identify the numerical variables with zero variance (i.e., zero standard deviation) and save them in a LIST
2. Drop these numerical variables with zero variance (i.e., zero standard deviation) from the dataset df. The dataset df should not have these variables going forward.
Hints:
For each numerical variable, compute the standard deviation. If the standard deviation is zero, delete (i.e., drop) that variable from the dataset df.
Check Modules 5b: Dropping Variables
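The hint above amounts to computing one standard deviation per numeric column and keeping the names where it is zero. A minimal sketch on made-up data (the `toy` frame is an assumption, not the course dataset):

```python
import pandas as pd

# Hypothetical data: StandardHours is constant, so its standard deviation is 0.
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "StandardHours": [80, 80, 80],
    "Department": ["Sales", "R&D", "R&D"],
})

# Standard deviation of every numeric column, as a Series indexed by column name.
stds = toy.select_dtypes(include="number").std()

# Keep the names whose standard deviation is exactly zero, then drop them.
zero_variance_numerical_variables = stds[stds == 0].index.tolist()
toy = toy.drop(columns=zero_variance_numerical_variables)
print(zero_variance_numerical_variables, list(toy.columns))
```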
Do not alter the below cell. It is a test case for
Question 3
    if ((sorted(numerical_variables) ==
         ['Age','DailyRate','DistanceFromHome','Education',
          'EmployeeCount','EmployeeNumber','EnvironmentSatisfaction',
          'HourlyRate','JobInvolvement','JobLevel','JobSatisfaction',
          'MonthlyIncome','MonthlyRate','NumCompaniesWorked','PercentSalaryHike',
          'PerformanceRating','RelationshipSatisfaction','StandardHours',
          'StockOptionLevel','TotalWorkingYears','TrainingTimesLastYear',
          'WorkLifeBalance','YearsAtCompany','YearsInCurrentRole',
          'YearsSinceLastPromotion','YearsWithCurrManager']) and
        (sorted(categorical_varables) ==
         ['BusinessTravel','Department','EducationField','Gender',
          'JobRole','MaritalStatus','Over18','OverTime'])):
        score['question 2'] = 'pass'
    else:
        score['question 2'] = 'fail'
except:
    score['question 2'] = 'fail'
score
Out[142]:
In [143… zero_variance_numerical_variables = # your code to find the numerical variables with zero variance goes in here
df = # your code to drop the zero variance numerical variables goes in here
In [144… try:
    if (zero_variance_numerical_variables == ['EmployeeCount', 'StandardHours']):
        score['question 3'] = 'pass'
    else:
        score['question 3'] = 'fail'
except:
    score['question 3'] = 'fail'
score
{'question 1': 'pass', 'question 2': 'pass', 'question 3': 'pass'}
#
Question 4
1. Identify the categorical variables with zero variance (i.e., only one level) and save them in a LIST
2. Drop these categorical variables with zero variance (i.e., only one level) from the dataset df. The dataset df should not have these variables going forward.
Hints:
For each categorical variable, find the number of levels. If the number of levels is 1, delete (i.e., drop) that variable from the dataset df. For example, if a variable named occupation has only "Engineers" across all the rows (i.e., one level), the variable does not contain any information. In other words, it has zero variation.
Check Modules 5b: Dropping Variables
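Counting levels per categorical column can be sketched with `nunique()`. The frame below is made up for illustration (an assumption, not the course data); `Over18` holds a single level and is therefore flagged.

```python
import pandas as pd

# Hypothetical data: Over18 has one level, Gender has two.
toy = pd.DataFrame({
    "Over18": ["Y", "Y", "Y"],
    "Gender": ["Female", "Male", "Male"],
    "Age": [41, 49, 37],
})

# Number of distinct levels in each non-numeric column.
levels = toy.select_dtypes(exclude="number").nunique()

# A single level carries no information, so those columns are dropped.
zero_variance_categorical_variables = levels[levels == 1].index.tolist()
toy = toy.drop(columns=zero_variance_categorical_variables)
print(zero_variance_categorical_variables, list(toy.columns))
```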
Do not alter the below cell. It is a test case for
Question 4
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass'}
#
Question 5
Out[144]:
In [145… zero_variance_categorical_variables = [] # your code to find the categorical variables with zero variance goes in here
df = # your code to drop the zero variance categorical variables goes in here
In [146… try:
    if (zero_variance_categorical_variables == ['Over18']):
        score['question 4'] = 'pass'
    else:
        score['question 4'] = 'fail'
except:
    score['question 4'] = 'fail'
score
Out[146]:
1. Find the categorical variables with very high variance (i.e., very high cardinality) and save them in a LIST. Use 200 as the threshold. In other words, categorical variables with over 200 levels should be considered variables with high cardinality (i.e., with high variance).
2. Drop the categorical variables with very high variance (i.e., very high cardinality) from the dataset df. The dataset df should not have these variables going forward.
Hints:
For each categorical variable, find the number of levels. If the number of levels is greater than 200, delete (i.e., drop) that variable from the dataset df. For example,
Check Modules 5b: Dropping Variables
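The threshold rule can be sketched in the same way as the single-level check, only with a different comparison. The column `BadgeCode` below is hypothetical and is built to have 201 levels so that it crosses the threshold of 200. The notebook's own test expects an empty list, so the real dataset apparently has no such column.

```python
import pandas as pd

# Hypothetical data: BadgeCode has 201 distinct levels, Gender has 2.
n_rows = 201
toy = pd.DataFrame({
    "BadgeCode": [f"B{i}" for i in range(n_rows)],
    "Gender": ["Female", "Male"] * 100 + ["Female"],
})

threshold = 200  # value stated in the question

# Flag categorical columns whose number of levels exceeds the threshold.
levels = toy.select_dtypes(exclude="number").nunique()
high_cardinality_categorical_variables = levels[levels > threshold].index.tolist()
toy = toy.drop(columns=high_cardinality_categorical_variables)
print(high_cardinality_categorical_variables, list(toy.columns))
```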
Do not alter the below cell. It is a test case for
Question 5
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass',
'question 5': 'pass'}
#
Question 6
1. Scale (i.e., standardize) the numerical variables in the dataset using the standardization method. Drop the original numerical variables and keep only the standardized ones.
2. The new standardized numerical variables should have the same variable names. For example, the age variable after being standardized should still be named age.
Hints:
Feature standardization makes the values of each feature in the data have zero mean (when subtracting the mean in the numerator) and unit variance. This
In [147… high_cardinality_categorical_variables = [] # your code to find the categorical variables with high variance (i.e., cardinality) goes in here
df = # your code to drop the high cardinality categorical variables goes in here
In [148… try:
    if (high_cardinality_categorical_variables == []):
        score['question 5'] = 'pass'
    else:
        score['question 5'] = 'fail'
except:
    score['question 5'] = 'fail'
score
Out[148]:
method is widely used for normalization in many machine
learning algorithms.
Check M5d: Standardization
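Standardization replaces each value x with (x - mean) / std, computed per column. The sketch below does this in place on a made-up frame so the column names stay the same. It uses the population standard deviation (`ddof=0`), which is also what scikit-learn's StandardScaler uses. Whether the notebook's expected values rely on that convention is an assumption.

```python
import pandas as pd

# Hypothetical data with two numeric columns and one categorical column.
toy = pd.DataFrame({
    "Age": [30.0, 40.0, 50.0],
    "DailyRate": [100.0, 300.0, 500.0],
    "Department": ["Sales", "R&D", "R&D"],
})
numerical_variables = ["Age", "DailyRate"]

# Per-column mean and population standard deviation.
means = toy[numerical_variables].mean()
stds = toy[numerical_variables].std(ddof=0)

# Overwrite the original columns so the names are unchanged.
toy[numerical_variables] = (toy[numerical_variables] - means) / stds
print(toy)
```

After this step each standardized column has mean 0 and population standard deviation 1, while `Department` is untouched.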
Do not alter the below cell. It is a test case for
Question 6
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass',
'question 5': 'pass',
'question 6': 'pass'}
#
Question 7
1. Encode the categorical input variables. Do not encode the target variable Attrition. You will do that in the following question.
Hints:
You will create dummies for categorical variables.
Example: Let's say you have a variable named occupation. This variable has three levels: Engineer, Teacher, Manager. We will use binary encoding and create dummies for these levels to be able to encode the occupation variable. Technically, we are converting the categorical variable into new numerical variables.
We will have two new variables for this occupation variable: occupation_engineer and occupation_manager. We do not need occupation_teacher because we can infer if the person is a teacher by checking the occupation_manager and occupation_engineer variables.
For example: If occupation_engineer and occupation_manager are zero, then this person is a teacher.
If occupation_engineer is 1, this person is an engineer.
Check Module 5g: Encoding Categorical Variables
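The occupation example can be sketched with `pandas.get_dummies`. This is an illustration on made-up rows, not the notebook's expected solution. With `drop_first=True`, pandas drops the first level in sorted order, so here the omitted dummy is Engineer rather than Teacher. The inference rule is the same: a row with zeros in both remaining dummies belongs to the dropped level.

```python
import pandas as pd

# Hypothetical data with one three-level categorical variable.
toy = pd.DataFrame({
    "occupation": ["Engineer", "Teacher", "Manager", "Teacher"],
    "Age": [30, 41, 52, 28],
})

# One dummy per level except the first (alphabetically), stored as 0/1 integers.
encoded = pd.get_dummies(toy, columns=["occupation"], drop_first=True, dtype=int)
print(encoded)
```

Whether the assignment expects the first level to be dropped is not stated in the notebook, so `drop_first=True` is a choice made for this sketch.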
In [149… # your code to standardize numerical variables goes in here
df =
In [150… try:
    if ((df['Age'].max() == 2.526885578888087) and
        (df['DailyRate'].max() == 1.7267301192801021)):
        score['question 6'] = 'pass'
    else:
        score['question 6'] = 'fail'
except:
    score['question 6'] = 'fail'
score
Out[150]:
Do not alter the below cell. It is a test case for
Question 7
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass',
'question 5': 'pass',
'question 6': 'pass',
'question 7': 'pass'}
#
Question 8
1. Encode the categorical output variable: Attrition. Yes should be coded as 1, and No should be coded as 0. The new encoded target variable should be named Attrition. Do not forget to drop the categorical Attrition variable. Basically, you will convert the categorical Attrition variable into a numerical Attrition variable such that Yes is mapped to 1 and No is mapped to 0.
Hints:
Check Module 3 and Module 5 videos.
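A Yes/No to 1/0 mapping can be sketched with `Series.map`. Assigning the result back to the same column name replaces the categorical version, so no separate drop is needed. The rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical target column with 2 "Yes" values out of 5.
toy = pd.DataFrame({"Attrition": ["Yes", "No", "No", "Yes", "No"]})

# Map each label to its numeric code and overwrite the original column.
toy["Attrition"] = toy["Attrition"].map({"Yes": 1, "No": 0})

# The mean of a 0/1 column is the share of ones.
share_of_yes = toy["Attrition"].mean()
print(toy["Attrition"].tolist(), share_of_yes)
```

This is why the notebook's test compares the mean of Attrition with 237 / 1470, about 0.1612.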
Do not alter the below cell. It is a test case for
Question 8
In [151… # your code to encode categorical input variables goes in here
df =
In [152… try:
    if ((df['JobRole_Laboratory Technician'].mean() == 0.1761904761904762) and
        (df['EducationField_Marketing'].mean() == 0.10816326530612246)):
        score['question 7'] = 'pass'
    else:
        score['question 7'] = 'fail'
except:
    score['question 7'] = 'fail'
score
Out[152]:
In [153… # your code to encode categorical output variables Attrition goes in here
df =
In [154… try:
    if (df['Attrition'].mean() == 0.16122448979591836):
        score['question 8'] = 'pass'
    else:
        score['question 8'] = 'fail'
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass',
'question 5': 'pass',
'question 6': 'pass',
'question 7': 'pass',
'question 8': 'pass'}
#
Question 9
1. Balance the dataset
2. Your code should return the input and output variables separately. The input variables will be saved as a dataframe named X. The output variable will be saved as a dataframe named y.
Hints:
Imbalanced data refers to a situation where the number of observations is not the same for all the classes in a dataset. For example, if the number of churned employees is 4000 while the number of unchurned employees is 40000, the dataset is imbalanced.
You need to access the target variable Attrition and increase the number of ones (i.e., Yeses) so that the number of zeros (i.e., Noes) and the number of ones (i.e., Yeses) will be equal.
Check M5g: Encoding Categorical Variables. Balancing datasets is discussed in
this video.
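One simple way to raise the number of ones to match the zeros is random oversampling: resample minority rows with replacement until both classes have the same count. The sketch below does this with plain pandas on made-up data. The course video may use a different technique, so treat this as an illustration of the idea, not the expected solution. The seed 333 echoes the one set at the top of the notebook.

```python
import pandas as pd

# Hypothetical encoded data: 6 zeros (No) and 2 ones (Yes).
toy = pd.DataFrame({
    "Age": [30, 41, 52, 28, 35, 47, 39, 44],
    "Attrition": [1, 0, 0, 0, 0, 1, 0, 0],
})

majority = toy[toy["Attrition"] == 0]
minority = toy[toy["Attrition"] == 1]

# Draw extra minority rows with replacement to close the gap.
extra = minority.sample(n=len(majority) - len(minority), replace=True, random_state=333)
balanced = pd.concat([majority, minority, extra], ignore_index=True)

# Inputs and output are kept as separate dataframes, as the question asks.
X = balanced.drop(columns=["Attrition"])
y = balanced[["Attrition"]]
print(y["Attrition"].value_counts())
```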
Do not alter the below cell. It is a test case for
Question 9
except:
    score['question 8'] = 'fail'
score
Out[154]:
In [156… # Your code to balance the dataset goes in here
X = # dataframe containing the input variables after balancing
y = # dataframe containing the output variable Attrition after balancing
In [157… try:
    if ((y.Attrition.value_counts()[0] == 1233) and
        (y.Attrition.value_counts()[1] == 1233)):
        score['question 9'] = 'pass'
    else:
        score['question 9'] = 'fail'
except:
    score['question 9'] = 'fail'
score
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass',
'question 5': 'pass',
'question 6': 'pass',
'question 7': 'pass',
'question 8': 'pass',
'question 9': 'pass'}
#
Question 10
Split the dataset into training and testing sets. Basically, using the X and y dataframes, you will create X_train, X_test, y_train, and y_test.
You need to keep 70% of the dataset for training and 30% for testing.
Hints:
You can use the train_test_split function in the sklearn library
Check Module M6c: Classification
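A 70/30 split with `train_test_split` can be sketched as below on ten made-up rows. The `random_state` value is an assumption chosen to match the seed used elsewhere in the notebook. With the 2,466 balanced rows implied by the Question 9 test (1,233 per class), a 70% share is about 1,726 rows, which falls inside the window the test cell checks.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical inputs and output with 10 rows.
X = pd.DataFrame({"f1": range(10), "f2": range(10, 20)})
y = pd.DataFrame({"Attrition": [0, 1] * 5})

# test_size=0.3 keeps 70% of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=333
)
print(X_train.shape, X_test.shape)
```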
Do not alter the below cell. It is a test case for
Question 10
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass',
'question 5': 'pass',
'question 6': 'pass',
'question 7': 'pass',
'question 8': 'pass',
'question 9': 'pass',
'question 10': 'pass'}
#
Out[157]:
In [158… # your code to create train and test sets goes in here
X_train, X_test, y_train, y_test = # your code to create train and test sets goes in here
In [159… try:
    if ((X_train.shape[0]<1750) and (X_train.shape[0]>1700)):
        score['question 10'] = 'pass'
    else:
        score['question 10'] = 'fail'
except:
    score['question 10'] = 'fail'
score
Out[159]:
Question 11
1. Train a knn model where k is 3 using the training dataset.
2. Make predictions using the test dataset
3. Compute accuracy and save as accuracy
Hints:
You need to use the KNeighborsClassifier class. Instantiate a knn object and pass the number of neighbors to it. Train the model using X_train and y_train. Then make predictions using X_test. Then compute the accuracy using the predicted values and y_test.
Check Module 6d: Model Performance and Module 5c:
Classification
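The fit, predict, and score sequence from the hints can be sketched as follows. The data is synthetic: two well-separated clusters generated with NumPy, which is an assumption made only so the example runs on its own. Accuracy on the real dataset will differ.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-class data: one cluster around 0, one around 5.
rng = np.random.default_rng(333)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=333, stratify=y
)

# k = 3 neighbors, trained on the training split only.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Accuracy is the share of test rows whose prediction matches y_test.
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```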
Do not alter the below cell. It is a test case for
Question 11
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass',
'question 5': 'pass',
'question 6': 'pass',
'question 7': 'pass',
'question 8': 'pass',
'question 9': 'pass',
'question 10': 'pass',
'question 11': 'pass'}
#
Question 12
1. Train a Random Forest model where the number of estimators is 100 using the training dataset.
In [160… # Your code to train knn, make predictions, and compute accuracy goes in here
accuracy = # compute accuracy here
In [161… try:
    if (accuracy > 0.70):
        score['question 11'] = 'pass'
    else:
        score['question 11'] = 'fail'
except:
    score['question 11'] = 'fail'
score
Out[161]:
2. Make predictions using the test dataset
3. Compute accuracy and save as accuracy
Hints:
You need to use the RandomForestClassifier class. Instantiate a RandomForestClassifier object and pass the number of estimators to it. Train the model using X_train and y_train. Then make predictions using X_test. Then compute the accuracy using the predicted values and y_test.
Check Module 6d: Model Performance and Module 5c:
Classification
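The random forest version follows the same three steps, with `n_estimators=100` setting the number of trees. As before, the data is synthetic and the `random_state` value is an assumption made for reproducibility. Neither comes from the course materials.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical two-class data with clearly separated clusters.
rng = np.random.default_rng(333)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=333, stratify=y
)

# 100 trees, each fit on a bootstrap sample of the training split.
forest = RandomForestClassifier(n_estimators=100, random_state=333)
forest.fit(X_train, y_train)

accuracy = accuracy_score(y_test, forest.predict(X_test))
print(accuracy)
```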
Do not alter the below cell. It is a test case for
Question 12
{'question 1': 'pass',
'question 2': 'pass',
'question 3': 'pass',
'question 4': 'pass',
'question 5': 'pass',
'question 6': 'pass',
'question 7': 'pass',
'question 8': 'pass',
'question 9': 'pass',
'question 10': 'pass',
'question 11': 'pass',
'question 12': 'pass'}
#
Your Grade
Your overall score is: 100
In [162… # Your code to train random forest, make predictions, and compute accuracy goes in here
accuracy = # compute accuracy here
In [163… try:
    if (accuracy > 0.80):
        score['question 12'] = 'pass'
    else:
        score['question 12'] = 'fail'
except:
    score['question 12'] = 'fail'
score
Out[163]:
In [164… print('Your overall score is: ', round(list(score.values()).count('pass')*8.3333))
12422, 744 PM Assignment_3localhost8888nbconverthtml.docx

More Related Content

Similar to 12422, 744 PM Assignment_3localhost8888nbconverthtml.docx

How To Test Everything
How To Test EverythingHow To Test Everything
How To Test Everything
noelrap
 
Chapter 02-logistic regression
Chapter 02-logistic regressionChapter 02-logistic regression
Chapter 02-logistic regression
Raman Kannan
 
Question 1 1 pts Skip to question text.As part of a bank account.docx
Question 1 1 pts Skip to question text.As part of a bank account.docxQuestion 1 1 pts Skip to question text.As part of a bank account.docx
Question 1 1 pts Skip to question text.As part of a bank account.docx
amrit47
 
Test driven development_for_php
Test driven development_for_phpTest driven development_for_php
Test driven development_for_php
Lean Teams Consultancy
 
Lecture1.pdf
Lecture1.pdfLecture1.pdf
Lecture1.pdf
SakhilejasonMsibi
 
Lecture - 3 Variables-data type_operators_oops concept
Lecture - 3 Variables-data type_operators_oops conceptLecture - 3 Variables-data type_operators_oops concept
Lecture - 3 Variables-data type_operators_oops concept
manish kumar
 
Test in action week 4
Test in action   week 4Test in action   week 4
Test in action week 4
Yi-Huan Chan
 
Php tests tips
Php tests tipsPhp tests tips
Php tests tips
Damian Sromek
 
Classification examp
Classification exampClassification examp
Classification examp
Ryan Hong
 
cs348-06-lab3.doc
cs348-06-lab3.doccs348-06-lab3.doc
cs348-06-lab3.doc
butest
 
cs348-06-lab3.doc
cs348-06-lab3.doccs348-06-lab3.doc
cs348-06-lab3.doc
butest
 
Astronomical data analysis by python.pdf
Astronomical data analysis by python.pdfAstronomical data analysis by python.pdf
Astronomical data analysis by python.pdf
ZainRahim3
 
BAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureBAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 Lecture
Wake Tech BAS
 
Savitch Ch 07
Savitch Ch 07Savitch Ch 07
Savitch Ch 07
Terry Yoast
 
Savitch Ch 07
Savitch Ch 07Savitch Ch 07
Savitch Ch 07
Terry Yoast
 
DSA 103 Object Oriented Programming :: Week 5
DSA 103 Object Oriented Programming :: Week 5DSA 103 Object Oriented Programming :: Week 5
DSA 103 Object Oriented Programming :: Week 5
Ferdin Joe John Joseph PhD
 
Below is my code for C++- I keep getting an error 43 5 C--Progr.pdf
Below is my code for C++- I keep getting an error  43    5    C--Progr.pdfBelow is my code for C++- I keep getting an error  43    5    C--Progr.pdf
Below is my code for C++- I keep getting an error 43 5 C--Progr.pdf
anilbhagat17
 
CMIS 102 HANDS-ON LAB WEEK 6 OVERVIEW THIS HANDS-ON LAB ALLOWS YOU TO FOLLOW ...
CMIS 102 HANDS-ON LAB WEEK 6 OVERVIEW THIS HANDS-ON LAB ALLOWS YOU TO FOLLOW ...CMIS 102 HANDS-ON LAB WEEK 6 OVERVIEW THIS HANDS-ON LAB ALLOWS YOU TO FOLLOW ...
CMIS 102 HANDS-ON LAB WEEK 6 OVERVIEW THIS HANDS-ON LAB ALLOWS YOU TO FOLLOW ...
JanuMorandy
 
Pi j1.3 operators
Pi j1.3 operatorsPi j1.3 operators
Pi j1.3 operators
mcollison
 
Python Exam (Questions with Solutions Done By Live Exam Helper Experts)
Python Exam (Questions with Solutions Done By Live Exam Helper Experts)Python Exam (Questions with Solutions Done By Live Exam Helper Experts)
Python Exam (Questions with Solutions Done By Live Exam Helper Experts)
Live Exam Helper
 

Similar to 12422, 744 PM Assignment_3localhost8888nbconverthtml.docx (20)

How To Test Everything
How To Test EverythingHow To Test Everything
How To Test Everything
 
Chapter 02-logistic regression
Chapter 02-logistic regressionChapter 02-logistic regression
Chapter 02-logistic regression
 
Question 1 1 pts Skip to question text.As part of a bank account.docx
Question 1 1 pts Skip to question text.As part of a bank account.docxQuestion 1 1 pts Skip to question text.As part of a bank account.docx
Question 1 1 pts Skip to question text.As part of a bank account.docx
 
Test driven development_for_php
Test driven development_for_phpTest driven development_for_php
Test driven development_for_php
 
Lecture1.pdf
Lecture1.pdfLecture1.pdf
Lecture1.pdf
 
Lecture - 3 Variables-data type_operators_oops concept
Lecture - 3 Variables-data type_operators_oops conceptLecture - 3 Variables-data type_operators_oops concept
Lecture - 3 Variables-data type_operators_oops concept
 
Test in action week 4
Test in action   week 4Test in action   week 4
Test in action week 4
 
Php tests tips
Php tests tipsPhp tests tips
Php tests tips
 
Classification examp
Classification exampClassification examp
Classification examp
 
cs348-06-lab3.doc
cs348-06-lab3.doccs348-06-lab3.doc
cs348-06-lab3.doc
 
cs348-06-lab3.doc
cs348-06-lab3.doccs348-06-lab3.doc
cs348-06-lab3.doc
 
Astronomical data analysis by python.pdf
Astronomical data analysis by python.pdfAstronomical data analysis by python.pdf
Astronomical data analysis by python.pdf
 
BAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureBAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 Lecture
 
Savitch Ch 07
Savitch Ch 07Savitch Ch 07
Savitch Ch 07
 
Savitch Ch 07
Savitch Ch 07Savitch Ch 07
Savitch Ch 07
 
DSA 103 Object Oriented Programming :: Week 5
DSA 103 Object Oriented Programming :: Week 5DSA 103 Object Oriented Programming :: Week 5
DSA 103 Object Oriented Programming :: Week 5
 
Below is my code for C++- I keep getting an error 43 5 C--Progr.pdf
Below is my code for C++- I keep getting an error  43    5    C--Progr.pdfBelow is my code for C++- I keep getting an error  43    5    C--Progr.pdf
Below is my code for C++- I keep getting an error 43 5 C--Progr.pdf
 
CMIS 102 HANDS-ON LAB WEEK 6 OVERVIEW THIS HANDS-ON LAB ALLOWS YOU TO FOLLOW ...
CMIS 102 HANDS-ON LAB WEEK 6 OVERVIEW THIS HANDS-ON LAB ALLOWS YOU TO FOLLOW ...CMIS 102 HANDS-ON LAB WEEK 6 OVERVIEW THIS HANDS-ON LAB ALLOWS YOU TO FOLLOW ...
CMIS 102 HANDS-ON LAB WEEK 6 OVERVIEW THIS HANDS-ON LAB ALLOWS YOU TO FOLLOW ...
 
Pi j1.3 operators
Pi j1.3 operatorsPi j1.3 operators
Pi j1.3 operators
 
Python Exam (Questions with Solutions Done By Live Exam Helper Experts)
Python Exam (Questions with Solutions Done By Live Exam Helper Experts)Python Exam (Questions with Solutions Done By Live Exam Helper Experts)
Python Exam (Questions with Solutions Done By Live Exam Helper Experts)
 

More from robert345678

1Principles of Economics, Ninth EditionN. Gregory Mankiw.docx
1Principles of Economics, Ninth EditionN. Gregory Mankiw.docx1Principles of Economics, Ninth EditionN. Gregory Mankiw.docx
1Principles of Economics, Ninth EditionN. Gregory Mankiw.docx
robert345678
 
1IntroductionThe objective of this study plan is to evaluate.docx
1IntroductionThe objective of this study plan is to evaluate.docx1IntroductionThe objective of this study plan is to evaluate.docx
1IntroductionThe objective of this study plan is to evaluate.docx
robert345678
 
1Project One Executive SummaryCole Staats.docx
1Project One Executive SummaryCole Staats.docx1Project One Executive SummaryCole Staats.docx
1Project One Executive SummaryCole Staats.docx
robert345678
 
1Management Of CareChamberlain U.docx
1Management Of CareChamberlain U.docx1Management Of CareChamberlain U.docx
1Management Of CareChamberlain U.docx
robert345678
 
1NOTE This is a template to help you format Project Part .docx
1NOTE This is a template to help you format Project Part .docx1NOTE This is a template to help you format Project Part .docx
1NOTE This is a template to help you format Project Part .docx
robert345678
 
15Problem Orientation and Psychologica.docx
15Problem Orientation and Psychologica.docx15Problem Orientation and Psychologica.docx
15Problem Orientation and Psychologica.docx
robert345678
 
122422, 850 AMHow to successfully achieve business integrat.docx
122422, 850 AMHow to successfully achieve business integrat.docx122422, 850 AMHow to successfully achieve business integrat.docx
122422, 850 AMHow to successfully achieve business integrat.docx
robert345678
 
1PAGE 5West Chester Private School Case StudyGrand .docx
1PAGE  5West Chester Private School Case StudyGrand .docx1PAGE  5West Chester Private School Case StudyGrand .docx
1PAGE 5West Chester Private School Case StudyGrand .docx
robert345678
 
12Toxoplasmosis and Effects on Abortion, And Fetal A.docx
12Toxoplasmosis and Effects on Abortion, And Fetal A.docx12Toxoplasmosis and Effects on Abortion, And Fetal A.docx
12Toxoplasmosis and Effects on Abortion, And Fetal A.docx
robert345678
 
155Chapter 11The Frivolity of EvilTheodore Dalrymple.docx
155Chapter 11The Frivolity of EvilTheodore Dalrymple.docx155Chapter 11The Frivolity of EvilTheodore Dalrymple.docx
155Chapter 11The Frivolity of EvilTheodore Dalrymple.docx
robert345678
 
122022, 824 PM Rubric Assessment - SOC1001-Introduction to .docx
122022, 824 PM Rubric Assessment - SOC1001-Introduction to .docx122022, 824 PM Rubric Assessment - SOC1001-Introduction to .docx
122022, 824 PM Rubric Assessment - SOC1001-Introduction to .docx
robert345678
 
1.2.3.4.5.6.7.8..docx
1.2.3.4.5.6.7.8..docx1.2.3.4.5.6.7.8..docx
1.2.3.4.5.6.7.8..docx
robert345678
 
121122, 1204 AM Activities - IDS-403-H7189 Technology and S.docx
121122, 1204 AM Activities - IDS-403-H7189 Technology and S.docx121122, 1204 AM Activities - IDS-403-H7189 Technology and S.docx
121122, 1204 AM Activities - IDS-403-H7189 Technology and S.docx
robert345678
 
1. When drug prices increase at a faster rate than inflation, the .docx
1. When drug prices increase at a faster rate than inflation, the .docx1. When drug prices increase at a faster rate than inflation, the .docx
1. When drug prices increase at a faster rate than inflation, the .docx
robert345678
 
1. Which of the following sentences describe a child functioning a.docx
1. Which of the following sentences describe a child functioning a.docx1. Which of the following sentences describe a child functioning a.docx
1. Which of the following sentences describe a child functioning a.docx
robert345678
 
1. How did the case study impact your thoughts about your own fina.docx
1. How did the case study impact your thoughts about your own fina.docx1. How did the case study impact your thoughts about your own fina.docx
1. How did the case study impact your thoughts about your own fina.docx
robert345678
 
1 The Biography of Langston Hughes .docx
1  The Biography of Langston Hughes .docx1  The Biography of Langston Hughes .docx
1 The Biography of Langston Hughes .docx
robert345678
 
1 Save Our Doughmocracy A Moophoric Voter Registratio.docx
1 Save Our Doughmocracy A Moophoric Voter Registratio.docx1 Save Our Doughmocracy A Moophoric Voter Registratio.docx
1 Save Our Doughmocracy A Moophoric Voter Registratio.docx
robert345678
 
1 MINISTRY OF EDUCATION UNIVERSITY OF HAIL .docx
1 MINISTRY OF EDUCATION UNIVERSITY OF HAIL .docx1 MINISTRY OF EDUCATION UNIVERSITY OF HAIL .docx
1 MINISTRY OF EDUCATION UNIVERSITY OF HAIL .docx
robert345678
 
1 Assessment Brief Module Code Module .docx
1     Assessment Brief  Module Code  Module .docx1     Assessment Brief  Module Code  Module .docx
1 Assessment Brief Module Code Module .docx
robert345678
 

More from robert345678 (20)

1Principles of Economics, Ninth EditionN. Gregory Mankiw.docx
1Principles of Economics, Ninth EditionN. Gregory Mankiw.docx1Principles of Economics, Ninth EditionN. Gregory Mankiw.docx
1Principles of Economics, Ninth EditionN. Gregory Mankiw.docx
 
1IntroductionThe objective of this study plan is to evaluate.docx
1IntroductionThe objective of this study plan is to evaluate.docx1IntroductionThe objective of this study plan is to evaluate.docx
1IntroductionThe objective of this study plan is to evaluate.docx
 
1Project One Executive SummaryCole Staats.docx
1Project One Executive SummaryCole Staats.docx1Project One Executive SummaryCole Staats.docx
1Project One Executive SummaryCole Staats.docx
 
1Management Of CareChamberlain U.docx
1Management Of CareChamberlain U.docx1Management Of CareChamberlain U.docx
1Management Of CareChamberlain U.docx
 
1NOTE This is a template to help you format Project Part .docx
1NOTE This is a template to help you format Project Part .docx1NOTE This is a template to help you format Project Part .docx
1NOTE This is a template to help you format Project Part .docx
 
15Problem Orientation and Psychologica.docx
15Problem Orientation and Psychologica.docx15Problem Orientation and Psychologica.docx
15Problem Orientation and Psychologica.docx
 
122422, 850 AMHow to successfully achieve business integrat.docx
122422, 850 AMHow to successfully achieve business integrat.docx122422, 850 AMHow to successfully achieve business integrat.docx
122422, 850 AMHow to successfully achieve business integrat.docx
 
1PAGE 5West Chester Private School Case StudyGrand .docx
1PAGE  5West Chester Private School Case StudyGrand .docx1PAGE  5West Chester Private School Case StudyGrand .docx
1PAGE 5West Chester Private School Case StudyGrand .docx
 
12Toxoplasmosis and Effects on Abortion, And Fetal A.docx
12Toxoplasmosis and Effects on Abortion, And Fetal A.docx12Toxoplasmosis and Effects on Abortion, And Fetal A.docx
12Toxoplasmosis and Effects on Abortion, And Fetal A.docx
 
155Chapter 11The Frivolity of EvilTheodore Dalrymple.docx
155Chapter 11The Frivolity of EvilTheodore Dalrymple.docx155Chapter 11The Frivolity of EvilTheodore Dalrymple.docx
155Chapter 11The Frivolity of EvilTheodore Dalrymple.docx
 
122022, 824 PM Rubric Assessment - SOC1001-Introduction to .docx