12/4/22, 7:44 PM Assignment_3 localhost:8888/nbconvert/html/Assignment_3/Assignment_3.ipynb?download=false 1/11

INSTRUCTIONS:

* Add your code as indicated in each cell.
* Besides adding your code, do not alter this file.
* Do not delete or change test cases. Once you are done with a question, you can run the test cases to see if you programmed the question correctly.
* If you get a question wrong, do not give up. Keep trying until you pass the test cases.
* Rename the file as firstname_lastname_assignmentid.ipynb (e.g., marina_johnson_assignment1.ipynb)
* Only submit .ipynb files (no .py files)

In [138…

    # Do not delete this cell
    import numpy as np
    score = dict()
    np.random.seed(333)

# Question 1

1. Read the employee_attrition dataset and save it as df. Recall that the target variable in this dataset is named 'Attrition.'
1. Check if the dataset is imbalanced by counting the number of No and Yes values in the target variable Attrition.

Hints: Imbalanced data refers to a situation where the number of observations is not the same for all the classes in a dataset. For example, the number of churned employees is 4000, while the number of unchurned employees is 40000. This means this dataset is imbalanced.

You need to access the target variable Attrition and count how many Yes and No values there are in this variable. If the number of Yes values is equal to the number of No values, then the dataset is balanced. Otherwise, it is not balanced.

Check Module 5g: Encoding Categorical Variables to learn more about data imbalance problems. Particularly, check 2.5: Balancing datasets in Module 5.

In [139…

    import pandas as pd
    df = # your code to read the dataset goes in here
    number_of_yes = # your code to find the number
                    # of yeses in the Attrition variable goes in here
    number_of_no = # your code to find the number
                   # of noes in the Attrition variable goes in here

Do not alter the below cell. It is a test case for Question 1

In [140…

    try:
        if (number_of_yes == 237 and number_of_no == 1233):
            score['question 1'] = 'pass'
        else:
            score['question 1'] = 'fail'
    except:
        score['question 1'] = 'fail'
    score

Out[140]:

    {'question 1': 'pass'}

# Question 2

1. Identify the names of the numerical input variables and save them as a LIST
1. Identify the names of the categorical input variables and save them as a LIST

Hints: Remember Attrition is the target (output) variable, so exclude Attrition from both LISTS containing the numerical and categorical input variables.

Check Modules 5b: Dropping Variables and Module 3e: Helpful Functions (check after minute 4)

Do not alter the below cell. It is a test case for Question 2

In [141…

    numerical_variables = # Your code to identify numerical variables goes in here
    categorical_varables = .
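
The hints for Questions 1 and 2 describe counting the classes of a target column and separating numerical from categorical inputs. The sketch below shows one possible way to do both on a small made-up table. The `toy` DataFrame, its `Age` and `Department` columns, and all of its values are assumptions for illustration only. They are not the employee_attrition data, and this is not necessarily the approach the course modules use.

```python
import pandas as pd

# Toy stand-in for the assignment data (hypothetical values, not the real file).
toy = pd.DataFrame({
    "Attrition": ["No", "No", "Yes", "No", "Yes", "No"],
    "Age": [41, 35, 29, 50, 33, 46],
    "Department": ["Sales", "R&D", "R&D", "HR", "Sales", "R&D"],
})

# Count how often each class appears in the target column.
counts = toy["Attrition"].value_counts()
n_yes = int(counts.get("Yes", 0))
n_no = int(counts.get("No", 0))

# Under the definition in the hints, the data is balanced only if the counts match.
is_balanced = n_yes == n_no

# Drop the target first so it cannot end up in either list of input variables.
inputs = toy.drop(columns=["Attrition"])
numerical = inputs.select_dtypes(include="number").columns.tolist()
categorical = inputs.select_dtypes(exclude="number").columns.tolist()

print("Yes:", n_yes, "No:", n_no, "balanced:", is_balanced)
print("numerical:", numerical, "categorical:", categorical)
```

Dropping the target before calling `select_dtypes` keeps the exclusion of Attrition in one place, so neither list needs a separate filtering step.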