25. Preparing Data → Processing Data → Training → Model → Serve Predictions
26. Data for demo
https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
27. Preparing Data → Processing Data → Training → Model → Serve Predictions
- 1460 rows, 81 columns
- Data types include: int64, float64, object
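As a minimal sketch of how these figures can be checked, the snippet below builds a tiny stand-in DataFrame and inspects its shape and column types. The column names are borrowed from the Kaggle data, but the values are invented for illustration; with the real data, `df` would instead be loaded from the downloaded CSV file.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the training data; values are invented for illustration.
df = pd.DataFrame({
    'Id': [1, 2, 3],
    'LotFrontage': [65.0, np.nan, 68.0],
    'MSZoning': ['RL', 'RL', 'RM'],
    'SalePrice': [208500, 181500, 223500],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# Count how many columns there are of each data type
dtype_counts = df.dtypes.astype(str).value_counts().to_dict()

print(n_rows, n_cols)
print(dtype_counts)
```

On the full training set, the same two calls should report the row and column counts listed on this slide.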
28. Preparing Data → Processing Data → Training → Model → Serve Predictions
- Fill NaN values
- Handle outliers
- Transform the data
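The slide only names these three steps. As one possible illustration, the sketch below fills a missing value with the column median, clips an outlier to the 1.5 × IQR fences, and applies a log transform. The column values, the median fill, and the IQR rule are assumptions chosen for this example, not necessarily what the demo uses.

```python
import numpy as np
import pandas as pd

# Invented values for one numeric column: one gap and one extreme value.
area = pd.Series([1500.0, 1600.0, np.nan, 1700.0, 1800.0, 20000.0],
                 name='GrLivArea')

# 1. Fill NaN: here with the median (an assumption for this sketch).
filled = area.fillna(area.median())

# 2. Outliers: clip values to the 1.5 * IQR fences.
q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
iqr = q3 - q1
clipped = filled.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. Data transformation: log(1 + x) compresses large values.
transformed = np.log1p(clipped)

print(clipped.tolist())
```

Clipping keeps the row but caps the extreme value; dropping the row is another option when the outlier is likely a data error.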
29. Preparing Data → Processing Data → Training → Model → Serve Predictions
30. Sample code for filling NaN with zero
import numpy as np

# train and test are pandas DataFrames holding the demo data.
# Replace infinite values first, then fill the remaining NaN values.
train = train.replace([-np.inf, np.inf], 0.0)
train = train.fillna(0.0)
test = test.replace([-np.inf, np.inf], 0.0)
test = test.fillna(0.0)
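To make the effect of these calls concrete, here is a small sketch on an invented DataFrame. The values are hypothetical; the column names `LotFrontage` and `MasVnrArea` are borrowed from the Kaggle data. Infinite values are replaced first, then the remaining NaN entries are filled.

```python
import numpy as np
import pandas as pd

# Invented values containing NaN and both infinities.
toy = pd.DataFrame({
    'LotFrontage': [65.0, np.nan, np.inf],
    'MasVnrArea': [-np.inf, 196.0, np.nan],
})

# Same two steps as on the slide, chained into one expression.
cleaned = toy.replace([-np.inf, np.inf], 0.0).fillna(0.0)

print(cleaned)
```

Zero is not always a neutral value for a feature, so this works best as a quick baseline.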
31. Preparing Data → Processing Data → Training → Model → Serve Predictions
33. Preparing Data → Processing Data → Training → Model → Serve Predictions
34. Sample code for label encoding
from sklearn.preprocessing import LabelEncoder

def label_encoding(df):
    # Encode every object (string) column as integer codes, in place
    cat_features = df.select_dtypes(include=['object']).columns
    for col in cat_features:
        lbl = LabelEncoder()
        df[col] = lbl.fit_transform(df[col].astype(str))
    return df

train = label_encoding(train)
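As a small illustration of what `LabelEncoder` does to a single column, the sketch below encodes an invented list of zoning labels and maps the codes back again. The sample values are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder

zoning = ['RL', 'RM', 'RL', 'FV']   # invented sample of a categorical column

encoder = LabelEncoder()
codes = encoder.fit_transform(zoning)

# classes_ holds the sorted unique labels; a label's code is its index there.
print(list(encoder.classes_))
print(codes.tolist())

# inverse_transform maps the integer codes back to the original labels.
restored = encoder.inverse_transform(codes).tolist()
print(restored)
```

The codes are arbitrary integers, so a linear model will treat them as ordered. One-hot encoding is a common alternative when that ordering would be misleading.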
35. Linear regression
36. Using a simple Linear Regression model
import numpy as np
from sklearn.linear_model import LinearRegression

# Use every numeric column except the row ID and the target
ignore_cols = ['Id', 'SalePrice']
numeric_cols = train.select_dtypes(include=['float64', 'int64']).columns
train_features = [col for col in numeric_cols if col not in ignore_cols]

X = train[train_features].copy()
# Model log(1 + price) rather than the raw price
y = np.log1p(train['SalePrice'])

clf = LinearRegression()
clf.fit(X, y)
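Because the target is `np.log1p(SalePrice)`, the model predicts on the log scale, and `np.expm1` converts a prediction back to a price. The sketch below shows this round trip on a small synthetic dataset; the single feature and all values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: price grows by a fixed ratio per step of the
# feature, so log1p(price) is very close to linear in that feature.
X_toy = pd.DataFrame({'GrLivArea': [1000.0, 1500.0, 2000.0, 2500.0]})
price = pd.Series([100000.0, 150000.0, 225000.0, 337500.0])

model = LinearRegression()
model.fit(X_toy, np.log1p(price))

log_pred = model.predict(X_toy)      # predictions on the log scale
price_pred = np.expm1(log_pred)      # back to the original price scale

print(np.round(price_pred))
```

Forgetting the `np.expm1` step would report values around 12 instead of prices, which is an easy mistake to spot once the scale is known.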
37. Model evaluation
38. Sample code for model evaluation
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def rmse_score(y_obs, y_hat):
    # Root mean squared error between observed and predicted values
    return np.sqrt(mean_squared_error(y_obs, y_hat))

# Hold out 20% of the rows for validation
X_trn, X_val, y_trn, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)
clf.fit(X_trn, y_trn)
y_hat = clf.predict(X_val)
rmse = rmse_score(y_val, y_hat)
print('RMSE score: {:.6f}'.format(rmse))
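As a quick sanity check of the metric, RMSE is the square root of the mean squared difference between observed and predicted values, which is the quantity `rmse_score` returns. The sketch below computes it by hand with NumPy on three invented numbers.

```python
import numpy as np

y_obs = np.array([3.0, 5.0, 7.0])   # invented observed values
y_hat = np.array([2.0, 5.0, 9.0])   # invented predictions

errors = y_obs - y_hat              # [ 1.,  0., -2.]
mse = np.mean(errors ** 2)          # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)

print('RMSE score: {:.6f}'.format(rmse))   # RMSE score: 1.290994
```

Since `y` in the demo is on the log scale, the reported RMSE measures error in log price, so it reflects relative rather than absolute differences.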
39. What is my house price?
New Info → ML Web Service → Predicted Price
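The flow on this slide can be sketched as a single function that a web service would call: it takes the new house information, runs the trained model, and converts the result back to a price. The function name `predict_price`, the feature subset, and the toy training data are all hypothetical, and the web framework that would wrap the function is left out.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ['GrLivArea', 'OverallQual']   # hypothetical feature subset

# Toy training data (invented values) standing in for the prepared dataset.
train_toy = pd.DataFrame({
    'GrLivArea': [1000.0, 1500.0, 2000.0, 2500.0, 1800.0],
    'OverallQual': [5, 6, 7, 8, 6],
    'SalePrice': [120000.0, 160000.0, 220000.0, 300000.0, 190000.0],
})
model = LinearRegression()
model.fit(train_toy[FEATURES], np.log1p(train_toy['SalePrice']))


def predict_price(new_info: dict) -> float:
    """Return a predicted price for one house described by a dict of features."""
    row = pd.DataFrame([new_info])[FEATURES]   # one-row frame, fixed column order
    log_price = model.predict(row)[0]          # model output is log1p(price)
    return float(np.expm1(log_price))


predicted = predict_price({'GrLivArea': 1700.0, 'OverallQual': 6})
print(round(predicted))
```

Keeping the prediction logic in a plain function like this makes it easy to test without starting a server; the service layer then only has to parse the request and call it.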
40. Let’s get started on your wonderful Machine Learning project with Python & Jupyter Notebook