GRID SEARCH
PRESENTED BY
ATHIRA U R
BROLIN NIBI S G
JENISHA T
JIJOSHINI T S
JINI L
INTRODUCTION
• Grid search is a popular hyperparameter tuning technique used in machine learning to find the optimal hyperparameters for a model.
• Hyperparameters are parameters that are set by the user before training the model and cannot be learned by the model during training.
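For instance, in scikit-learn (the library used throughout these slides), hyperparameters are passed to an estimator's constructor before training rather than learned during fitting. A minimal sketch, with illustrative values for C and gamma:

from sklearn.svm import SVC

# C and gamma are hyperparameters: chosen by the user before training,
# not learned from the data during fit()
model = SVC(C=1.0, gamma=0.1)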
HOW TO USE GRID SEARCH
• The grid search algorithm involves specifying a range of values for each hyperparameter and then evaluating the performance of the model for all possible combinations of these values (see the sketch below).
• The performance of the model is usually measured by a scoring metric such as accuracy or F1 score.
• The combination of hyperparameters that results in the best performance is selected as the optimal set of hyperparameters for the model.
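As a rough sketch of this enumeration (using the C and gamma value lists from the SVC example later in these slides), the candidate set is simply the Cartesian product of the per-parameter value lists:

from itertools import product

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

# grid search evaluates one model per combination: 6 x 6 = 36 candidates
combinations = list(product(param_grid['C'], param_grid['gamma']))
print(len(combinations))  # 36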
ADVANTAGES OF GRID SEARCH
• Exhaustive search: Grid search exhaustively searches the specified hyperparameter space, ensuring that the best combination within that grid is found.
• Simple implementation: Grid search is simple to implement and does not require any advanced optimization techniques.
• Reproducible results: Grid search produces reproducible results, as the same hyperparameters and data will always produce the same model.
DISADVANTAGES OF GRID SEARCH
• Computationally expensive: Grid search can be computationally expensive, particularly when there are many hyperparameters or large datasets (see the example below).
• Limited search space: Grid search is limited to the hyperparameters and their respective ranges specified by the user, which may not include the optimal set of hyperparameters.
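For example, with the six values each for C and gamma used later in these slides and 5-fold cross-validation, grid search already requires 6 × 6 × 5 = 180 model fits, and the count grows multiplicatively with every additional hyperparameter.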
GRID SEARCH USING PYTHON
Finding the values of the important parameters of a model (the ones that provide the best generalization performance) is a tricky task, but necessary for almost all models and datasets.
Because it is such a common task, there are standard methods in scikit-learn to help you with it.
The most commonly used method is grid search, which basically means trying all possible combinations of the parameters of interest.
Consider the case of a kernel SVM with an RBF (radial basis function) kernel, as implemented in the SVC class.
There are two important parameters: the kernel bandwidth, gamma, and the regularization parameter, C.
Say we want to try the values 0.001, 0.01, 0.1, 1, 10, and 100 for the parameter C, and the same for gamma.
SIMPLE GRID SEARCH
• Involves creating a grid of hyperparameters and evaluating the model for each combination
• The algorithm selects the hyperparameters that result in the best model performance
• Can be computationally expensive for large hyperparameter spaces
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

# Load iris dataset
iris = datasets.load_iris()

# Define hyperparameter grid
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}

# Define SVM classifier
svc = svm.SVC()

# Define grid search object
grid = GridSearchCV(svc, param_grid)

# Fit the grid search object to the data
grid.fit(iris.data, iris.target)

# Print the best hyperparameters and corresponding score
print(grid.best_params_, grid.best_score_)
DANGER OF OVERFITTING THE PARAMETERS AND THE VALIDATION SET
• Overfitting can occur when hyperparameters are tuned on the validation set
• It's important to split the dataset into training, validation, and testing sets to avoid overfitting
• The training set is used to train the model, the validation set is used to tune the hyperparameters, and the testing set is used to evaluate the final model performance
Example:
mglearn.plots.plot_threefold_split()
• A threefold split of data into training set, validation set, and test set
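A minimal sketch of the threefold split illustrated above, using two calls to train_test_split (the random_state values here are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
# first split off a test set, then split the remainder into training and validation
X_trainval, X_test, y_trainval, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, random_state=1)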
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# Load iris dataset
iris = datasets.load_iris()
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2,
random_state=42)
# Define hyperparameter grid
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}
# Define SVM classifier
svc = SVC()
# Define grid search object with 5-fold cross-validation
grid = GridSearchCV(svc, param_grid, cv=5)
# Fit the grid search object to the training data
grid.fit(X_train, y_train)

# Print the best hyperparameters and corresponding score
print(grid.best_params_, grid.best_score_)

# Evaluate final model performance on testing set
print(grid.score(X_test, y_test))
GRID SEARCH WITH CROSS VALIDATION
• Involves creating a grid of hyperparameters and performing k-fold cross-validation for each combination
• The algorithm selects the hyperparameters that result in the best average cross-validation score
• Helps to avoid overfitting by evaluating the model on multiple validation sets
mglearn.plots.plot_cross_val_selection()
• Results of grid search with cross-validation
mglearn.plots.plot_grid_search_overview()
• Overview of the process of parameter selection and model evaluation with GridSearchCV
EXAMPLE CODE:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
# split data into a training+validation set and a held-out test set
X_trainval, X_test, y_trainval, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

best_score = 0
for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # for each combination of parameters, train an SVC
        svm = SVC(gamma=gamma, C=C)
        # perform cross-validation
        scores = cross_val_score(svm, X_trainval, y_trainval, cv=5)
        # compute mean cross-validation accuracy
        score = np.mean(scores)
        # if we got a better score, store the score and parameters
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}

# rebuild a model on the combined training and validation set
svm = SVC(**best_parameters)
svm.fit(X_trainval, y_trainval)
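A natural follow-up, not shown on the slide, is to evaluate the rebuilt model on the held-out test set from the split above:

# evaluate the final model on the held-out test set
test_score = svm.score(X_test, y_test)
print("Best score on validation set: {:.2f}".format(best_score))
print("Best parameters:", best_parameters)
print("Test set score with best parameters: {:.2f}".format(test_score))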
ANALYZING THE RESULT OF CROSS VALIDATION
• It's important to analyze the results of cross-validation to determine the optimal hyperparameters
• Results can be visualized using a heat map or line chart
• The optimal hyperparameters are those that result in the highest cross-validation score
EXAMPLE:
# 'results' holds the cross-validation results of a GridSearchCV fit
# over a 6 x 6 grid of C and gamma values, e.g.:
# results = pd.DataFrame(grid_search.cv_results_)
scores = np.array(results.mean_test_score).reshape(6, 6)

# plot the mean cross-validation scores
mglearn.tools.heatmap(scores, xlabel='gamma', xticklabels=param_grid['gamma'],
                      ylabel='C', yticklabels=param_grid['C'], cmap="viridis")
• Heat map of mean cross-validation score as a function of C and gamma
SEARCH OVER SPACES THAT ARE NOT GRIDS
• Randomized search involves sampling hyperparameter values from a specified distribution
• Each sampled combination is used to train and evaluate the model, and the process is repeated for a fixed number of iterations, keeping the best-performing combination
• Can be useful for hyperparameters that do not form a grid (see the sketch below)
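A minimal sketch using scikit-learn's RandomizedSearchCV; the continuous log-uniform distributions for C and gamma and the n_iter budget are illustrative choices, not prescribed by the slides:

from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

iris = load_iris()
# sample C and gamma from continuous distributions instead of a fixed grid
param_distributions = {'C': loguniform(1e-3, 1e2),
                       'gamma': loguniform(1e-3, 1e2)}
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20, cv=5,
                            random_state=0)
search.fit(iris.data, iris.target)
print(search.best_params_, search.best_score_)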
USING DIFFERENT CROSS-VALIDATION STRATEGIES WITH GRID SEARCH
• Similarly to cross_val_score, GridSearchCV uses stratified k-fold cross-validation by default for classification, and k-fold cross-validation for regression.
• However, you can also pass any cross-validation splitter, as described in “More control over cross-validation”, as the cv parameter in GridSearchCV.
• In particular, to get only a single split into a training and a validation set, you can use ShuffleSplit or StratifiedShuffleSplit with n_splits=1 (named n_iter in older scikit-learn versions).
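A minimal sketch, reusing the SVC and param_grid from the earlier examples (the test_size and random_state values are illustrative):

from sklearn.model_selection import ShuffleSplit

# a single random split into training and validation parts
single_split = ShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
grid = GridSearchCV(SVC(), param_grid, cv=single_split)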
NESTED CROSS VALIDATION
• Nested cross-validation is a technique used to evaluate the performance of a machine learning model while tuning its hyperparameters using grid search.
• It involves an outer loop of k-fold cross-validation to estimate the model's generalization performance, and an inner loop of k-fold cross-validation to tune the hyperparameters using grid search.
• The dataset is split into training and test sets, and the test set is kept aside and not used until the end of the entire nested cross-validation process.
• The best hyperparameters found during the inner loop are used to train the model on the entire training set.
• The trained model is then tested on the test set, which was kept aside in the beginning.
• The entire process is repeated for different random splits of the dataset into training and test sets.
• Scikit-learn provides a convenient way to implement nested cross-validation with grid search using its GridSearchCV() and cross_val_score() functions.
Implementing nested cross-validation in scikit-learn is
straightforward. We call cross_val_score with an instance of
GridSearchCV as the model:
EXAMPLE:
# param_grid is the C/gamma grid defined earlier
scores = cross_val_score(GridSearchCV(SVC(), param_grid, cv=5),
                         iris.data, iris.target, cv=5)
print("Cross-validation scores: ", scores)
print("Mean cross-validation score: ", scores.mean())

OUTPUT:
Cross-validation scores: [0.967 1. 0.967 0.967 1.]
Mean cross-validation score: 0.98
The result of our nested cross-validation can be summarized as “SVC can achieve 98% mean cross-validation accuracy on the iris dataset,” nothing more and nothing less.
THANK YOU
