This example shows how to search for the hyperparameters of a single model (i.e. not a whole AutoML run over several models). The example dataset is kc1 from OpenML. Note: a cluster with 10 workers was started before running this notebook.
from techila_ml import find_best_hyperparams
from techila_ml.datasets import openml_dataset_loader
from techila_ml.stats_vis_funs import plot_res_figs
data = openml_dataset_loader({'openml_dataset_id': 1067, 'target_variable': 'defects'})
scorefun = "roc_auc"
n_jobs = 10 # number of Techila jobs
n_iterations = 50
The data can be given as pandas DataFrames or NumPy arrays. Here the OpenML data loader returns pandas DataFrames and Series:
data['X_train'].head()
data['y_train'].head()
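Plain NumPy arrays work as well. Below is a minimal sketch of building the same kind of data dict from arrays; the 'X_train' and 'y_train' keys mirror the loader output above, while the arrays themselves are just random stand-in data:
import numpy as np

# stand-in data: 100 samples with 5 numeric features and a binary target
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

data_np = {'X_train': X, 'y_train': y}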
# search for the best hyperparameters for a random forest
res = find_best_hyperparams(
    n_jobs,
    n_iterations,
    data,
    task='classification',
    model='randomforest',
    optimization={
        'score_f': scorefun,
    },
    logging_params={'progress_bar': False},
)
The returned object contains several kinds of information and statistics; perhaps the most useful entries are the best score and the model that produced it.
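To see everything that is included, you can list the keys first (this assumes the result behaves like a plain Python dict, which the indexing below suggests):
print(sorted(res.keys()))
The best score itself: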
res['best_cv_score']
Note that this is the best cross-validation score, since we did not supply test data (which could have been split off from the OpenML dataset). If test data is given, the result will also contain a 'test_scores' entry; a sketch of how to do this follows after the best model below.
res['best_model']
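If you do want test scores, split off a hold-out set before running the search. A minimal sketch, assuming the data dict also accepts 'X_test' and 'y_test' keys (these key names are our assumption, not confirmed by this example):
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(
    data['X_train'], data['y_train'], test_size=0.2, random_state=0
)
data_with_test = {
    'X_train': X_tr, 'y_train': y_tr,
    'X_test': X_te, 'y_test': y_te,  # assumed key names for the hold-out set
}
# Running find_best_hyperparams on data_with_test should then add a
# 'test_scores' entry to the result, as described above.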
We can also use a supplied function to plot a few figures about the optimization run: the evolution of scores over the iterations, as well as the distributions of prediction speeds and model sizes:
plot_res_figs(res, "RF HPO")
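Since model='randomforest' suggests that the returned best model is a fitted scikit-learn-compatible estimator (an assumption on our part, not stated by this example), it can presumably be used directly for predictions:
best_model = res['best_model']
# class probabilities on the training data, assuming a fitted
# sklearn-style classifier with a predict_proba method
best_model.predict_proba(data['X_train'])[:5]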