Using a custom model in HPO

This example shows how to define a custom model and run hyperparameter optimization on it. First, we'll load some example data.

In [1]:
from techila_ml import find_best_hyperparams
from techila_ml.datasets import openml_dataset_loader
from techila_ml.stats_vis_funs import plot_res_figs


data = openml_dataset_loader({'openml_dataset_id': 31, 'target_variable': 'class'})

scorefun = "roc_auc"
n_jobs = 1
n_iterations = 5
Techila Python module using JPype

Models need to follow the scikit-learn API. Here we give a minimal example. The predictor does nothing sensible; it simply predicts class 1 all the time.

In [2]:
import numpy as np


class CustomPredictor:
    def __init__(self, custom_parameter=1):
        self.a = custom_parameter

    def fit(self, X, y, **kwargs):
        # store the classes seen during fit, as scikit-learn classifiers do
        self.classes_ = np.unique(y)
        return self

    def predict(self, X, **kwargs):
        # always predict class 1
        return np.ones(X.shape[0])

    def predict_proba(self, X, **kwargs):
        # probability 0 for class 0, probability 1 for class 1
        return np.c_[np.zeros((X.shape[0], 1)), np.ones((X.shape[0], 1))]
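
As a quick sanity check, we can fit the predictor on a tiny toy array (the values below are purely illustrative) and confirm it always predicts class 1:

X_toy = np.array([[0.0], [1.0], [2.0]])
y_toy = np.array([0, 1, 1])

clf = CustomPredictor().fit(X_toy, y_toy)
print(clf.predict(X_toy))        # -> [1. 1. 1.]
print(clf.predict_proba(X_toy))  # -> column of zeros for class 0, ones for class 1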

Then, we need to define a custom configuration that tells the optimizer what to do: we give the predictor a name, state whether the task is classification or regression, and specify the configuration-space dimensions that will be searched through during optimization. Scalings and transformations can also be defined; for this minimal example we only specify a standard scaler.

In [3]:
from techila_ml.models.base_config import PredictorConfig
from techila_ml.dimensions import IntegerDimension
from sklearn.preprocessing import StandardScaler

class CustomPredictorConfig(PredictorConfig):
    name = 'custompredictor'
    modelclass = CustomPredictor
    task = 'classification'
    # search custom_parameter over the integers 1..10
    dims = [
        IntegerDimension(name='custom_parameter', low=1, high=10),
    ]
    # apply a standard scaler to the numeric columns
    numeric_columns_transform = [(StandardScaler, [])]
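
To make the role of the configuration concrete: on each iteration the optimizer samples a value for custom_parameter from the integer range defined above and instantiates modelclass with it. Roughly, the sampled step looks like this (a sketch, not techila_ml's actual internals):

# Illustrative sketch only: this mimics what the optimizer does with the
# configuration; it is not techila_ml's internal code.
sampled_params = {'custom_parameter': 7}  # a value drawn from [1, 10]
model = CustomPredictorConfig.modelclass(**sampled_params)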

Then we can run the optimization for this model (note that we use mode='local_test' to run on the local computer without sending the job to the cluster). The results do not make much sense for this toy predictor, of course, but the same setup can be extended with more sensible models.

In [4]:
res = find_best_hyperparams(
    n_jobs,
    n_iterations,
    data,
    task='classification',
    model=CustomPredictorConfig,
    optimization={
        'score_f': scorefun,
    },
    mode='local_test'
)
setting log level
 *** running for 5 iterations...
 ***  1/5 ( 20%) configs |  0 m  1 s elapsed | score 0.500 (best 0.500 @iter 1)
 ***  2/5 ( 40%) configs |  0 m  1 s elapsed | score 0.500 (best 0.500 @iter 1)
 ***  3/5 ( 60%) configs |  0 m  2 s elapsed | score 0.500 (best 0.500 @iter 1)
 ***  4/5 ( 80%) configs |  0 m  2 s elapsed | score 0.500 (best 0.500 @iter 1)
 ***  5/5 (100%) configs |  0 m  3 s elapsed | score 0.500 (best 0.500 @iter 1)
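
The returned res object can then be visualized with plot_res_figs, which was imported at the top. The call below assumes it accepts the result object directly; check the techila_ml documentation for the exact signature:

# Assumed usage of plot_res_figs; the signature is not verified here.
plot_res_figs(res)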