1. Introduction
Note! This is a beta release. Please report any issues or bugs you find to cloudsupport@techilatechnologies.com.
This document is intended for Techila Distributed Computing Engine (TDCE) End-Users who work in the Machine Learning (ML) field and use Python as their development language. The purpose of this document is to provide an overview of Techila AutoML, using code samples and example material that highlight different aspects of how Techila AutoML can be used.
If you are unfamiliar with the terminology or the operating principles of TDCE, information on these can be found in Introduction to Techila Distributed Computing Engine.
2. Requirements
This chapter contains a list of requirements that must be met in order to use Techila AutoML.
2.1. Operating System
Techila AutoML requires a Debian 10 operating system. If you are using Mac OS X, Microsoft Windows, or a different Linux distribution, you can run Techila AutoML in a Docker environment. Please see Running in Docker for more information.
Additionally, you will need to use Techila Workers that have a Linux operating system.
2.2. Python Packages and Versions
Techila AutoML supports Python 3.7.
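If you are unsure which version your environment uses, you can check it before installing anything. A minimal sketch:
import sys

# Techila AutoML supports Python 3.7; warn if the interpreter differs.
print("Python version:", sys.version.split()[0])
if sys.version_info[:2] != (3, 7):
    print("Warning: this interpreter is not Python 3.7")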
2.2.1. Installing the techila Python Package
Techila AutoML uses functionality from the techila Python package.
Note! If you plan on running your computations in Docker, you do not need to install the techila package to your own computer. Instead, the build scripts included in the Techila SDK will install the package automatically to a Docker image. Please see Running in Docker for more details.
Instructions for manually installing the techila package can be found in the Techila Python guide.
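If you are unsure whether the techila package is already installed, a quick check such as the one below can be used (a minimal sketch; it only tests that the module can be found on your Python path):
import importlib.util

# Look up the techila package without importing it.
if importlib.util.find_spec("techila") is None:
    print("techila package not found - please install it first")
else:
    print("techila package is available")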
2.2.2. Installing techila_ml Requirements
The package requirements of the techila_ml package are listed in the techila/lib/techila_ml/requirements.txt file.
Note! If you plan on running your computations in Docker, you do not need to install any of the required packages to your own computer. Instead, the build scripts included in the Techila SDK will install the package requirements automatically to a Docker image. Please see Running in Docker for more details.
Please note that installing the requirements may take up to 1 hour.
You can install the package requirements by running the following commands:
cd path/to/techila/lib/techila_ml
pip3 install -r requirements.txt
2.2.3. Installing the techila_ml Python Package
The techila_ml package is included in the Techila SDK, in the folder techila/lib/techila_ml.
Note! If you plan on running your computations in Docker, you do not need to install the techila_ml package to your own computer. Instead, the build scripts included in the Techila SDK will install the package automatically to a Docker image. Please see Running in Docker for more details.
You can install the techila_ml package by running the following commands:
cd path/to/techila/lib/techila_ml
python3 setup.py install --user
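After the installation has finished, you can verify that the package can be imported and that find_best_model is available (a minimal sketch):
# Import the entry point used in the examples below.
from techila_ml import find_best_model

print("techila_ml installed, find_best_model is callable:", callable(find_best_model))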
3. Example Notebooks
Python notebooks containing example material for Techila AutoML can be found in the following folder in the Techila SDK.
- techila/examples/techila_ml/notebooks
Please see the links below for HTML versions of these notebooks:
4. Example Python Scripts
This chapter contains code samples that illustrate how you can use Techila AutoML and the available features with different types of machine learning datasets. These examples can be found in the following folder in the Techila SDK.
- techila/examples/techila_ml/scripts
4.1. Supported Data Types
This example shows the syntax for defining the input data when using Techila AutoML.
import numpy as np
import pandas as pd
from techila_ml import find_best_model
from techila_ml.configs import OptionalPackages
OptionalPackages.use = False
# Number of Techila jobs
n_jobs = 2
# Number of iterations
n_iterations = 8
def load_data():
    # Function for generating dummy training data.
    # Arbitrary numpy data example
    X_train = np.random.random((500, 20))
    y_train = np.random.randint(0, 2, 500)
    X_validation = np.random.random((50, 20))
    y_validation = np.random.randint(0, 2, 50)
    # Alternatively, pandas data format could also be used. Uncomment to use.
    # X_train = pd.DataFrame(X_train)
    # y_train = pd.Series(y_train)
    # X_validation = pd.DataFrame(X_validation)
    # y_validation = pd.Series(y_validation)
    return {'X_train': X_train, 'y_train': y_train, 'X_validation': X_validation, 'y_validation': y_validation}
# Load the data locally.
data = load_data()
# Search for the best model in TDCE.
res = find_best_model(
    n_jobs,
    n_iterations,
    data,
    task='classification',
)
print(f"best score: {res['best_cv_score']}")
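The same data dictionary structure also works with data loaded from your own files. The sketch below is a hypothetical example: the file name mydata.csv and the column name target are placeholders, not part of the Techila SDK.
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical CSV workflow: 'mydata.csv' and the 'target' column are placeholders.
df = pd.read_csv('mydata.csv')
X = df.drop(columns=['target'])
y = df['target']
X_train, X_validation, y_train, y_validation = train_test_split(X, y, random_state=0)
data = {'X_train': X_train, 'y_train': y_train, 'X_validation': X_validation, 'y_validation': y_validation}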
4.2. MNIST
This example shows how to use Techila AutoML to find the best model for the MNIST data set.
from techila_ml import find_best_model
import pandas as pd
from techila_ml.configs import OptionalPackages
OptionalPackages.use = False
# Number of Techila Jobs
n_jobs = 8
# Number of iterations
n_iterations = 16
def load_data():
    from keras.datasets import mnist
    (X_train, y_train), (X_validation, y_validation) = mnist.load_data()
    # Convert to pandas format.
    X_train = pd.DataFrame(X_train.reshape(-1, X_train[0].size))
    y_train = pd.Series(y_train)
    X_validation = pd.DataFrame(X_validation.reshape(-1, X_validation[0].size))
    y_validation = pd.Series(y_validation)
    return {'X_train': X_train, 'y_train': y_train, 'X_validation': X_validation, 'y_validation': y_validation}
# Load the data locally.
data = load_data()
# Search for the best model in TDCE.
res = find_best_model(
    n_jobs,
    n_iterations,
    data,
    task='classification',
    optimization={
        'optimizer': 'skopt',
    }
)
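As in the previous example, the best score found during the search can be printed from the result dictionary:
print(f"best score: {res['best_cv_score']}")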
4.3. IRIS Dataset - Autostopping
This example shows how to use the autostopping feature in Techila AutoML to automatically stop the optimization process when the model’s performance has not improved during the most recent iterations.
from techila_ml import find_best_model
from techila_ml.configs import OptionalPackages
OptionalPackages.use = False
# Number of Techila jobs
n_jobs = 20
# Number of iterations
n_iterations = 1600
def load_data():
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    X, y = load_iris(return_X_y=True)
    X_train, X_validation, y_train, y_validation = train_test_split(X, y, random_state=0)
    return {'X_train': X_train, 'y_train': y_train, 'X_validation': X_validation, 'y_validation': y_validation}
# Load the data locally.
data = load_data()
# Search for the best model in TDCE.
res = find_best_model(
    n_jobs,
    n_iterations,
    data,
    task='classification',
    optimization={
        'optimizer': 'skopt',
        'study_auto_stopping': True,
        'auto_stopping': False,
    }
)
4.4. Diabetes Dataset - Regression
This example shows how you can apply Techila AutoML to solve a regression problem.
from techila_ml import find_best_model
from techila_ml.configs import OptionalPackages
OptionalPackages.use = False
# Number of Techila jobs
n_jobs = 20
# Number of iterations
n_iterations = 160
def load_data():
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split
    X, y = load_diabetes(return_X_y=True)
    X_train, X_validation, y_train, y_validation = train_test_split(X, y, random_state=0)
    return {'X_train': X_train, 'y_train': y_train, 'X_validation': X_validation, 'y_validation': y_validation}
# Load the data locally.
data = load_data()
# Search for the best model in TDCE.
res = find_best_model(
    n_jobs,
    n_iterations,
    data,
    task='regression',
    optimization={
        'optimizer': 'skopt',
    }
)
print(f"best score: {res['best_cv_score']}")
4.5. IRIS Dataset - Random Search
This example shows how you can use a random search instead of skopt when using Techila AutoML.
from techila_ml import find_best_model
from techila_ml.configs import OptionalPackages
OptionalPackages.use = False
# Number of Techila jobs
n_jobs = 20
# Number of iterations
n_iterations = 100
def load_data():
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    X, y = load_iris(return_X_y=True)
    X_train, X_validation, y_train, y_validation = train_test_split(X, y, random_state=0)
    return {'X_train': X_train, 'y_train': y_train, 'X_validation': X_validation, 'y_validation': y_validation}
# Load the data locally.
data = load_data()
# Search for the best model in TDCE.
res = find_best_model(
    n_jobs,
    n_iterations,
    data,
    task='classification',
    optimization={
        'optimizer': 'random',
    }
)
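If you want to compare the random search against skopt on the same data, the two runs can be wrapped in a simple loop (a minimal sketch based on the examples above):
# Run the same search with both optimizers and print the resulting scores.
for optimizer in ('random', 'skopt'):
    res = find_best_model(
        n_jobs,
        n_iterations,
        data,
        task='classification',
        optimization={
            'optimizer': optimizer,
        }
    )
    print(f"{optimizer}: best score {res['best_cv_score']}")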
5. Running in Docker
This chapter contains examples of how you can run Techila AutoML in Docker.
Before continuing, please make sure that you have Docker installed on your computer.
The flow listed below describes how you can use Docker to run Techila AutoML. Using the Docker approach minimizes the differences between your local environment and the TDCE environment. This can be useful in situations where differences (e.g. in package versions) between your local Python development environment and the TDCE execution environment are causing problems.
- Download the TechilaSDK.zip to your own computer from the Techila Configuration Wizard.
- Extract TechilaSDK.zip to your own computer. Make a mental note of where you extracted it. The example flow below assumes that the TechilaSDK.zip was extracted to /home/user/techila. This directory should contain files called techila_settings.ini and admin.jks.
- Copy the TechilaSDK.zip from your own computer, from where you downloaded it, to the current working directory. After copying the file, it should be located in the same folder as the Dockerfile. The TechilaSDK.zip file will be included in the image in the next step (excluding credentials).
- Modify the yourimagenamehere parameter below to have a descriptive name for your image.

  sudo docker build -f Dockerfile -t yourimagenamehere .

  After modifying the command, run it. This will create a Docker image that can be used to run the Techila SDK.
- Next you will need to create a Bundle from the container image you just created. This can be done using the command shown below. Before running the command, please update the following values:
  - /tmp/dockertmp - Modify this to point to a directory on your computer that can be used to store the Docker image.
  - /home/user/techila - Modify this to point to the directory where you extracted the TechilaSDK.zip file on your computer. This is the directory that contains the techila_settings.ini and admin.jks files.
  - yourimagenamehere - Modify this to match your image name (the one you defined earlier, when executing the docker build command).

  Modify the command shown below with the values you are using and run the command to create the Bundle.

  sudo docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/dockertmp:/tmp -v /home/user/techila:/techila yourimagenamehere /usr/bin/python3 py/createcontainerbundle.py

  Creating the Bundle may take several minutes (up to 30, depending on your network speed).
- After the Bundle has been created, you can run an example that is included in the Techila SDK to verify that everything works:
sudo docker run -it -v /home/user/techila:/techila -e TECHILA_ML_DOCKER=true yourimagenamehere /usr/bin/python3 /techila/examples/techila_ml/scripts/run_datatypes.py
In addition to the TECHILA_ML_DOCKER environment variable, Docker usage can also be specified with the docker parameter of find_best_model (docker=False|True|<bundlename>).
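For example, a search could be launched with the docker parameter enabled as sketched below (assuming data, n_jobs and n_iterations have been defined as in the earlier examples):
# Sketch: enable Docker for the computations via the docker parameter.
res = find_best_model(
    n_jobs,
    n_iterations,
    data,
    task='classification',
    docker=True,
)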