Introduction to Auto-PyTorch

Automated machine learning is a recurring topic on this blog, so let’s look at another AutoML framework: Auto-PyTorch.

Contents

Introducing Auto-PyTorch
Installation
First look: Kuzushiji-MNIST
Second first look: MNIST

Introducing Auto-PyTorch

This software is an early alpha version. Still, I expect at least some functionality from it, and to be honest, Auto-Keras was more usable in its early alpha than it is now.

Auto-PyTorch originates from work by Mendoza et al. (2018) and uses BOHB (Bayesian optimization combined with Hyperband) to do neural architecture search with PyTorch.
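To give a rough intuition for the Hyperband half of BOHB: configurations are first tried on a small budget, and only the more promising ones are re-run on larger budgets that grow geometrically. The sketch below is not Auto-PyTorch code. The helper `budget_ladder` and the reduction factor `eta=3` are assumptions made for illustration. It only computes the budget steps between a `min_budget` and a `max_budget`, the two parameters that show up in the examples further down.

```python
import math


def budget_ladder(min_budget, max_budget, eta=3):
    """Return geometrically spaced budgets, smallest first, ending at max_budget.

    Hypothetical helper for illustration; eta is the assumed reduction factor.
    """
    # How many times max_budget can be divided by eta without dropping below
    # min_budget (the small epsilon guards against floating-point round-off).
    steps = int(math.log(max_budget / min_budget) / math.log(eta) + 1e-9)
    # Smallest budget first, max_budget last.
    return [max_budget / eta ** i for i in range(steps, -1, -1)]


print(budget_ladder(30, 90))  # [30.0, 90.0]
```

With the values used later in this post (30 and 90), the ladder has just two steps, so under these assumptions a configuration is either evaluated cheaply or promoted straight to the full budget.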

Installation

The installation is not straightforward. Let’s start with something simple, though, and create a new conda environment for Auto-PyTorch:

(base) $ conda create -n autopytorch
(base) $ conda activate autopytorch

NB: Installing PyTorch via conda did not work for me without running into errors. Unfortunately, I was not able to track the error down (with a reasonable amount of work), but it was so fatal that the only error message was a core dump. Using pip seems to solve this problem.

(autopytorch) $ pip install torch torchvision
(autopytorch) $ git clone https://github.com/automl/Auto-PyTorch 
(autopytorch) $ cd Auto-PyTorch
(autopytorch) $ cat requirements.txt | xargs -L 1 pip install
(autopytorch) $ python setup.py install
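Since the installation has a few moving parts, it can save time to check that the packages can actually be found in the active environment. Below is a minimal sketch using only the standard library. The helper name `check_importable` is made up for this post, and the module names are `torch`, `torchvision` and the `autoPyTorch` package imported in the examples below.

```python
import importlib.util


def check_importable(names):
    """Map each top-level module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in check_importable(["torch", "torchvision", "autoPyTorch"]).items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Note that this only checks whether a module can be located. It does not import it, so it will not reveal a crash that happens at import time.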

That’s it. Now we can have a look at it.

First look: Kuzushiji-MNIST

Instead of starting with the same boring MNIST example, I decided to try it on Kuzushiji-MNIST:

import numpy as np
import sklearn.metrics
from autoPyTorch import AutoNetClassification


def load(f):
    # each KMNIST .npz archive stores its single array under the key 'arr_0'
    return np.load(f)['arr_0']


X_train = load("./data/kmnist-train-imgs.npz")
X_test = load("./data/kmnist-test-imgs.npz")
y_train = load("./data/kmnist-train-labels.npz")
y_test = load("./data/kmnist-test-labels.npz")

# running Auto-PyTorch
autoPyTorch = AutoNetClassification("tiny_cs",  # config preset
                                    log_level='info',
                                    max_runtime=300,
                                    min_budget=30,
                                    max_budget=90)

autoPyTorch.fit(X_train, y_train, validation_split=0.3)
y_pred = autoPyTorch.predict(X_test)

print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_pred))

And the result? I ended up with a log that kept repeating the following:

WORKER: start processing job (0, 0, 1)
Fit optimization pipeline
[AutoNet] No validation set given and either no cross validator given or budget too low for CV. Continue by splitting 0.3 of training data.
[AutoNet] CV split 0 of 1

Further, I want to add that Auto-PyTorch uses Pyro4 and is therefore sensitive to network connectivity (Pyro4.errors.CommunicationError: cannot connect to ('192.168.xxx.xxx', xxxxx): [Errno 101] Network is unreachable). Hence, you may run into trouble running it on a laptop while traveling through areas without Wi-Fi or with changing connectivity (please thank NetworkManager and probably systemd for this ;)).
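A small pre-flight check can tell you up front whether the machine currently has a routable network interface, which is the situation the "Network is unreachable" error points to. This is a sketch using only the standard library, not something Auto-PyTorch or Pyro4 provides. The function name `routable_ip` and the probe address are assumptions chosen for illustration.

```python
import socket


def routable_ip(probe_addr=("10.255.255.255", 1)):
    """Return the local IP used for outgoing traffic, or None if there is no route.

    Connecting a UDP socket sends no packets; it only asks the OS to pick a
    route, which fails with OSError when the network is unreachable.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect(probe_addr)
            return s.getsockname()[0]
        except OSError:
            return None


ip = routable_ip()
if ip is None:
    print("no route found; Pyro4 may fail to connect")
else:
    print(f"local IP: {ip}")
```

Running this right before a long search at least shows whether the error is likely to come up, although it cannot guard against the connection dropping later.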

Well, I retried it a few times and didn’t receive any results other than this error message:

RuntimeError: No models fit during training, please retry with a larger max_runtime.

Even one hour was not enough to produce something useful. I tried up to two hours before deciding to stop my efforts here. From my understanding, it uses some form of ResNet as a base model. Therefore, I would have expected at least a simple result after a single iteration (similar to fastai).
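One thing that may be worth ruling out, although whether it is related to the failure above is untested: the Kuzushiji-MNIST image archives hold 28×28 images, so X_train is a three-dimensional array, whereas the README example feeds AutoNetClassification a plain two-dimensional feature matrix. Below is a minimal sketch of flattening the images into one row per sample, using random stand-in data so that it runs without the dataset. The helper `flatten_images` is hypothetical, and scaling to [0, 1] is an assumption of this sketch, not something the library is known to require.

```python
import numpy as np


def flatten_images(images):
    """Reshape (n_samples, height, width) images into (n_samples, height * width)."""
    images = np.asarray(images)
    # keep the sample axis, merge all remaining axes into one feature axis
    flat = images.reshape(images.shape[0], -1)
    # assumption: pixel values are 8-bit, so scale them to the range [0, 1]
    return flat.astype(np.float32) / 255.0


# stand-in for the real data: 5 random 28x28 "images" with uint8 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

X_flat = flatten_images(fake_images)
print(X_flat.shape)  # (5, 784)
```

Applied to the real arrays, this would be `flatten_images(X_train)` and `flatten_images(X_test)` before calling `fit` and `predict`.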

Second first look: MNIST

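Let’s see if at least the example provided in the README file on GitHub works. Note that it uses scikit-learn’s small 8×8 digits dataset (load_digits) rather than the full MNIST.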

from autoPyTorch import AutoNetClassification

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

# running Auto-PyTorch
autoPyTorch = AutoNetClassification("tiny_cs",  # config preset
                                    log_level='info',
                                    max_runtime=300,
                                    min_budget=30,
                                    max_budget=90)

autoPyTorch.fit(X_train, y_train, validation_split=0.3)
y_pred = autoPyTorch.predict(X_test)

print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_pred))

Done Refitting
Accuracy score 0.98

Sounds good, right? Well, considering that it ran for 5 minutes (300 s), it probably performed on the level of a fastai one-liner.
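To put the 0.98 into perspective, a plain scikit-learn baseline on the same split is a useful yardstick. The sketch below is not part of the Auto-PyTorch README. The choice of baseline, a support vector classifier with scikit-learn's default settings, is an assumption made here purely for comparison.

```python
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import sklearn.svm

# same data and split as in the README example
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

# baseline choice (an assumption): SVC with scikit-learn's default parameters
baseline = sklearn.svm.SVC()
baseline.fit(X_train, y_train)
baseline_acc = sklearn.metrics.accuracy_score(y_test, baseline.predict(X_test))

print("Baseline accuracy score", baseline_acc)
```

If a default model without any search lands in the same region, the five minutes of architecture search did not buy much on this dataset.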