Contents
Introduction
Google published AdaNet recently. It looks a bit like a combination of auto-keras and auto-sklearn (neural architecture search + meta learning). My experiences with frameworks for automated machine learning were mostly disappointing compared to “simple best practices”. I want to revisit auto-keras once it is released in version 0.3 - until then, let’s see if AdaNet is of more use outside the set of standard benchmark tasks. AdaNet is based on the paper AdaNet: Adaptive Structural Learning of Artificial Neural Networks by Cortes et al. (2017), which proposes an algorithm that learns neural network structures and weights adaptively.
Installation
There is a pip package available, meaning that
(env) $ pip install adanet
should be sufficient. However, I’m not so sure if it is trustworthy, since the readme points out how to build the package ourselves. AdaNet requires Bazel, a build tool similar to make. We can install Bazel from pre-compiled packages (Ubuntu, Fedora, RHEL, CentOS, macOS, Windows) or compile it ourselves (my preferred method). Further, AdaNet requires a TensorFlow version >= 1.7.0. Next, we have to download and compile AdaNet:
(env) $ git clone https://github.com/tensorflow/adanet
(env) $ cd adanet/adanet
# test if bazel works correctly
(env) $ bazel test -c opt //...
# building the package
(env) $ bazel build //pip_package:build_pip_package
(env) $ pip install bazel-bin/adanet/pip_package/build_pip_package/*.whl
If everything worked out correctly, then we can import adanet in Python:
import adanet
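As a quick sanity check (my own addition, not from the readme), we can try to print the installed version. I am assuming here that adanet exposes a __version__ attribute, so the check falls back gracefully if it does not:
# My addition: if the attribute is missing, the import above succeeding
# is already a good sign that the build worked.
print(getattr(adanet, "__version__", "no __version__ attribute found"))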
Structure of AdaNet
adanet.core
core seems to be an abstraction layer/class that contains all the classes and functions below.
adanet.absolute_import
absolute_import is neither documented, nor does any part of the source code on GitHub describe it. It is imported from __future__ and is therefore part of Python itself.
adanet.division
division is imported from __future__ as well and is therefore part of Python itself.
adanet.print_function
print_function is imported from __future__ as well and is therefore part of Python itself.
adanet.subnetwork
subnetwork loads builder/generator functions and the Subnetwork class, as well as a Report class containing all hyperparameters, attributes and metrics.
adanet.Ensemble
Ensemble is a class that represents a collection of subnetworks that form a neural network by taking a weighted sum of their outputs (see the small conceptual sketch at the end of this overview).
adanet.Estimator
Estimator is the class that implements the AdaNet algorithm proposed in the original paper.
adanet.Evaluator
Evaluator evaluates the network by computing losses for different steps and batches.
adanet.MixtureWeightType
MixtureWeightType provides mixture weights of type scalar, vector or matrix.
adanet.ReportMaterializer
ReportMaterializer stores values for internal documentation of the process.
adanet.Subnetwork
Subnetwork stores a single subnetwork from an ensemble.
adanet.Summary
Summary is an interface to TensorBoard.
adanet.WeightedSubnetwork
WeightedSubnetwork contains the weights for ensembling subnetworks.
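To make the weighted-sum idea behind Ensemble and MixtureWeightType a bit more concrete, here is a tiny conceptual sketch of my own (plain NumPy, not adanet code): each subnetwork produces logits, each gets a mixture weight (which adanet can represent as a scalar, vector or matrix), and the ensemble output is the weighted sum.
import numpy as np

# Conceptual sketch only - this is not how adanet implements it internally.
# Three subnetworks produce logits for a two-class problem:
subnetwork_logits = [
    np.array([0.2, 1.1]),
    np.array([0.5, 0.3]),
    np.array([1.0, -0.2]),
]
# One scalar mixture weight per subnetwork (could also be vectors or matrices):
mixture_weights = [0.5, 0.3, 0.2]

# The ensemble's logits are the weighted sum of the subnetworks' logits:
ensemble_logits = sum(w * logits for w, logits in zip(mixture_weights, subnetwork_logits))
print(ensemble_logits)  # [0.45 0.6]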
First Steps - MNIST Toy Example
Google provides tutorials for Boston Housing Prices and Fashion-MNIST. Both tutorials are licensed under an Apache 2.0 license. Since I tested auto-keras on MNIST, I’m going to start with MNIST as well. The code from the Fashion-MNIST tutorial is used here; however, I comment it differently.
It seems we have to load a few functions from __future__ ourselves, even though adanet imports them from __future__ as well (see the structure overview above):
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
The tutorial code uses functools (we will need functools.partial for the generator class further below), hence we have to load it:
import functools
Now, we can load adanet and tensorflow:
import adanet
import tensorflow as tf
# set random seed to make results reproducible
RANDOM_SEED = 42
We can load the MNIST dataset using keras for our convenience:
(x_train, y_train), (x_test, y_test) = (tf.keras.datasets.mnist.load_data())
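A quick look at the shapes (my own check) confirms the usual MNIST split of 60,000 training and 10,000 test images of 28x28 pixels:
# Expected: (60000, 28, 28) (60000,) and (10000, 28, 28) (10000,)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)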
Now comes something completely different from what I expected. From my limited understanding of adanet (so far), we have to write the generator functions and the search space ourselves. From the initial description I expected something closer to Google AutoML or auto-keras.
First, we have to write our generator function:
FEATURES_KEY = "images"
def generator(images, labels):
"""Returns a generator that returns image-label pairs."""
def _gen():
for image, label in zip(images, labels):
yield image, label
return _gen
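A short usage check (my addition): calling the returned function gives a plain Python generator that yields one (image, label) pair at a time, which is exactly what tf.data.Dataset.from_generator expects further below:
gen = generator(x_train, y_train)
first_image, first_label = next(gen())
print(first_image.shape, first_label)  # (28, 28) and a single digit label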
Next, we have to write our own image preprocessing function (this is less convenient than using Keras or PyTorch):
def preprocess_image(image, label):
"""Preprocesses an image for an `Estimator`."""
# First let's scale the pixel values to be between 0 and 1.
image = image / 255.
# Next we reshape the image so that we can apply a 2D convolution to it.
image = tf.reshape(image, [28, 28, 1])
# Finally the features need to be supplied as a dictionary.
features = {FEATURES_KEY: image}
return features, label
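Since we are in TensorFlow 1.x graph mode, inspecting the result of preprocess_image requires a session; this little check (my addition) shows the scaling and the added channel dimension:
check_features, check_label = preprocess_image(x_train[0], y_train[0])
with tf.Session() as sess:
    check_image = sess.run(check_features[FEATURES_KEY])
# Expected: shape (28, 28, 1) with pixel values between 0 and 1.
print(check_image.shape, check_image.min(), check_image.max())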
Now, we have to write our own input function that generates each batch:
def input_fn(partition, training, batch_size):
"""Generate an input_fn for the Estimator."""
def _input_fn():
if partition == "train":
dataset = tf.data.Dataset.from_generator(
generator(x_train, y_train), (tf.float32, tf.int32), ((28, 28), ()))
else:
dataset = tf.data.Dataset.from_generator(
generator(x_test, y_test), (tf.float32, tf.int32), ((28, 28), ()))
# We call repeat after shuffling, rather than before, to prevent separate
# epochs from blending together.
if training:
dataset = dataset.shuffle(10 * batch_size, seed=RANDOM_SEED).repeat()
dataset = dataset.map(preprocess_image).batch(batch_size)
iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()
return features, labels
return _input_fn
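To see that the whole input pipeline wires up correctly, we can pull a single batch through a session (again my own check, assuming TF 1.x graph mode as in the rest of this post):
batch_features, batch_labels = input_fn("train", training=True, batch_size=4)()
with tf.Session() as sess:
    features_out, labels_out = sess.run([batch_features, batch_labels])
print(features_out[FEATURES_KEY].shape)  # (4, 28, 28, 1)
print(labels_out)                        # four integer labels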
Next, we will have to define some basic properties of our NN training process:
# The number of classes.
NUM_CLASSES = 10
# We will average the losses in each mini-batch when computing gradients.
loss_reduction = tf.losses.Reduction.SUM_OVER_BATCH_SIZE
# A `Head` instance defines the loss function and metrics for `Estimators`.
head = tf.contrib.estimator.multi_class_head(
NUM_CLASSES, loss_reduction=loss_reduction)
# Some `Estimators` use feature columns for understanding their input features.
feature_columns = [
tf.feature_column.numeric_column(FEATURES_KEY, shape=[28, 28, 1])
]
# Estimator configuration.
config = tf.estimator.RunConfig(
save_checkpoints_steps=50000,
save_summary_steps=50000,
tf_random_seed=RANDOM_SEED)
As a baseline, we train a simple linear classifier:
#@test {"skip": true}
#@title Parameters
LEARNING_RATE = 0.001 #@param {type:"number"}
TRAIN_STEPS = 5000 #@param {type:"integer"}
BATCH_SIZE = 64 #@param {type:"integer"}
estimator = tf.estimator.LinearClassifier(
feature_columns=feature_columns,
n_classes=NUM_CLASSES,
optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),
loss_reduction=loss_reduction,
config=config)
results, _ = tf.estimator.train_and_evaluate(
estimator,
train_spec=tf.estimator.TrainSpec(
input_fn=input_fn("train", training=True, batch_size=BATCH_SIZE),
max_steps=TRAIN_STEPS),
eval_spec=tf.estimator.EvalSpec(
input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
steps=None))
print("Accuracy:", results["accuracy"])
print("Loss:", results["average_loss"])
It outputs:
INFO:tensorflow:Loss for final step: 0.135509.
Accuracy: 0.9258
Loss: 0.27285188
Now it is getting interesting: we have to write our own class that builds convolutional neural networks as subnetworks, defining the Keras conv layers ourselves. In this case it is a simple, single Conv2D layer that is followed by max pooling, gets flattened and is followed by a final dense layer:
class SimpleCNNBuilder(adanet.subnetwork.Builder):
"""Builds a CNN subnetwork for AdaNet."""
def __init__(self, learning_rate, max_iteration_steps, seed):
"""Initializes a `SimpleCNNBuilder`.
Args:
learning_rate: The float learning rate to use.
max_iteration_steps: The number of steps per iteration.
seed: The random seed.
Returns:
An instance of `SimpleCNNBuilder`.
"""
self._learning_rate = learning_rate
self._max_iteration_steps = max_iteration_steps
self._seed = seed
def build_subnetwork(self,
features,
logits_dimension,
training,
iteration_step,
summary,
previous_ensemble=None):
"""See `adanet.subnetwork.Builder`."""
images = list(features.values())[0]
kernel_initializer = tf.keras.initializers.he_normal(seed=self._seed)
x = tf.keras.layers.Conv2D(
filters=16,
kernel_size=3,
padding="same",
activation="relu",
kernel_initializer=kernel_initializer)(
images)
x = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(
units=64, activation="relu", kernel_initializer=kernel_initializer)(
x)
# The `Head` passed to adanet.Estimator will apply the softmax activation.
logits = tf.keras.layers.Dense(
units=10, activation=None, kernel_initializer=kernel_initializer)(
x)
# Use a constant complexity measure, since all subnetworks have the same
# architecture and hyperparameters.
complexity = tf.constant(1)
return adanet.Subnetwork(
last_layer=x,
logits=logits,
complexity=complexity,
persisted_tensors={})
def build_subnetwork_train_op(self,
subnetwork,
loss,
var_list,
labels,
iteration_step,
summary,
previous_ensemble=None):
"""See `adanet.subnetwork.Builder`."""
# Momentum optimizer with cosine learning rate decay works well with CNNs.
learning_rate = tf.train.cosine_decay(
learning_rate=self._learning_rate,
global_step=iteration_step,
decay_steps=self._max_iteration_steps)
optimizer = tf.train.MomentumOptimizer(learning_rate, .9)
# NOTE: The `adanet.Estimator` increments the global step.
return optimizer.minimize(loss=loss, var_list=var_list)
def build_mixture_weights_train_op(self, loss, var_list, logits, labels,
iteration_step, summary):
"""See `adanet.subnetwork.Builder`."""
return tf.no_op("mixture_weights_train_op")
@property
def name(self):
"""See `adanet.subnetwork.Builder`."""
return "simple_cnn"
Furthermore, we need a second class to build our NN with convolutional layers: a generator that creates the subnetworks for each AdaNet iteration using the builder class we defined above.
class SimpleCNNGenerator(adanet.subnetwork.Generator):
"""Generates a `SimpleCNN` at each iteration.
"""
def __init__(self, learning_rate, max_iteration_steps, seed=None):
"""Initializes a `Generator` that builds `SimpleCNNs`.
Args:
learning_rate: The float learning rate to use.
max_iteration_steps: The number of steps per iteration.
seed: The random seed.
Returns:
An instance of `Generator`.
"""
self._seed = seed
self._dnn_builder_fn = functools.partial(
SimpleCNNBuilder,
learning_rate=learning_rate,
max_iteration_steps=max_iteration_steps)
def generate_candidates(self, previous_ensemble, iteration_number,
previous_ensemble_reports, all_reports):
"""See `adanet.subnetwork.Generator`."""
seed = self._seed
# Change the seed according to the iteration so that each subnetwork
# learns something different.
if seed is not None:
seed += iteration_number
return [self._dnn_builder_fn(seed=seed)]
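Just to illustrate what this generator hands to AdaNet at each iteration (my own check, not part of the tutorial): generate_candidates returns a list with a single SimpleCNNBuilder, whose seed is shifted by the iteration number:
demo_generator = SimpleCNNGenerator(
    learning_rate=0.05, max_iteration_steps=100, seed=RANDOM_SEED)
candidates = demo_generator.generate_candidates(
    previous_ensemble=None,
    iteration_number=1,
    previous_ensemble_reports=[],
    all_reports=[])
print(candidates)  # a list containing one SimpleCNNBuilder instance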
Now, we can prepare a set of hyperparameters and run the model:
#@title Parameters
LEARNING_RATE = 0.05 #@param {type:"number"}
TRAIN_STEPS = 5000 #@param {type:"integer"}
BATCH_SIZE = 64 #@param {type:"integer"}
ADANET_ITERATIONS = 2 #@param {type:"integer"}
max_iteration_steps = TRAIN_STEPS // ADANET_ITERATIONS
estimator = adanet.Estimator(
head=head,
subnetwork_generator=SimpleCNNGenerator(
learning_rate=LEARNING_RATE,
max_iteration_steps=max_iteration_steps,
seed=RANDOM_SEED),
max_iteration_steps=max_iteration_steps,
evaluator=adanet.Evaluator(
input_fn=input_fn("train", training=False, batch_size=BATCH_SIZE),
steps=None),
adanet_loss_decay=.99,
config=config)
results, _ = tf.estimator.train_and_evaluate(
estimator,
train_spec=tf.estimator.TrainSpec(
input_fn=input_fn("train", training=True, batch_size=BATCH_SIZE),
max_steps=TRAIN_STEPS),
eval_spec=tf.estimator.EvalSpec(
input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
steps=None))
print("Accuracy:", results["accuracy"])
print("Loss:", results["average_loss"])
It outputs:
INFO:tensorflow:Loss for final step: 0.023453332.
Accuracy: 0.9875
Loss: 0.039442677
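As a final sketch (my addition, not part of the tutorial): since adanet.Estimator behaves like a regular tf.estimator.Estimator, we should be able to request predictions as well; the exact keys of the prediction dict come from the multi_class_head and may vary between TensorFlow versions:
predictions = estimator.predict(
    input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE))
# Peek at the first prediction, e.g. a dict with "probabilities" and class ids.
print(next(iter(predictions)))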
Comparison to auto-keras
Auto-Keras achieves an accuracy of 0.9879 after 30 minutes of architecture search on the (identical) MNIST dataset. AdaNet reaches an accuracy of 0.9875 after roughly 15 minutes (I did not time it exactly).