Contents
Introduction
Google published AdaNet recently. It looks a bit like a combination of auto-keras and auto-sklearn (neural architecture search + meta learning). My experiences with frameworks for automated machine learning were mostly disappointing compared to “simple best practices”. I want to revisit auto-keras once it is released in version 0.3 - until then, let’s see if AdaNet is of more use outside the set of standard benchmark tasks. AdaNet is based on the paper AdaNet: Adaptive Structural Learning of Artificial Neural Networks by Cortes et al. (2017), which proposes an algorithm that learns neural network structures and weights adaptively.
Installation
There is a pip package available, meaning that
(env) $ pip install adanet
should be sufficient. However, I’m not so sure if it is trustworthy, since the readme points out how to build the package ourselves. AdaNet requires Bazel, a build tool similar to make. We can install Bazel from pre-compiled packages (Ubuntu, Fedora, RHEL, CentOS, macOS, Windows) or compile it ourselves (my preferred method). Further, AdaNet requires a TensorFlow version >= 1.7.0. Next, we have to download and compile AdaNet:
(env) $ git clone https://github.com/tensorflow/adanet
(env) $ cd adanet/adanet
# test if bazel works correctly
(env) $ bazel test -c opt //...
# building the package
(env) $ bazel build //pip_package:build_pip_package
(env) $ pip install bazel-bin/adanet/pip_package/build_pip_package/*.whl
If everything worked out correctly, then we can import adanet in Python:
import adanet
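As a quick sanity check (my own addition, not from the readme), we can try to print the installed version. I am assuming here that adanet exposes a __version__ attribute, so the check falls back gracefully if it does not:
# My addition: if the attribute is missing, the import above succeeding
# is already a good sign that the build worked.
print(getattr(adanet, "__version__", "no __version__ attribute found"))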
Structure of AdaNet
adanet.core
core seems to be an abstraction layer/class that contains all the classes and functions below.
adanet.absolute_import
absolute_import is neither documented, nor does any part of the source code on GitHub describe it. It is imported from __future__ and is therefore part of Python itself.
adanet.division
division is imported from __future__ as well and is therefore part of Python itself.
adanet.print_function
print_function is imported from __future__ as well and is therefore part of Python itself.
adanet.subnetwork
subnetwork loads builder/generator functions and the Subnetwork class, as well as a Report class containing all hyperparameters, attributes and metrics.
adanet.Ensemble
Ensemble is a class that represents a collection of subnetworks that form a neural network by taking a weighted sum of their outputs (see the small conceptual sketch at the end of this overview).
adanet.Estimator
Estimator is the class that implements the AdaNet algorithm proposed in the original paper.
adanet.Evaluator
Evaluator evaluates the network by computing losses for different steps and batches.
adanet.MixtureWeightType
MixtureWeightType provides mixture weights of type scalar, vector or matrix.
adanet.ReportMaterializer
ReportMaterializer stores values for internal documentation of the process.
adanet.Subnetwork
Subnetwork stores a single subnetwork from an ensemble.
adanet.Summary
Summary is an interface to TensorBoard.
adanet.WeightedSubnetwork
WeightedSubnetwork contains the weights for ensembling subnetworks.
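To make the weighted-sum idea behind Ensemble and MixtureWeightType a bit more concrete, here is a tiny conceptual sketch of my own (plain NumPy, not adanet code): each subnetwork produces logits, each gets a mixture weight (which adanet can represent as a scalar, vector or matrix), and the ensemble output is the weighted sum.
import numpy as np

# Conceptual sketch only - this is not how adanet implements it internally.
# Three subnetworks produce logits for a two-class problem:
subnetwork_logits = [
    np.array([0.2, 1.1]),
    np.array([0.5, 0.3]),
    np.array([1.0, -0.2]),
]
# One scalar mixture weight per subnetwork (could also be vectors or matrices):
mixture_weights = [0.5, 0.3, 0.2]

# The ensemble's logits are the weighted sum of the subnetworks' logits:
ensemble_logits = sum(w * logits for w, logits in zip(mixture_weights, subnetwork_logits))
print(ensemble_logits)  # [0.45 0.6]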
First Steps - MNIST Toy Example
Google provides tutorials for Boston Housing Prices and Fashion-MNIST. Both tutorials are licensed under an Apache 2.0 license. Since I tested auto-keras on MNIST, I’m going to start with MNIST as well. The code from the Fashion-MNIST tutorial is used here; however, I comment it differently.
It seems we have to load a few functions from __future__ ourselves, even though adanet imports them from __future__ as well (see the structure overview above):
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
The tutorial code uses functools (we will need functools.partial for the generator class further below), hence we have to load it:
import functools
Now, we can load adanet and tensorflow:
import adanet
import tensorflow as tf
# set random seed to make results reproducible
RANDOM_SEED = 42
We can load the MNIST dataset using keras for our convenience:
(x_train, y_train), (x_test, y_test) = (tf.keras.datasets.mnist.load_data())
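A quick look at the shapes (my own check) confirms the usual MNIST split of 60,000 training and 10,000 test images of 28x28 pixels:
# Expected: (60000, 28, 28) (60000,) and (10000, 28, 28) (10000,)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)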
Now comes something completely different from what I expected. From my limited understanding of adanet (so far), we have to write the generator functions and the search space ourselves. From the initial description I expected something closer to Google AutoML or auto-keras.
First, we have to write our generator function:
FEATURES_KEY = "images"
def generator(images, labels):
"""Returns a generator that returns image-label pairs."""
def _gen():
for image, label in zip(images, labels):
yield image, label
return _gen
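A short usage check (my addition): calling the returned function gives a plain Python generator that yields one (image, label) pair at a time, which is exactly what tf.data.Dataset.from_generator expects further below:
gen = generator(x_train, y_train)
first_image, first_label = next(gen())
print(first_image.shape, first_label)  # (28, 28) and a single digit label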
Next, we have to write our own image preprocessing function (this is less convenient than using Keras or PyTorch):
def preprocess_image(image, label):
"""Preprocesses an image for an `Estimator`."""
# First let's scale the pixel values to be between 0 and 1.
image = image / 255.
# Next we reshape the image so that we can apply a 2D convolution to it.
image = tf.reshape(image, [28, 28, 1])
# Finally the features need to be supplied as a dictionary.
features = {FEATURES_KEY: image}
return features, label
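Since we are in TensorFlow 1.x graph mode, inspecting the result of preprocess_image requires a session; this little check (my addition) shows the scaling and the added channel dimension:
check_features, check_label = preprocess_image(x_train[0], y_train[0])
with tf.Session() as sess:
    check_image = sess.run(check_features[FEATURES_KEY])
# Expected: shape (28, 28, 1) with pixel values between 0 and 1.
print(check_image.shape, check_image.min(), check_image.max())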
Now, we have to write our own input function that generates each batch:
def input_fn(partition, training, batch_size):
"""Generate an input_fn for the Estimator."""
def _input_fn():
if partition == "train":
dataset = tf.data.Dataset.from_generator(
generator(x_train, y_train), (tf.float32, tf.int32), ((28, 28), ()))
else:
dataset = tf.data.Dataset.from_generator(
generator(x_test, y_test), (tf.float32, tf.int32), ((28, 28), ()))
# We call repeat after shuffling, rather than before, to prevent separate
# epochs from blending together.
if training:
dataset = dataset.shuffle(10 * batch_size, seed=RANDOM_SEED).repeat()
dataset = dataset.map(preprocess_image).batch(batch_size)
iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()
return features, labels
return _input_fn
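To see that the whole input pipeline wires up correctly, we can pull a single batch through a session (again my own check, assuming TF 1.x graph mode as in the rest of this post):
batch_features, batch_labels = input_fn("train", training=True, batch_size=4)()
with tf.Session() as sess:
    features_out, labels_out = sess.run([batch_features, batch_labels])
print(features_out[FEATURES_KEY].shape)  # (4, 28, 28, 1)
print(labels_out)                        # four integer labels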
Next, we will have to define some basic properties of our NN training process:
# The number of classes.
NUM_CLASSES = 10
# We will average the losses in each mini-batch when computing gradients.
loss_reduction = tf.losses.Reduction.SUM_OVER_BATCH_SIZE
# A `Head` instance defines the loss function and metrics for `Estimators`.
head = tf.contrib.estimator.multi_class_head(
NUM_CLASSES, loss_reduction=loss_reduction)
# Some `Estimators` use feature columns for understanding their input features.
feature_columns = [
tf.feature_column.numeric_column(FEATURES_KEY, shape=[28, 28, 1])
]
# Estimator configuration.
config = tf.estimator.RunConfig(
save_checkpoints_steps=50000,
save_summary_steps=50000,
tf_random_seed=RANDOM_SEED)
As a baseline, we train a simple linear classifier:
#@test {"skip": true}
#@title Parameters
LEARNING_RATE = 0.001 #@param {type:"number"}
TRAIN_STEPS = 5000 #@param {type:"integer"}
BATCH_SIZE = 64 #@param {type:"integer"}
estimator = tf.estimator.LinearClassifier(
feature_columns=feature_columns,
n_classes=NUM_CLASSES,
optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),
loss_reduction=loss_reduction,
config=config)
results, _ = tf.estimator.train_and_evaluate(
estimator,
train_spec=tf.estimator.TrainSpec(
input_fn=input_fn("train", training=True, batch_size=BATCH_SIZE),
max_steps=TRAIN_STEPS),
eval_spec=tf.estimator.EvalSpec(
input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
steps=None))
print("Accuracy:", results["accuracy"])
print("Loss:", results["average_loss"])
It outputs:
INFO:tensorflow:Loss for final step: 0.135509.
Accuracy: 0.9258
Loss: 0.27285188
Now it is getting interesting: we have to write our own class that builds convolutional neural networks as subnetworks, defining the Keras conv layers ourselves. In this case it is a simple, single Conv2D layer that is followed by max pooling, gets flattened and is followed by a final dense layer:
class SimpleCNNBuilder(adanet.subnetwork.Builder):
"""Builds a CNN subnetwork for AdaNet."""
def __init__(self, learning_rate, max_iteration_steps, seed):
"""Initializes a `SimpleCNNBuilder`.
Args:
learning_rate: The float learning rate to use.
max_iteration_steps: The number of steps per iteration.
seed: The random seed.
Returns:
An instance of `SimpleCNNBuilder`.
"""
self._learning_rate = learning_rate
self._max_iteration_steps = max_iteration_steps
self._seed = seed
def build_subnetwork(self,
features,
logits_dimension,
training,
iteration_step,
summary,
previous_ensemble=None):
"""See `adanet.subnetwork.Builder`."""
images = list(features.values())[0]
kernel_initializer = tf.keras.initializers.he_normal(seed=self._seed)
x = tf.keras.layers.Conv2D(
filters=16,
kernel_size=3,
padding="same",
activation="relu",
kernel_initializer=kernel_initializer)(
images)
x = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(
units=64, activation="relu", kernel_initializer=kernel_initializer)(
x)
# The `Head` passed to adanet.Estimator will apply the softmax activation.
logits = tf.keras.layers.Dense(
units=10, activation=None, kernel_initializer=kernel_initializer)(
x)
# Use a constant complexity measure, since all subnetworks have the same
# architecture and hyperparameters.
complexity = tf.constant(1)
return adanet.Subnetwork(
last_layer=x,
logits=logits,
complexity=complexity,
persisted_tensors={})
def build_subnetwork_train_op(self,
subnetwork,
loss,
var_list,
labels,
iteration_step,
summary,
previous_ensemble=None):
"""See `adanet.subnetwork.Builder`."""
# Momentum optimizer with cosine learning rate decay works well with CNNs.
learning_rate = tf.train.cosine_decay(
learning_rate=self._learning_rate,
global_step=iteration_step,
decay_steps=self._max_iteration_steps)
optimizer = tf.train.MomentumOptimizer(learning_rate, .9)
# NOTE: The `adanet.Estimator` increments the global step.
return optimizer.minimize(loss=loss, var_list=var_list)
def build_mixture_weights_train_op(self, loss, var_list, logits, labels,
iteration_step, summary):
"""See `adanet.subnetwork.Builder`."""
return tf.no_op("mixture_weights_train_op")
@property
def name(self):
"""See `adanet.subnetwork.Builder`."""
return "simple_cnn"
Furthermore, we need a second class to build our NN with convolutional layers: a generator that creates the subnetworks for each AdaNet iteration using the builder class we defined above.
class SimpleCNNGenerator(adanet.subnetwork.Generator):
"""Generates a `SimpleCNN` at each iteration.
"""
def __init__(self, learning_rate, max_iteration_steps, seed=None):
"""Initializes a `Generator` that builds `SimpleCNNs`.
Args:
learning_rate: The float learning rate to use.
max_iteration_steps: The number of steps per iteration.
seed: The random seed.
Returns:
An instance of `Generator`.
"""
self._seed = seed
self._dnn_builder_fn = functools.partial(
SimpleCNNBuilder,
learning_rate=learning_rate,
max_iteration_steps=max_iteration_steps)
def generate_candidates(self, previous_ensemble, iteration_number,
previous_ensemble_reports, all_reports):
"""See `adanet.subnetwork.Generator`."""
seed = self._seed
# Change the seed according to the iteration so that each subnetwork
# learns something different.
if seed is not None:
seed += iteration_number
return [self._dnn_builder_fn(seed=seed)]
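Just to illustrate what this generator hands to AdaNet at each iteration (my own check, not part of the tutorial): generate_candidates returns a list with a single SimpleCNNBuilder, whose seed is shifted by the iteration number:
demo_generator = SimpleCNNGenerator(
    learning_rate=0.05, max_iteration_steps=100, seed=RANDOM_SEED)
candidates = demo_generator.generate_candidates(
    previous_ensemble=None,
    iteration_number=1,
    previous_ensemble_reports=[],
    all_reports=[])
print(candidates)  # a list containing one SimpleCNNBuilder instance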
Now, we can prepare a set of hyperparameters and run the model:
#@title Parameters
LEARNING_RATE = 0.05 #@param {type:"number"}
TRAIN_STEPS = 5000 #@param {type:"integer"}
BATCH_SIZE = 64 #@param {type:"integer"}
ADANET_ITERATIONS = 2 #@param {type:"integer"}
max_iteration_steps = TRAIN_STEPS // ADANET_ITERATIONS
estimator = adanet.Estimator(
head=head,
subnetwork_generator=SimpleCNNGenerator(
learning_rate=LEARNING_RATE,
max_iteration_steps=max_iteration_steps,
seed=RANDOM_SEED),
max_iteration_steps=max_iteration_steps,
evaluator=adanet.Evaluator(
input_fn=input_fn("train", training=False, batch_size=BATCH_SIZE),
steps=None),
adanet_loss_decay=.99,
config=config)
results, _ = tf.estimator.train_and_evaluate(
estimator,
train_spec=tf.estimator.TrainSpec(
input_fn=input_fn("train", training=True, batch_size=BATCH_SIZE),
max_steps=TRAIN_STEPS),
eval_spec=tf.estimator.EvalSpec(
input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
steps=None))
print("Accuracy:", results["accuracy"])
print("Loss:", results["average_loss"])
It outputs:
INFO:tensorflow:Loss for final step: 0.023453332.
Accuracy: 0.9875
Loss: 0.039442677
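As a final sketch (my addition, not part of the tutorial): since adanet.Estimator behaves like a regular tf.estimator.Estimator, we should be able to request predictions as well; the exact keys of the prediction dict come from the multi_class_head and may vary between TensorFlow versions:
predictions = estimator.predict(
    input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE))
# Peek at the first prediction, e.g. a dict with "probabilities" and class ids.
print(next(iter(predictions)))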
Comparison to auto-keras
Auto-Keras achieves an accuracy of 0.9879 after 30 minutes of architecture search on the (identical) MNIST dataset. AdaNet reaches an accuracy of 0.9875 after roughly 15 minutes (I did not time it exactly).