Introduction

Google recently published AdaNet. It looks a bit like a combination of auto-keras and auto-sklearn (neural architecture search + meta-learning). My experiences with frameworks for automated machine learning have mostly been disappointing compared to “simple best practices”. I want to revisit auto-keras once version 0.3 is released - until then, let’s see if AdaNet is of more use outside the set of standard benchmark tasks. AdaNet is based on the paper AdaNet: Adaptive Structural Learning of Artificial Neural Networks by Cortes et al. (2017), which proposes an algorithm that learns neural network structures and weights adaptively.
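
As far as I understand the paper, its core idea can be summarized roughly by the following objective (my own paraphrase, not taken from the AdaNet documentation): the ensemble is a weighted sum of subnetworks, and at each iteration candidate subnetworks are selected by trading off the empirical loss against their complexity:

F(w) = \frac{1}{m} \sum_{i=1}^{m} \Phi\Big(1 - y_i \sum_j w_j h_j(x_i)\Big) + \sum_j (\lambda r_j + \beta)\,|w_j|

where the h_j are the candidate subnetworks, the w_j their mixture weights, r_j a complexity measure of the family the j-th subnetwork was drawn from (its Rademacher complexity), and λ and β are hyperparameters.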

Installation

There is a pip package available, meaning that (env) $ pip install adanet should be sufficient. However, I’m not so sure how trustworthy it is, since the readme points out how to build the package ourselves. AdaNet requires Bazel, a build tool similar to make. We can install Bazel from pre-compiled packages (Ubuntu, Fedora, RHEL, CentOS, macOS, Windows) or compile it ourselves (my preferred method). Further, AdaNet requires a TensorFlow version >= 1.7.0. Next, we have to download and compile it:

(env) $ git clone https://github.com/tensorflow/adanet
(env) $ cd adanet/adanet

# run the test suite to check that the build works correctly
(env) $ bazel test -c opt //...

# building the package
(env) $ bazel build //pip_package:build_pip_package
(env) $ pip install bazel-bin/adanet/pip_package/build_pip_package/*.whl

If everything worked out correctly, then we can import adanet in Python:

import adanet

Structure of AdaNet

  • adanet.core
    • core seems to be an abstraction layer/class that contains all the classes and functions below.
  • adanet.absolute_import
    • absolute_import is neither documented nor does any part of the source code on GitHub describe it. It is imported from __future__ and is therefore part of Python itself.
  • adanet.division
    • division is imported from __future__ as well and is therefore part of Python itself.
  • adanet.print_function
    • print_function is imported from __future__ as well and is therefore part of Python itself.
  • adanet.subnetwork
    • subnetwork loads builder/generator functions and the Subnetwork class as well as a Report class containing all hyperparameters, attributes and metrics.
  • adanet.Ensemble
    • Ensemble is a class that represents a collection of subnetworks that form a neural network by using a weighted sum of their outputs (see the sketch after this list).
  • adanet.Estimator
    • Estimator is the class that implements the AdaNet algorithm proposed in the original paper.
  • adanet.Evaluator
    • Evaluator evaluates the network by computing losses for different steps and batches.
  • adanet.MixtureWeightType
    • MixtureWeightType provides weights of types scalar, vector and matrix.
  • adanet.ReportMaterializer
    • ReportMaterializer stores values for internal documentation of the process.
  • adanet.Subnetwork
    • Subnetwork stores a single subnetwork from an ensemble.
  • adanet.Summary
    • Summary is an interface to TensorBoard.
  • adanet.WeightedSubnetwork
    • WeightedSubnetwork contains the weights for ensembling subnetworks.
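
To make the relation between Ensemble, Subnetwork and WeightedSubnetwork a bit more concrete, here is a minimal sketch (my own illustration, not code from the AdaNet repository) of how an ensemble’s logits are formed as a weighted sum of its subnetworks’ logits:

import tensorflow as tf

# Hypothetical logits of two already trained subnetworks (a batch of one
# example with two output units) and their mixture weights.
subnetwork_logits = [tf.constant([[1.0, -0.5]]), tf.constant([[0.2, 0.7]])]
mixture_weights = [tf.constant(0.6), tf.constant(0.4)]

# Conceptually, an adanet.Ensemble produces its logits as the weighted sum of
# the logits of its WeightedSubnetworks.
ensemble_logits = tf.add_n(
    [w * logits for w, logits in zip(mixture_weights, subnetwork_logits)])

with tf.Session() as sess:
    print(sess.run(ensemble_logits))  # [[ 0.68 -0.02]]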


classes_adanet



0

Builder


build_mixture_weights_train_op()
build_subnetwork()
build_subnetwork_report()
build_subnetwork_train_op()
name()
prune_previous_ensemble()



1

CandidateBuilderTest


test_build_candidate()
test_init_errors()



2

CandidateTest


test_new()
test_new_errors()



3

CountDownTimerTest


test_secs_remaining_long()
test_secs_remaining_short()
test_secs_remaining_zero()



4

Ensemble


 



5

EnsembleBuilderMetricFnTest


test_all_args_are_optional()
test_all_supported_args()
test_all_supported_args_in_different_order()
test_overrides_existing_metrics()
test_should_add_metrics()
test_should_error_out_for_not_recognized_args()



6

EnsembleBuilderTest

test_subdirectory

setUp()
tearDown()
test_append_new_subnetwork()
test_init_error()



7

EnsembleFreezerTest

maxDiff : NoneType
test_subdirectory

setUp()
test_freeze_ensemble()
test_freeze_ensemble_error()
test_load_frozen_ensemble()
test_load_frozen_ensemble_colocation_bug()
test_wrapped_features_none()
test_wrapped_features_placeholder()
test_wrapped_features_sparse_placeholder()
test_wrapped_features_sparse_tensors()
test_wrapped_features_tensors()



8

Estimator


evaluate()
train()



9

EstimatorCallingModelFnDirectlyTest


test_calling_model_fn_directly()



20

EstimatorTestCase

test_subdirectory

setUp()
tearDown()



9->20





10

EstimatorCheckpointTest


test_checkpoints()



10->20





11

EstimatorDifferentFeaturesPerModeTest


test_different_features_per_mode()



11->20





12

EstimatorExportSavedModelForEvalTest


test_export_saved_model_for_eval()



12->20





13

EstimatorExportSavedModelForPredictTest


test_export_saved_model_for_predict()



13->20





14

EstimatorForceGrowTest


test_force_grow()



14->20





15

EstimatorKerasLayersTest


test_lifecycle()



15->20





16

EstimatorMembersOverrideTest


test_assert_members_are_not_overridden()



16->20





17

EstimatorReportTest


compare_report_lists()
test_report_generation_and_usage()



17->20





18

EstimatorSummaryWriterTest


test_eval_metrics()
test_summaries()



18->20





19

EstimatorTest


test_categorical_columns()
test_lifecycle()
test_train_error()



19->20





21

Evaluator

input_fn
steps

evaluate_adanet_losses()



22

EvaluatorTest


test_adanet_losses()



23

ExportOutputKeys

CLASSIFICATION_CLASSES : str
CLASSIFICATION_SCORES : str
INVALID : str
PREDICTION : str
REGRESSION : str

 



24

ExportOutputsTest


test_head_export_outputs()



25

FakePlaceholder

dtype
shape : NoneType

 



26

FakeSparsePlaceholder

dtype
shape : NoneType

 



27

FakeSparseTensor

dense_shape
indices
values

 



28

FakeSubnetwork

name

build_mixture_weights_train_op()
build_subnetwork()
build_subnetwork_train_op()



29

Generator


generate_candidates()



30

InputUtilsTest


test_make_placeholder_input_fn()



31

IterationBuilderTest


test_build_iteration()
test_build_iteration_error()



32

IterationTest


test_new()
test_new_errors()



33

KerasCNNBuilder

name

build_mixture_weights_train_op()
build_subnetwork()
build_subnetwork_train_op()



34

Keys

BIAS : str
COMPLEXITY : str
LAST_LAYER : str
LOGITS : str
NAME : str
PERSISTED_TENSORS : str
PERSISTED_TENSORS_SEPARATOR : str
WEIGHT : str

 



35

MaterializedReport


 



36

MixtureWeightType

MATRIX : str
SCALAR : str
VECTOR : str

 



37

Report


 



38

ReportAccessorTest


test_add_to_empty_file()
test_add_to_existing_file()
test_read_from_empty_file()
test_value_error()
test_write_iteration_report_encoding()



39

ReportMaterializer

input_fn
steps

materialize_subnetwork_reports()



40

ReportMaterializerTest


test_materialize_subnetwork_reports()



41

ReportTest


test_drop_non_scalar_metric()
test_new()
test_new_errors()



42

ScopedSummaryTest


test_audio_summary()
test_audio_summary_with_family()
test_histogram_summary()
test_histogram_summary_with_family()
test_image_summary()
test_image_summary_with_family()
test_merge_all()
test_scalar_summary()
test_scalar_summary_with_family()
test_scope()
test_summarizing_variable()
test_summary_name_conversion()



43

SimpleGenerator


generate_candidates()



43->29





44

Subnetwork


 



45

SubnetworkTest


test_new()
test_new_errors()
test_prune_previous_ensemble()



46

Summary


audio()
histogram()
image()
scalar()



47

WeightedSubnetwork


 



48

_Builder

name

build_mixture_weights_train_op()
build_subnetwork()
build_subnetwork_train_op()



49

_BuilderPrunerAll


prune_previous_ensemble()



49->48





50

_BuilderPrunerLeaveOne


prune_previous_ensemble()



50->48





51

_Candidate


 



52

_CandidateBuilder


build_candidate()



53

_CountDownTimer


secs_remaining()



54

_DNNBuilder

name

build_mixture_weights_train_op()
build_subnetwork()
build_subnetwork_report()
build_subnetwork_train_op()



55

_DNNBuilder

name
seed

build_subnetwork()
train_mixture_weights()
train_subnetwork()



56

_EnsembleBuilder


append_new_subnetwork()
build_ensemble_spec()



57

_EnsembleFreezer


freeze_ensemble()
load_frozen_ensemble()
wrapped_features()



58

_EnsembleSpec


 



59

_EvalMetricSaverHook


before_run()
end()



60

_EvalMetricsHead

logits_dimension

create_estimator_spec()



61

_FakeBuilder

name
seed

build_mixture_weights_train_op()
build_subnetwork()
build_subnetwork_train_op()



62

_FakeCandidateBuilder


build_candidate()



63

_FakeEnsembleBuilder


append_new_subnetwork()



64

_FakeGenerator


generate_candidates()



65

_FakeMetric


to_metric()



66

_FakeSummary


current_scope()
scalar()



67

_FakeSummary


audio()
current_scope()
histogram()
image()
scalar()



68

_HeadEnsembleBuilder


append_new_subnetwork()



69

_Iteration


 



70

_IterationBuilder


build_iteration()



71

_Keys

CURRENT_ITERATION : str
EVALUATE_ENSEMBLES : str
FREEZE_ENSEMBLE : str
FROZEN_ENSEMBLE_NAME : str
INCREMENT_ITERATION : str
MATERIALIZE_REPORT : str
SUBNETWORK_GENERATOR : str

 



72

_LinearBuilder

name

build_mixture_weights_train_op()
build_subnetwork()
build_subnetwork_train_op()



73

_ModifierSessionRunHook


begin()
end()



74

_ReportAccessor


read_iteration_reports()
write_iteration_report()



75

_ScopedSummary

scope

audio()
current_scope()
histogram()
image()
merge_all()
scalar()



75->46





76

_SimpleBuilder

name

build_mixture_weights_train_op()
build_subnetwork()
build_subnetwork_train_op()



77

_StopAfterTrainingHook


after_run()
before_run()



78

_WidthLimitingDNNBuilder


prune_previous_ensemble()



78->54





First Steps - MNIST Toy Example

Google provides tutorials for Boston Housing Prices and Fashion-MNIST. Both tutorials are licensed under an Apache 2.0 license. Since I tested auto-keras on MNIST, I’m going to start with MNIST as well. The code from the Fashion-MNIST tutorial is used here; however, I comment it differently.

It seems like we have to import a few functions from __future__ ourselves, even though adanet already imports them:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

AdaNet requires functools, hence we have to load it:

import functools

Now, we can load adanet and tensorflow:

import adanet
import tensorflow as tf

# set random seed to make results reproducible
RANDOM_SEED = 42

We can load the MNIST dataset using keras for our convenience:

(x_train, y_train), (x_test, y_test) = (tf.keras.datasets.mnist.load_data())

Now comes something completely different from what I expected. From my limited understanding of adanet (so far), we have to write the generator functions and define the search space ourselves. From the initial description, I expected something closer to Google AutoML or auto-keras.

First, we have to write our generator function:

FEATURES_KEY = "images"


def generator(images, labels):
  """Returns a generator that returns image-label pairs."""

  def _gen():
    for image, label in zip(images, labels):
      yield image, label

  return _gen

Next, we have to write our own image preprocessing function (this is less convenient than using Keras or PyTorch):

def preprocess_image(image, label):
  """Preprocesses an image for an `Estimator`."""
  # First let's scale the pixel values to be between 0 and 1.
  image = image / 255.
  # Next we reshape the image so that we can apply a 2D convolution to it.
  image = tf.reshape(image, [28, 28, 1])
  # Finally the features need to be supplied as a dictionary.
  features = {FEATURES_KEY: image}
  return features, label

Now, we have to write our own input function that generates each batch:

def input_fn(partition, training, batch_size):
  """Generate an input_fn for the Estimator."""

  def _input_fn():
    if partition == "train":
      dataset = tf.data.Dataset.from_generator(
          generator(x_train, y_train), (tf.float32, tf.int32), ((28, 28), ()))
    else:
      dataset = tf.data.Dataset.from_generator(
          generator(x_test, y_test), (tf.float32, tf.int32), ((28, 28), ()))

    # We call repeat after shuffling, rather than before, to prevent separate
    # epochs from blending together.
    if training:
      dataset = dataset.shuffle(10 * batch_size, seed=RANDOM_SEED).repeat()

    dataset = dataset.map(preprocess_image).batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels

  return _input_fn

Next, we will have to define some basic properties of our NN training process:

# The number of classes.
NUM_CLASSES = 10

# We will average the losses in each mini-batch when computing gradients.
loss_reduction = tf.losses.Reduction.SUM_OVER_BATCH_SIZE

# A `Head` instance defines the loss function and metrics for `Estimators`.
head = tf.contrib.estimator.multi_class_head(
    NUM_CLASSES, loss_reduction=loss_reduction)

# Some `Estimators` use feature columns for understanding their input features.
feature_columns = [
    tf.feature_column.numeric_column(FEATURES_KEY, shape=[28, 28, 1])
]

# Estimator configuration.
config = tf.estimator.RunConfig(
    save_checkpoints_steps=50000,
    save_summary_steps=50000,
    tf_random_seed=RANDOM_SEED)

As a baseline, we train a simple linear classifier:

#@test {"skip": true}
#@title Parameters
LEARNING_RATE = 0.001  #@param {type:"number"}
TRAIN_STEPS = 5000  #@param {type:"integer"}
BATCH_SIZE = 64  #@param {type:"integer"}

estimator = tf.estimator.LinearClassifier(
    feature_columns=feature_columns,
    n_classes=NUM_CLASSES,
    optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),
    loss_reduction=loss_reduction,
    config=config)

results, _ = tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(
        input_fn=input_fn("train", training=True, batch_size=BATCH_SIZE),
        max_steps=TRAIN_STEPS),
    eval_spec=tf.estimator.EvalSpec(
        input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
        steps=None))
print("Accuracy:", results["accuracy"])
print("Loss:", results["average_loss"])

It outputs:

INFO:tensorflow:Loss for final step: 0.135509.
Accuracy: 0.9258
Loss: 0.27285188

Now it is getting interesting: we have to write our own class that builds convolutional neural networks as subnetworks, pre-defining the Keras convolutional layers ourselves. In this case it is a simple, single Conv2D layer followed by max pooling, a flatten layer and a final dense layer:

class SimpleCNNBuilder(adanet.subnetwork.Builder):
  """Builds a CNN subnetwork for AdaNet."""

  def __init__(self, learning_rate, max_iteration_steps, seed):
    """Initializes a `SimpleCNNBuilder`.

    Args:
      learning_rate: The float learning rate to use.
      max_iteration_steps: The number of steps per iteration.
      seed: The random seed.

    Returns:
      An instance of `SimpleCNNBuilder`.
    """
    self._learning_rate = learning_rate
    self._max_iteration_steps = max_iteration_steps
    self._seed = seed

  def build_subnetwork(self,
                       features,
                       logits_dimension,
                       training,
                       iteration_step,
                       summary,
                       previous_ensemble=None):
    """See `adanet.subnetwork.Builder`."""
    images = list(features.values())[0]
    kernel_initializer = tf.keras.initializers.he_normal(seed=self._seed)
    x = tf.keras.layers.Conv2D(
        filters=16,
        kernel_size=3,
        padding="same",
        activation="relu",
        kernel_initializer=kernel_initializer)(
            images)
    x = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(
        units=64, activation="relu", kernel_initializer=kernel_initializer)(
            x)

    # The `Head` passed to adanet.Estimator will apply the softmax activation.
    logits = tf.keras.layers.Dense(
        units=10, activation=None, kernel_initializer=kernel_initializer)(
            x)

    # Use a constant complexity measure, since all subnetworks have the same
    # architecture and hyperparameters.
    complexity = tf.constant(1)

    return adanet.Subnetwork(
        last_layer=x,
        logits=logits,
        complexity=complexity,
        persisted_tensors={})

  def build_subnetwork_train_op(self,
                                subnetwork,
                                loss,
                                var_list,
                                labels,
                                iteration_step,
                                summary,
                                previous_ensemble=None):
    """See `adanet.subnetwork.Builder`."""

    # Momentum optimizer with cosine learning rate decay works well with CNNs.
    learning_rate = tf.train.cosine_decay(
        learning_rate=self._learning_rate,
        global_step=iteration_step,
        decay_steps=self._max_iteration_steps)
    optimizer = tf.train.MomentumOptimizer(learning_rate, .9)
    # NOTE: The `adanet.Estimator` increments the global step.
    return optimizer.minimize(loss=loss, var_list=var_list)

  def build_mixture_weights_train_op(self, loss, var_list, logits, labels,
                                     iteration_step, summary):
    """See `adanet.subnetwork.Builder`."""
    return tf.no_op("mixture_weights_train_op")

  @property
  def name(self):
    """See `adanet.subnetwork.Builder`."""
    return "simple_cnn"

Furthermore, we need a second class to build our NN with convolutional layers: a generator that creates new subnetwork candidates at each AdaNet iteration using the builder class we defined above.

class SimpleCNNGenerator(adanet.subnetwork.Generator):
  """Generates a `SimpleCNN` at each iteration.
  """

  def __init__(self, learning_rate, max_iteration_steps, seed=None):
    """Initializes a `Generator` that builds `SimpleCNNs`.

    Args:
      learning_rate: The float learning rate to use.
      max_iteration_steps: The number of steps per iteration.
      seed: The random seed.

    Returns:
      An instance of `Generator`.
    """
    self._seed = seed
    self._dnn_builder_fn = functools.partial(
        SimpleCNNBuilder,
        learning_rate=learning_rate,
        max_iteration_steps=max_iteration_steps)

  def generate_candidates(self, previous_ensemble, iteration_number,
                          previous_ensemble_reports, all_reports):
    """See `adanet.subnetwork.Generator`."""
    seed = self._seed
    # Change the seed according to the iteration so that each subnetwork
    # learns something different.
    if seed is not None:
      seed += iteration_number
    return [self._dnn_builder_fn(seed=seed)]

Now, we can prepare a set of hyperparameters and run the model:

#@title Parameters
LEARNING_RATE = 0.05  #@param {type:"number"}
TRAIN_STEPS = 5000  #@param {type:"integer"}
BATCH_SIZE = 64  #@param {type:"integer"}
ADANET_ITERATIONS = 2  #@param {type:"integer"}

max_iteration_steps = TRAIN_STEPS // ADANET_ITERATIONS
estimator = adanet.Estimator(
    head=head,
    subnetwork_generator=SimpleCNNGenerator(
        learning_rate=LEARNING_RATE,
        max_iteration_steps=max_iteration_steps,
        seed=RANDOM_SEED),
    max_iteration_steps=max_iteration_steps,
    evaluator=adanet.Evaluator(
        input_fn=input_fn("train", training=False, batch_size=BATCH_SIZE),
        steps=None),
    adanet_loss_decay=.99,
    config=config)

results, _ = tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(
        input_fn=input_fn("train", training=True, batch_size=BATCH_SIZE),
        max_steps=TRAIN_STEPS),
    eval_spec=tf.estimator.EvalSpec(
        input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
        steps=None))
print("Accuracy:", results["accuracy"])
print("Loss:", results["average_loss"])

It outputs:

INFO:tensorflow:Loss for final step: 0.023453332.
Accuracy: 0.9875
Loss: 0.039442677

Comparison to auto-keras

Auto-Keras achieves an accuracy of 0.9879 after 30 minutes of architecture search on the (identical) MNIST dataset. AdaNet reaches an accuracy of 0.9875 after roughly 15 minutes (I did not time it exactly).
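
For reference, my earlier auto-keras run looked roughly like the following (a sketch from memory; it assumes the auto-keras 0.2 API with ImageClassifier, fit(time_limit=...), final_fit() and evaluate(), which may differ in other versions):

from autokeras import ImageClassifier  # import path may differ between auto-keras versions
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# auto-keras expects images with an explicit channel dimension.
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.reshape(x_test.shape + (1,))

clf = ImageClassifier(verbose=True)
# 30 minutes of architecture search, as in the comparison above.
clf.fit(x_train, y_train, time_limit=30 * 60)
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
print("Accuracy:", clf.evaluate(x_test, y_test))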