H-NeXt is a parameter-efficient roto-translation invariant network that is trained without a single augmented image in the training set. Our network comprises three components: an equivariant backbone for learning roto-translation independent features, an invariant pooling layer for discarding roto-translation information, and a classification layer. H-NeXt outperforms the state of the art (2023) in classification on unaugmented training sets and augmented test sets of MNIST and CIFAR-10.
The following animations contrast the feature maps of a classical convolutional network (CNN) with those of H-NeXt, illustrating how stable the H-NeXt feature maps remain when the input is rotated.
Example feature maps of a CNN (left) and H-NeXt (right).
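The instability shown in the CNN animation can be reproduced in a few lines. The toy sketch below (not the H-NeXt code) shows that rotating the input of a plain convolution does not rotate its feature map; the output only rotates along if the filter is rotated too, which is the property group-equivariant backbones exploit by carrying responses to rotated filter copies:

```python
import numpy as np

def conv2d(img, ker):
    """Valid cross-correlation, as computed by a CNN layer."""
    h, w = ker.shape
    H, W = img.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * ker).sum()
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8))
ker = rng.random((3, 3))                     # an asymmetric filter

plain = conv2d(np.rot90(img), ker)           # rotate the input only
rotated_out = np.rot90(conv2d(img, ker))     # rotate the output instead

# Ordinary convolution is NOT rotation-equivariant:
assert not np.allclose(plain, rotated_out)
# Rotating the filter alongside the input restores equivariance exactly:
assert np.allclose(conv2d(np.rot90(img), np.rot90(ker)), rotated_out)
```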
H-NeXt Architecture
The architecture begins with an equivariant backbone: its output feature values remain consistent under rotation or translation of the input, although the position and orientation of the output feature maps change accordingly. It is followed by an invariant pooling mechanism, which is unaffected by rotation and translation and returns identical features regardless of the transformation of the input. The architecture concludes with a standard multi-layer perceptron (MLP) for classification.
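The backbone-then-pooling idea can be sketched abstractly. In the toy model below (an illustrative assumption, not the actual H-NeXt layers), the backbone's output has an orientation axis that permutes cyclically when the input is rotated by multiples of 90 degrees; a max over that axis then discards the rotation, so the features fed to the MLP are invariant:

```python
import numpy as np

def backbone_response(feat, k):
    # Hypothetical equivariant behaviour: rotating the input by k * 90 deg
    # cyclically shifts the orientation axis of the feature tensor.
    return np.roll(feat, k, axis=0)

def invariant_pool(feat):
    # Max over the orientation axis discards the rotation information,
    # yielding the same vector for every rotated version of the input.
    return feat.max(axis=0)

feat = np.random.default_rng(1).random((4, 16))  # 4 orientations, 16 features
pooled = [invariant_pool(backbone_response(feat, k)) for k in range(4)]
# The pooled features are identical for all four input rotations:
assert all(np.allclose(pooled[0], p) for p in pooled)
```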
Overall architecture of H-NeXt.
Benchmarks
The following two benchmarks (based on MNIST and CIFAR-10) were created for testing and are available as mnist-rot-test and cifar-rot-test, both packaged as PyTorch loaders. The tables below show the comparison taken from our paper.
Comparison of MNIST with unrotated train and rotated test set. OA stands for overall accuracy, averaged over all angles.
Comparison of CIFAR-10 with unrotated train and rotated test set. OA stands for overall accuracy, averaged over all angles.
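The OA metric in the tables above can be sketched as follows: compute the accuracy separately for each test-time rotation angle, then average. The snippet below illustrates the procedure with 90-degree rotations (`np.rot90`) and a toy rotation-invariant "classifier"; the real benchmarks use the mnist-rot-test and cifar-rot-test loaders and a finer set of angles:

```python
import numpy as np

def toy_model(img):
    # A trivially rotation-invariant statistic standing in for a classifier:
    # predicts class 1 when the mean intensity exceeds 0.5.
    return int(img.sum() > img.size / 2)

def overall_accuracy(model, images, labels, ks=(0, 1, 2, 3)):
    # Accuracy per rotation angle (k * 90 degrees), averaged over angles.
    per_angle = []
    for k in ks:
        preds = [model(np.rot90(img, k)) for img in images]
        per_angle.append(np.mean([p == y for p, y in zip(preds, labels)]))
    return float(np.mean(per_angle))

rng = np.random.default_rng(0)
images = [rng.random((8, 8)) for _ in range(20)]
labels = [toy_model(img) for img in images]
print(overall_accuracy(toy_model, images, labels))  # 1.0: invariant model
```

A rotation-invariant model scores the same at every angle, so its OA equals its upright accuracy; a plain CNN's per-angle accuracies degrade away from 0 degrees, pulling its OA down.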
References
2023
H-NeXt: The next step towards roto-translation invariant networks
Tomáš Karella, Filip Šroubek, Jan Blažek, Jan Flusser, and Václav Košík
In 34th British Machine Vision Conference (BMVC 2023), Aberdeen, UK, November 20-24, 2023
The widespread popularity of equivariant networks underscores the significance of parameter efficient models and effective use of training data. At a time when robustness to unseen deformations is becoming increasingly important, we present H-NeXt, which bridges the gap between equivariance and invariance. H-NeXt is a parameter-efficient roto-translation invariant network that is trained without a single augmented image in the training set. Our network comprises three components: an equivariant backbone for learning roto-translation independent features, an invariant pooling layer for discarding roto-translation information, and a classification layer. H-NeXt outperforms the state of the art in classification on unaugmented training sets and augmented test sets of MNIST and CIFAR-10.