H-NeXt is a parameter-efficient roto-translation invariant network that is trained without a single augmented image in the training set. Our network comprises three components: an equivariant backbone for learning roto-translation independent features, an invariant pooling layer for discarding roto-translation information, and a classification layer. H-NeXt outperforms the state of the art (2023) in classification on unaugmented training sets and augmented test sets of MNIST and CIFAR-10.
The following animations contrast the feature maps of a classical convolutional network (CNN) with those of H-NeXt, illustrating how stable the H-NeXt feature maps remain when the input is rotated.
Example feature maps of a CNN (left) and H-NeXt (right).
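The instability shown in the CNN animation can be reproduced in a few lines. The toy sketch below (not the H-NeXt code) shows that rotating the input of a plain convolution does not rotate its feature map; the output only rotates along if the filter is rotated too, which is the property group-equivariant backbones exploit by carrying responses to rotated filter copies:

```python
import numpy as np

def conv2d(img, ker):
    """Valid cross-correlation, as computed by a CNN layer."""
    h, w = ker.shape
    H, W = img.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * ker).sum()
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8))
ker = rng.random((3, 3))                     # an asymmetric filter

plain = conv2d(np.rot90(img), ker)           # rotate the input only
rotated_out = np.rot90(conv2d(img, ker))     # rotate the output instead

# Ordinary convolution is NOT rotation-equivariant:
assert not np.allclose(plain, rotated_out)
# Rotating the filter alongside the input restores equivariance exactly:
assert np.allclose(conv2d(np.rot90(img), np.rot90(ker)), rotated_out)
```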
H-NeXt Architecture
The architecture begins with an equivariant backbone: its output feature values remain consistent under rotation or translation of the input, although the position and orientation of the output feature maps change accordingly. It is followed by an invariant pooling mechanism, which is unaffected by rotation and translation and returns identical features regardless of the transformation of the input. The architecture concludes with a standard multi-layer perceptron (MLP) for classification.
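The backbone-then-pooling idea can be sketched abstractly. In the toy model below (an illustrative assumption, not the actual H-NeXt layers), the backbone's output has an orientation axis that permutes cyclically when the input is rotated by multiples of 90 degrees; a max over that axis then discards the rotation, so the features fed to the MLP are invariant:

```python
import numpy as np

def backbone_response(feat, k):
    # Hypothetical equivariant behaviour: rotating the input by k * 90 deg
    # cyclically shifts the orientation axis of the feature tensor.
    return np.roll(feat, k, axis=0)

def invariant_pool(feat):
    # Max over the orientation axis discards the rotation information,
    # yielding the same vector for every rotated version of the input.
    return feat.max(axis=0)

feat = np.random.default_rng(1).random((4, 16))  # 4 orientations, 16 features
pooled = [invariant_pool(backbone_response(feat, k)) for k in range(4)]
# The pooled features are identical for all four input rotations:
assert all(np.allclose(pooled[0], p) for p in pooled)
```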
Overall architecture of H-NeXt.
Benchmarks
The following two benchmarks (based on MNIST and CIFAR-10) were created for testing and are available as mnist-rot-test and cifar-rot-test, both packaged as PyTorch loaders. The tables below show the comparison taken from our paper.
Comparison of MNIST with unrotated train and rotated test set. OA stands for overall accuracy, averaged over all angles.
Comparison of CIFAR-10 with unrotated train and rotated test set. OA stands for overall accuracy, averaged over all angles.
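The OA metric in the tables above can be sketched as follows: compute the accuracy separately for each test-time rotation angle, then average. The snippet below illustrates the procedure with 90-degree rotations (`np.rot90`) and a toy rotation-invariant "classifier"; the real benchmarks use the mnist-rot-test and cifar-rot-test loaders and a finer set of angles:

```python
import numpy as np

def toy_model(img):
    # A trivially rotation-invariant statistic standing in for a classifier:
    # predicts class 1 when the mean intensity exceeds 0.5.
    return int(img.sum() > img.size / 2)

def overall_accuracy(model, images, labels, ks=(0, 1, 2, 3)):
    # Accuracy per rotation angle (k * 90 degrees), averaged over angles.
    per_angle = []
    for k in ks:
        preds = [model(np.rot90(img, k)) for img in images]
        per_angle.append(np.mean([p == y for p, y in zip(preds, labels)]))
    return float(np.mean(per_angle))

rng = np.random.default_rng(0)
images = [rng.random((8, 8)) for _ in range(20)]
labels = [toy_model(img) for img in images]
print(overall_accuracy(toy_model, images, labels))  # 1.0: invariant model
```

A rotation-invariant model scores the same at every angle, so its OA equals its upright accuracy; a plain CNN's per-angle accuracies degrade away from 0 degrees, pulling its OA down.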
References
2023
H-NeXt: The next step towards roto-translation invariant networks
Tomáš Karella, Filip Šroubek, Jan Blažek, Jan Flusser, and Václav Košík
In 34th British Machine Vision Conference (BMVC 2023), Aberdeen, UK, November 20-24, 2023
The widespread popularity of equivariant networks underscores the significance of parameter efficient models and effective use of training data. At a time when robustness to unseen deformations is becoming increasingly important, we present H-NeXt, which bridges the gap between equivariance and invariance. H-NeXt is a parameter-efficient roto-translation invariant network that is trained without a single augmented image in the training set. Our network comprises three components: an equivariant backbone for learning roto-translation independent features, an invariant pooling layer for discarding roto-translation information, and a classification layer. H-NeXt outperforms the state of the art in classification on unaugmented training sets and augmented test sets of MNIST and CIFAR-10.