CNNs exhibit inherent equivariance to image translation, leading to efficient parameter and data usage, faster learning, and improved robustness. The concept of translation-equivariant networks has been successfully extended to rotations, using group convolutions for discrete rotation groups and harmonic functions for the continuous rotation group covering the full 360°. We explore the compatibility of the self-attention (SA) mechanism with full rotation equivariance, in contrast to previous studies that focused on discrete rotation. We introduce the Harmformer, a harmonic transformer with a convolutional stem that achieves equivariance to both translation and continuous rotation. Accompanied by an end-to-end equivariance proof, the Harmformer not only outperforms previous equivariant transformers, but also demonstrates inherent stability under any continuous rotation, even without seeing rotated samples during training.
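To make the mechanism concrete, below is a minimal sketch of a circular-harmonic filter, the kind of building block a harmonic convolutional stem relies on: the filter's phase encodes a rotation order m, so rotating the input only multiplies the complex response by e^(imθ). All names, shapes, and the Gaussian radial profile are illustrative assumptions, not the Harmformer's actual implementation.

```python
# Minimal sketch of a circular-harmonic filter (as in harmonic networks).
# The radial profile and sizes below are assumed for illustration only.
import torch
import torch.nn.functional as F

def harmonic_filter(size: int, order: int) -> torch.Tensor:
    """Complex filter W(r, phi) = R(r) * exp(i * order * phi)."""
    c = (size - 1) / 2
    y, x = torch.meshgrid(torch.arange(size) - c, torch.arange(size) - c, indexing="ij")
    r = torch.sqrt(x**2 + y**2)
    phi = torch.atan2(y, x)
    radial = torch.exp(-(r - c / 2) ** 2)        # assumed Gaussian ring profile
    return radial * torch.exp(1j * order * phi)  # rotation order encoded in phase

def harmonic_conv(img: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Convolve a (1,1,H,W) image with a complex filter; returns complex response."""
    real = F.conv2d(img, w.real[None, None])
    imag = F.conv2d(img, w.imag[None, None])
    return torch.complex(real, imag)

# Rotating the input by theta multiplies the response by exp(i*order*theta),
# so |response| stays stable under continuous rotation (up to sampling effects).
img = torch.rand(1, 1, 32, 32)
resp = harmonic_conv(img, harmonic_filter(9, order=1))
print(resp.abs().shape)
```

Because a rotation of the input acts as a pure phase shift on the response, downstream layers can either propagate the phase (equivariance) or take magnitudes (invariance).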
2023
H-NeXt: The next step towards roto-translation invariant networks
Tomáš Karella, Filip Šroubek, Jan Blažek, Jan Flusser, and Václav Košík
In 34th British Machine Vision Conference 2023, BMVC 2023, Aberdeen, UK, November 20-24, 2023
The widespread popularity of equivariant networks underscores the significance of parameter-efficient models and the effective use of training data. At a time when robustness to unseen deformations is becoming increasingly important, we present H-NeXt, which bridges the gap between equivariance and invariance. H-NeXt is a parameter-efficient roto-translation invariant network trained without a single augmented image in the training set. Our network comprises three components: an equivariant backbone for learning roto-translation independent features, an invariant pooling layer for discarding roto-translation information, and a classification layer. H-NeXt outperforms the state of the art in classification on unaugmented training sets and augmented test sets of MNIST and CIFAR-10.
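As an illustration of the second component, here is a minimal sketch of invariant pooling: given feature responses sampled at G orientations, pooling over the orientation axis and then over space discards roto-translation information before classification. The tensor layout and pooling choices are assumptions for illustration, not H-NeXt's exact layer.

```python
# Sketch of invariant pooling over orientation and spatial axes.
# The (B, G, C, H, W) layout is an assumed convention for illustration.
import torch

def invariant_pool(feats: torch.Tensor) -> torch.Tensor:
    """feats: (B, G, C, H, W) -- responses at G sampled orientations.
    Returns (B, C), insensitive to rotations (shifts along the orientation
    axis) and translations (spatial shifts), up to sampling effects."""
    feats = feats.amax(dim=1)        # max over orientations: rotation info gone
    return feats.mean(dim=(-2, -1))  # global average pool: translation info gone

pooled = invariant_pool(torch.rand(8, 16, 64, 12, 12))
print(pooled.shape)  # torch.Size([8, 64])
```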
CNN Ensemble Robust to Rotation Using Radon Transform
Václav Košík, Tomáš Karella, and Jan Flusser
In Proceedings of the 12th International Conference on Image Processing Theory, Tools and Applications (IPTA 2023), 2023
A great deal of attention has been paid in the literature to techniques that serve as alternatives to data augmentation. Their goal is to make convolutional neural networks (CNNs) invariant, or at least robust, to various transformations. In this paper, we present an ensemble model combining a classic CNN with an invariant CNN, where both were trained without any augmentation. The goal is to preserve the performance of the classic CNN on non-deformed images (where it is expected to classify more accurately) and the performance of the invariant CNN on deformed images (where the opposite holds). The combination is controlled by another network, which outputs a coefficient that determines the fusion rule of the two networks. This auxiliary network is trained to output the coefficient depending on the intensity of the image deformation. In the experiments, we focus on rotation as a simple and the most frequently studied transformation. In addition, we present a rotation-invariant network that is fed with the Radon transform of the input images. The performance of this network is tested on rotated MNIST, and the network is further used in the ensemble, whose performance is demonstrated on the CIFAR-10 dataset.
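The reason the Radon transform suits this setting can be sketched in a few lines: rotating an image circularly shifts its sinogram along the angle axis, so the Fourier magnitude taken along that axis is invariant to rotation. The snippet below illustrates this preprocessing idea; it is not the paper's exact network input pipeline.

```python
# Rotation-invariant features via the Radon transform: image rotation is a
# circular shift of the sinogram's angle axis, and the FFT magnitude along
# that axis is shift-invariant. Illustrative preprocessing sketch only.
import numpy as np
from skimage.transform import radon

def radon_invariant_features(image: np.ndarray) -> np.ndarray:
    theta = np.arange(360.0)                       # sample the full circle, 1 deg steps
    sino = radon(image, theta=theta, circle=True)  # shape: (detector bins, angles)
    return np.abs(np.fft.fft(sino, axis=1))        # magnitude spectrum over angles

img = np.zeros((64, 64))
img[20:30, 25:45] = 1.0
feats = radon_invariant_features(img)
print(feats.shape)  # (detector bins, 360)
```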
3D Non-separable Moment Invariants
Jan Flusser, Tomáš Suk, Leonid Bedratyuk, and Tomáš Karella
In this paper, we introduce new 3D rotation moment invariants composed of non-separable Appell moments. The Appell moments can be substituted directly into the 3D rotation invariants in place of the geometric moments without violating their invariance. We show that non-separable moments may outperform separable ones in terms of recognition power and robustness, thanks to a better distribution of their zero surfaces over the image space. We test the numerical properties and discrimination power of the proposed invariants on three real datasets: MRI images of the human brain, 3D scans of statues, and confocal microscope images of worms.
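For context, the sketch below computes the classical 3D geometric moments m_pqr = Σ x^p y^q z^r f(x,y,z), the quantities for which the Appell moments are substituted in the rotation invariants; the Appell polynomial bases themselves are defined in the paper and are not reproduced here.

```python
# 3D geometric moments over a voxel volume f. The volume and orders used in
# the demo below are illustrative, not data from the paper.
import numpy as np

def geometric_moment_3d(f: np.ndarray, p: int, q: int, r: int) -> float:
    """m_pqr = sum over all voxels of x^p * y^q * z^r * f(x, y, z)."""
    x, y, z = np.meshgrid(*(np.arange(s, dtype=float) for s in f.shape),
                          indexing="ij")
    return float(np.sum(x**p * y**q * z**r * f))

vol = np.random.rand(16, 16, 16)
m000 = geometric_moment_3d(vol, 0, 0, 0)               # total "mass"
centroid_x = geometric_moment_3d(vol, 1, 0, 0) / m000  # first-order moment ratio
print(m000, centroid_x)
```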
2022
Convolutional neural network exploiting pixel surroundings to reveal hidden features in artwork NIR reflectograms
Near-infrared reflectography (NIR) is a well-established non-invasive and non-contact imaging technique. NIR methods are able to reveal concealed layers of artwork, such as a painter's sketch or a repainted canvas. The information obtained may help historians study the artist's technique, attribute an artwork, or reconstruct faded details. Our research improves on a previously developed method that reveals hidden features by removing the information content of the visible spectrum from NIR. Based on convolutional neural networks (CNN), our model estimates the transfer function from the visible spectrum to NIR, which is nonlinear and specific to painting materials. Its parameters are learnt for a particular painting on subsamples randomly selected across the canvas, and the model is then utilised to enhance the whole artwork. Going beyond the previous model, our algorithm exploits each pixel's surroundings to estimate its NIR response. This leads to more precise results and increased robustness to various types of noise. We demonstrate higher accuracy than the previous method on historical painting mock-ups and higher performance on well-known artworks such as the Madonna dei Fusi, attributed to Leonardo da Vinci.
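A minimal sketch of the patch-to-pixel idea follows: a small CNN regresses the NIR response of a pixel from the visible-spectrum patch around it, which is how the model can exploit the pixel's surroundings. The architecture, patch size, and layer widths are illustrative assumptions, not the paper's exact model.

```python
# Patch-to-pixel regression: predict one NIR value from the RGB patch
# centred on the pixel. Architecture and sizes are assumed for illustration.
import torch
import torch.nn as nn

class PatchToNIR(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),  # predicted NIR value of the centre pixel
        )

    def forward(self, rgb_patches: torch.Tensor) -> torch.Tensor:
        return self.net(rgb_patches)  # (B, 3, patch, patch) -> (B, 1)

model = PatchToNIR()
print(model(torch.rand(4, 3, 9, 9)).shape)  # torch.Size([4, 1])
```

Training such a model on randomly sampled patches from a single painting, then sliding it over the whole canvas, mirrors the learn-per-painting, enhance-whole-artwork workflow described in the abstract.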