
DTCWT-PCA

Amarjot Singh

Signal Processing Group, Department of Engineering, University of Cambridge, U.K.

ABSTRACT

In this paper we introduce an unsupervised deep network that uses a parametric log transformation with Dual-Tree complex wavelets followed by two stages of PCA layers. The network is used for extracting succinct translation-invariant representations of multi-resolution images for image segmentation and classification tasks. The parametric transformation aids the OLS pruning algorithm by converting the skewed distributions into relatively mean-symmetric distributions, while the Dual-Tree wavelets improve the computational efficiency of the network. The stacked PCA layers help extract the top-most eigenvectors of the features, thus retaining maximum invariance and providing concise representations in an unsupervised setting. The proposed network has been tested on benchmark datasets of varying sizes, providing enhanced performance on limited datasets with few training instances.

Index Terms— DTCWT, Scattering network, Convolutional neural network, PCA, unsupervised, CIFAR.

1. INTRODUCTION

The task of identifying and classifying objects in an image is a difficult problem. This is due to the translation, rotation and scale variability of objects within the image and to external variabilities such as noise and illumination. Initially, hand-engineered features such as SIFT [?] and HOG [?], which modeled the geometric properties of the objects, were used for classification. They were later replaced by trained neural networks [?], [?], [?], especially Convolutional Neural Networks (CNNs) [?]. CNNs were able to achieve state-of-the-art results by learning discriminative class-specific image representations that captured the object variability mentioned above. However, given the overwhelming complexity of these networks and the number of trainable parameters, designing optimal configurations is still a difficult task.

ScatterNets, as shown by Mallat [?], [?], [?], [?], incorporate the geometric knowledge of images to produce discriminative, translation- and rotation-invariant representations. In this network, the images are first filtered with multi-scale and multi-directional complex Morlet wavelets. This provides the invariant features, which are then passed through a point-wise nonlinearity and local smoothing. Since the high frequencies are lost due to smoothing, cascaded wavelet transformations are used in later layers to recover them; this justifies the need for a multi-layered network. To de-correlate the multiplicative low-frequency components from the concatenated invariant features, a log transformation may be applied at all layers [?]. After this, orthogonal least squares (OLS) selects the subset of object class-specific dimensions across the training data, similar to the fully connected layers in CNNs [?]. To suppress the effect of outliers, whose presence may hinder least-squares parameter estimation due to unwanted features extracted from background clutter, noise and illumination, approximate symmetry is introduced. Overall, ScatterNets have been shown to provide results comparable to deep neural networks such as CNNs.

Amarjot et al. [] previously proposed an improved ScatterNet that used the dual-tree complex wavelet transform (DTCWT) [?], parametric log transformation layers and an OLS layer to extract relatively symmetric translation-invariant representations from a multi-resolution image. In this paper, we further improve this model by introducing multi-staged PCA layers. These PCA layers provide a linear transformation of the features and help reduce the dimensionality while retaining the main variance of the representations. The transformed features are finally used by a Gaussian-kernel support vector machine (G-SVM) to perform object classification and segmentation on the following datasets: the PASCAL Object Recognition Database, Caltech 101, Spine Image segmentation [], and CIFAR-10.

The contributions of the paper are as follows:

• Filter Bank: Dual-tree wavelets are used as opposed to Morlet wavelets. The DT-wavelets have a discrete form, short support, perfect reconstruction, and limited redundancy, thus providing features as rich as those of Morlet wavelets.

• Multilayer PCA-net: The two stacked layers of PCA, which essentially amount to a linear transformation of the previous features, help capture invariance in the data, thus providing concise and semantically richer information about the images. This transformation has the advantage of being unsupervised and is shown to perform efficiently on datasets of significantly smaller sizes.

• Unsupervised architecture: The proposed model functions in an unsupervised fashion. This gives great leverage to operate on abundant unlabelled data for numerous tasks like image segmentation and classification.

The paper is divided into the following sections:
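The role of the log transformation, converting skewed coefficient distributions into relatively mean-symmetric ones, can be illustrated with a minimal sketch. The form log(u + k), the parameter k, and the synthetic lognormal "coefficients" below are illustrative assumptions; the exact parametrisation used by the network is not given in this excerpt.

```python
import numpy as np

def parametric_log(u, k):
    """Illustrative parametric log transformation: compresses the dynamic
    range of non-negative coefficients, pulling a right-skewed distribution
    toward a more mean-symmetric one (k is a free parameter here)."""
    return np.log(u + k)

def skewness(x):
    """Sample skewness: zero for a symmetric distribution."""
    x = x - x.mean()
    return (x**3).mean() / (x**2).mean() ** 1.5

# Synthetic right-skewed stand-in for scattering coefficients.
rng = np.random.default_rng(0)
u = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

v = parametric_log(u, k=1e-3)

# Skewness drops sharply after the transformation.
print(round(skewness(u), 2), round(skewness(v), 2))
```

A near-symmetric distribution of this kind is what lets the subsequent OLS layer estimate its parameters without being dominated by a long one-sided tail.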

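The stacked PCA layers described in the contributions amount to a composition of linear projections onto the leading eigenvectors of the feature covariance. A minimal sketch in plain NumPy, with synthetic features and illustrative dimensions standing in for the scattering outputs (none of the sizes below are taken from the paper):

```python
import numpy as np

def pca_layer(X, n_components):
    """One PCA layer: centre the features and project them onto the
    top-n eigenvectors of their covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    # eigh returns eigenvalues in ascending order; reverse for leading ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, ::-1][:, :n_components]
    return Xc @ top

# Synthetic "invariant features" standing in for scattering outputs.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 64)) @ rng.standard_normal((64, 64))

# Two stacked PCA layers: a composition of linear maps, so the stage
# remains a single linear transformation overall, while the
# dimensionality shrinks at each step.
Z1 = pca_layer(X, 32)
Z2 = pca_layer(Z1, 16)
print(Z2.shape)  # (500, 16)
```

Because each layer is linear and unsupervised, the stage adds no trainable parameters beyond the eigenvectors themselves, which is what makes it usable on the small labelled datasets mentioned above.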