arXiv:1801.10402v1 [cs.LG] 31 Jan 2018

Abstract—We study the problem of learning to rank from multiple sources. Though multi-view learning and learning to rank have been studied extensively, leading to a wide range of applications, multi-view learning to rank as a synergy of both topics has received little attention. The aim of this paper is to propose a composite ranking method that simultaneously keeps a close correlation with the individual rankings. We propose a multi-objective solution to ranking by capturing the information of the feature mapping both within each view and across views using autoencoder-like networks. Moreover, a novel end-to-end solution is introduced to enhance the joint ranking with minimum view-specific ranking loss, so that we can achieve the maximum global view agreements within a single optimization process. The proposed method is validated on a wide variety of ranking problems, including university ranking, multi-view lingual text ranking and image data ranking, providing superior results.

Index Terms—Learning to rank, multi-view data analysis, ranking

1 INTRODUCTION

Learning to rank is an important research topic in information retrieval and data mining, which aims to learn a ranking model that produces a query-specific ranking list. The ranking model establishes a relationship between each pair of data samples by combining the corresponding features in an optimal way [1]. A score is then assigned to each pair to evaluate its relevance, forming a global ranking list across all pairs. The success of learning to rank solutions has brought a wide spectrum of applications, including online advertising [2], natural language processing [3] and multimedia retrieval [4].

Learning an appropriate data representation and a suitable scoring function are two vital steps in the ranking problem. Traditionally, a feature mapping models the data distribution in a latent space to match the relevance relationship, while the scoring function is used to quantify the relevance measure [1]. However, real-world ranking problems emerge from multiple facets, and data patterns are mined from diverse domains. For example, universities are positioned differently based on the numerous factors and weights used for quality evaluation by different ranking agencies. Therefore, a global agreement across sources and domains should be achieved while still maintaining a high ranking performance.

Multi-view learning has received wide attention, with a special focus on subspace learning [5], [6] and co-training [7], but few attempts have been made in ranking problems [8]. It introduces a new paradigm to jointly model and combine information encoded in multiple views to enhance the learning performance. Specifically, subspace learning finds a common space from different input modalities using an optimization criterion. Canonical Correlation Analysis (CCA) [9], [10] is one of the prevailing unsupervised methods used to measure a cross-view correlation. By contrast, Multi-view Discriminant Analysis (MvDA) [6] is a supervised learning technique seeking the most discriminant features across views by maximizing the between-class scatter while minimizing the within-class scatter in the underlying feature space. Furthermore, a generalized multi-view embedding method [5] was proposed using a graph embedding framework for numerous unsupervised and supervised learning techniques, with extensions to nonlinear transforms including (approximate) kernel mappings [11], [12] and neural networks [5], [13]. A nonparametric version of [5] was also proposed in [14]. On the other hand, co-training [7] was introduced to maximize the mutual agreement between two distinct views, and can be easily extended to multiple inputs by subsequently training over all pairs of views. A solution to the learning to rank problem was provided by minimizing the pairwise ranking difference using the same co-training mechanism [8].

Although there are several applications that could benefit from a multi-view learning to rank approach, the topic has been insufficiently studied to date [15]. Ranking of multi-facet objects is generally performed using composite indicators. The usefulness of a composite indicator depends upon the selected functional form and the weights associated with the component facets. Existing solutions for university ranking are an example of using subjective weights in the method of composite indicators. However, the functional form and its assigned weights are difficult to define. Consequently, there is a high disparity in the evaluation metrics between agencies, and the produced ranking lists usually cause dissension in academic institutes. However, one observation is that the indicators from different agencies may partially overlap and have a high correlation with each other. We present an example in Fig. 1 to show that several attributes in the THE dataset [16], including teaching, research, student staff ratio and student number, are highly correlated with all of the attributes in the ARWU dataset [17]. Therefore, the motivation of this paper is to find a composite ranking by exploiting the correlation between individual rankings.

Earlier success in multi-view subspace learning provides a promising way for composite ranking. Concatenating multiple views into a single input overlooks possible view discrepancy and does not fully exploit their mutual
agreement in ranking. Our goal is to study beyond direct multi-view subspace learning for ranking. This paper offers a multi-objective solution to ranking by capturing relevant information of the feature mapping from within each view as well as across views. Moreover, we propose an end-to-end method to optimize the trade-off between view-specific ranking and a discriminant combination of multi-view ranking. To this end, we can improve cross-view ranking performance while maintaining individual ranking objectives.

Fig. 1: The correlation matrix between the measurements of Times Higher Education (THE) and Academic Ranking of World Universities (ARWU) rankings. The data is extracted and aligned based on the performance of the common universities in 2015 between the two ranking agencies. The reddish colors indicate high correlation, while the matrix elements with low correlation are represented in bluish colors.

Intermediate feature representations in the neural network are exploited in our ranking solutions. Specifically, the first contribution is to provide two closely related methods by adopting an autoencoder-like network. We first train a network to learn view-specific feature mappings, and then maximize their correlation with the intermediate representations using either an unsupervised or a discriminant projection to a common latent space. A stochastic optimization method is introduced to fit the correlation criterion. Both the autoencoding sub-network per view with a reconstruction objective and the feedforward sub-networks with a joint correlation-based objective are iteratively optimized in the entire network. The projected feature representations in the common subspace are then combined and used to learn the ranking function.

The second contribution (graphically described in Fig. 2) is an end-to-end multi-view learning to rank solution. A sub-network for each view is trained with its own ranking objective. Then, features from intermediate layers of the sub-networks are combined after a discriminant mapping to a common space, and trained towards the global ranking objective. As a result, a network assembly is developed to enhance the joint ranking with minimum view-specific ranking loss, so that we can achieve the maximum view agreements.

2 RELATED WORK

2.1 Learning to rank

Learning to rank aims to optimize the combination of data representations for ranking problems [18]. It has been widely used in a number of applications, including image retrieval and ranking [4], [19], image quality ratings [20], online advertising [2], and text summarization [8]. Solutions to this problem can be decomposed into several key components, including the input feature, the output vector and the scoring function. The framework is developed by training the scoring function from the input feature to the output ranking list, and then scoring the ranking of new data. Traditional methods also include engineering the features, using the PageRank model [21] for example, to optimally combine them for obtaining the output. Later, research focused on discriminatively training the scoring function to improve the ranking outputs. The ranking methods can be organized into three categories according to the scoring function: the pointwise approach, the pairwise approach, and the listwise approach.

We consider the pairwise approach in this paper and review the related methods as follows. A preference network is developed in [22] to evaluate the pairwise order between two documents. The network learns the preference function directly to the binary ranking output without using an additional scoring function. RankNet [23] defines the cross-entropy loss and learns a neural network to model the ranking. Assuming the scoring function to be linear [24], the ranking problem can be transformed into a binary classification problem, and therefore many classifiers are available to be applied for ranking document pairs. RankBoost [25] adopts the Adaboost algorithm [26], which iteratively focuses on the classification errors between each pair of documents and subsequently improves the overall output. Ranking SVM [27] applies SVM to perform pairwise classification. GBRank [28] is a ranking method based on Gradient Boosting Trees. Semi-supervised multi-view ranking (SmVR) [8] follows the co-training scheme to rank pairs of samples. Moreover, recent efforts focus on using the evaluation metric to guide the gradient with respect to the ranking pair during training. These studies include AdaRank [29], which optimizes the ranking errors rather than the classification error in an adaptive way, and LambdaRank [30]. However, all of the methods above consider the case of single-view inputs, while multi-view learning to rank is overlooked.

2.1.1 Bipartite ranking

The pairwise approach of the ranking methods serves as the basis of our ranking method, and is therefore reviewed explicitly in this section. Suppose that the training data
as the projection matrix of the DMvMDA and is derived by optimizing the objective function

$$J_{\mathrm{DMvMDA}} = \arg\max_{W_v,\,v=1,\dots,V} \frac{\mathrm{Tr}\left(\sum_{i=1}^{V}\sum_{j=1}^{V} W_i^\top Z_i L_B Z_j^\top W_j\right)}{\mathrm{Tr}\left(\sum_{i=1}^{V} W_i^\top Z_i L_W Z_i^\top W_i\right)}, \quad (5)$$

where the between-class Laplacian matrix is

$$L_B = \sum_{p=1}^{C}\sum_{q=1}^{C}\left(\frac{1}{N_p^2}\, e_p e_p^\top - \frac{1}{N_p N_q}\, e_p e_q^\top\right),$$

and the within-class Laplacian matrix is

$$L_W = I - \sum_{c=1}^{C} \frac{1}{N_c}\, e_c e_c^\top.$$

3 MODEL FORMULATION

We first introduce the formulations of MvCCAE and MvMDAE, and then their extension to multi-view subspace learning to rank. Finally, the end-to-end ranking method is described.

Here, the output of each sub-network F_v is denoted by Z_v = F_v(X_v). Then, we have

$$\frac{\partial f}{\partial Z_i} = \sum_{j=1,\, j\neq i}^{V} W_i W_j^\top Z_j L, \quad (7)$$

and

$$\frac{\partial g}{\partial Z_i} = W_i W_i^\top Z_i L. \quad (8)$$

By using (7) and (8) and following the quotient rule, we derive the stochastic optimization of MvCCAE to be

$$\frac{\partial J_{\mathrm{MvCCAE}}}{\partial Z_v} = \frac{1}{g^2}\left(g\,\frac{\partial f}{\partial Z_v} - f\,\frac{\partial g}{\partial Z_v}\right) - \alpha\,\frac{\partial}{\partial Z_v}\,\ell_{\mathrm{AE}}\big(X_v; G_v(F_v(\cdot))\big). \quad (9)$$

The gradient to compute the autoencoding loss ℓ_AE is derived from the view-specific sub-networks F_v and G_v. The sub-network F_v is optimized with ∂Z_v/∂F_v to obtain the output Z_v, while the gradient of the G_v network with respect to its parameters can be obtained using the chain rule from ∂G_v(X_v)/∂Z_v.
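As an illustration of the between-class and within-class Laplacians defined above, both matrices depend only on the class indicator vectors e_c and the class sizes N_c, so they can be assembled directly from integer labels. The following is a minimal pure-Python sketch (dense list-of-lists matrices, illustrative only, not the authors' code):

```python
# Build the between-class (L_B) and within-class (L_W) Laplacians used in (5)
# from integer class labels. Dense pure-Python matrices for illustration only.

def class_laplacians(labels):
    n = len(labels)
    classes = sorted(set(labels))
    # e_c: binary indicator vector for class c; N_c: class size
    e = {c: [1.0 if y == c else 0.0 for y in labels] for c in classes}
    N = {c: sum(e[c]) for c in classes}
    LB = [[0.0] * n for _ in range(n)]
    LW = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity I
    for p in classes:
        for q in classes:
            for i in range(n):
                for j in range(n):
                    # L_B = sum_p sum_q ( e_p e_p^T / N_p^2 - e_p e_q^T / (N_p N_q) )
                    LB[i][j] += e[p][i] * e[p][j] / N[p] ** 2 \
                              - e[p][i] * e[q][j] / (N[p] * N[q])
    for c in classes:
        for i in range(n):
            for j in range(n):
                # L_W = I - sum_c e_c e_c^T / N_c
                LW[i][j] -= e[c][i] * e[c][j] / N[c]
    return LB, LW

# Three samples: two in class 0, one in class 1.
LB, LW = class_laplacians([0, 0, 1])
```

Both matrices have zero row sums, as expected for graph Laplacians; in practice these would be built with vectorized linear-algebra routines rather than explicit loops.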
Fig. 2: System diagram of the Deep Multi-view Discriminant Ranking (DMvDR). First, the features X = {X_1, X_2, ..., X_V} are extracted as data representations in different views and fed through the individual sub-networks F_v to obtain the nonlinear representation Z_v of the v-th view. The results are then passed through two pipelines of networks. One pipeline goes to the projection W, which maps all Z_v to the common subspace, and their concatenation is trained to optimize the fused ranking loss with the fused sub-network H. The other pipeline connects Z_v to the sub-network G_v, ∀v = 1, ..., V, for the optimization of the v-th ranking loss.
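The two-pipeline structure in Fig. 2 can be sketched with scalar linear stand-ins for the sub-networks F_v, G_v, the projection W and the fused sub-network H. Every name and shape below is an illustrative assumption, not the paper's implementation; the real model uses deep nonlinear sub-networks:

```python
# Toy sketch of the DMvDR forward pass described in Fig. 2, with 1-D linear
# maps standing in for the deep sub-networks (illustrative assumption only).

def forward(xs, F, G, W, H):
    """xs: one scalar feature per view; F, G, W: per-view weights; H: fused weight."""
    zs = [f * x for f, x in zip(F, xs)]                # Z_v = F_v(X_v)
    view_scores = [g * z for g, z in zip(G, zs)]       # pipeline 1: view-specific ranking scores
    fused_input = sum(w * z for w, z in zip(W, zs))    # pipeline 2: project all Z_v with W and fuse
    fused_score = H * fused_input                      # fused sub-network H
    return view_scores, fused_score

# Two views, hypothetical weights and inputs.
F, G, W, H = [0.5, 2.0], [1.0, 0.5], [1.0, 1.0], 0.5
view_scores, fused = forward([4.0, 1.0], F, G, W, H)
```

The key point mirrored from the figure is that each Z_v feeds both pipelines, so during training its gradient receives contributions from the view-specific ranking losses and from the fused ranking loss.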
The gradient of the multi-view subspace embedding (MvSE) in the trace ratio form is calculated by combining (11) and (12):

$$\frac{\partial J_{\mathrm{MvSE}}}{\partial Z_v} = \frac{1}{g^2}\left(g\,\frac{\partial f}{\partial Z_v} - f\,\frac{\partial g}{\partial Z_v}\right). \quad (17)$$

The embedding layer is important as its gradient is forward passed to the fused sub-network H. Meanwhile, it is backward propagated through the layers of F_v to reach the input X_v. In turn, the parameters in G_v are also affected by the outputs of the F_v sub-networks.

The update of the view-specific F_v depends on both the view-specific ranking output y_v and the cross-view relevance y, as it is a common sub-network in both pipelines. Through backpropagation, the v-th sub-networks F_v and G_v are optimized consecutively with respect to the gradient ∂y_v/∂X_v. Meanwhile, the training error with respect to the fused ranking y is passed through the multi-view subspace embedding (MvSE) from S in (16) as the input to the fused sub-network H. The resulting gradient of each sub-network F_v is given by

$$\frac{\partial J_{\mathrm{DMvDR}}}{\partial Z_v} = \frac{\partial J_{\mathrm{MvSE}}}{\partial Z_v} - \alpha\,\frac{\partial}{\partial Z_v}\,\ell_{\mathrm{Rank}}\big(X_v, y_v; G_v(F_v(\cdot))\big) - \beta\,\frac{\partial}{\partial Z_v}\,\ell_{\mathrm{Rank}}\big(S, y; H(\cdot)\big), \quad (18)$$

where α and β are scaling factors controlling the magnitude of the ranking losses. Similar to the other sub-networks, the gradients with respect to their parameters can be obtained by following the chain rule.

The update of the entire network of DMvDR can be summarized using SGD with mini-batches. The parameters of the sub-networks are denoted by θ = {θ_{F_1}, ..., θ_{F_V}, θ_{G_1}, ..., θ_{G_V}, θ_H}. A gradient descent step is ∆θ = −η ∂J_DMvDR/∂θ, where η is the learning rate. The gradient update step at time t can be written with the chain rule collectively:

$$\Delta\theta^t = \{\Delta\theta^t_{F_1}, \Delta\theta^t_{F_2}, \dots, \Delta\theta^t_{F_V}, \Delta\theta^t_{G_1}, \Delta\theta^t_{G_2}, \dots, \Delta\theta^t_{G_V}, \Delta\theta^t_H\}$$
$$\Delta\theta^t_{G_v} = -\frac{\partial \ell_{\mathrm{rank}}}{\partial y_v}\cdot\frac{\partial y_v}{\partial G_v}$$
$$\Delta\theta^t_{H} = -\frac{\partial \ell_{\mathrm{rank}}}{\partial y}\cdot\frac{\partial y}{\partial H}$$
$$\Delta\theta^t_{F_v} = \frac{\partial J_{\mathrm{MvSE}}}{\partial Z_v}\cdot\frac{\partial Z_v}{\partial F_v} - \frac{\partial \ell_{\mathrm{rank}}}{\partial y_v}\cdot\frac{\partial y_v}{\partial Z_v}\cdot\frac{\partial Z_v}{\partial F_v} - \frac{\partial \ell_{\mathrm{rank}}}{\partial y}\cdot\frac{\partial y}{\partial S}\cdot\frac{\partial S}{\partial Z_v}\cdot\frac{\partial Z_v}{\partial F_v}. \quad (19)$$

We generate the training data using the pairwise transform presented in Section 2.1.1. During testing, the test samples can also be transformed into pairs to evaluate the relative relevance of each sample to its query. The raw ranking data can also be fed into the trained model to predict their overall ranking positions.
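The pairwise transform referred to above reduces ranking to binary classification on feature differences, in the spirit of [24]. A minimal sketch follows; the variable names are illustrative and the relevance labels are assumed to be comparable scalars:

```python
# Pairwise transform: turn (feature vector, relevance) examples into
# balanced +1 / -1 classification data on feature differences.

def pairwise_transform(X, y):
    """X: list of feature vectors, y: relevance scores. Returns (diffs, labels)."""
    diffs, labels = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if y[i] == y[j]:
                continue  # ties carry no pairwise preference
            d = [a - b for a, b in zip(X[i], X[j])]
            s = 1 if y[i] > y[j] else -1
            # emit both orientations so the two classes stay balanced
            diffs.append(d); labels.append(s)
            diffs.append([-v for v in d]); labels.append(-s)
    return diffs, labels

diffs, labels = pairwise_transform([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                                   [2, 1, 1])
```

Any binary classifier trained on (diffs, labels) then induces a linear-in-differences preference function over pairs, which is exactly how the training pairs described above are consumed.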
Fig. 5: The PR and ROC curves with 2-5 views applied to the Reuters dataset. (a) PR curves (Precision vs. Recall); (b) ROC curves (TP Rate vs. FP Rate). Each panel, from 2 views to 5 views, compares SmVR, DMvMDA, MvCCAE, MvMDAE and DMvDR.
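Curves such as those in Fig. 5 are traced by sweeping a decision threshold over the ranking scores. The following is a minimal sketch using the standard precision/recall definitions, not code from the paper:

```python
# Compute precision-recall points by sweeping a threshold over ranker scores.
# Standard definitions; illustrative only.

def pr_points(scores, labels):
    """scores: ranker outputs; labels: 1 (relevant) or 0. Returns (recall, precision) pairs."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    points, tp = [], 0
    for k, i in enumerate(order, start=1):  # threshold placed after each ranked item
        tp += labels[i]
        points.append((tp / total_pos, tp / k))  # (recall, precision)
    return points

pts = pr_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

ROC points are obtained the same way by replacing precision with the false-positive rate over the negatives at each cut.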
The accuracies are stable across views. This can be interpreted as the global ranking agreement being reachable to a certain level when all languages correspond to each other. It is again confirmed that the end-to-end solution consistently provides the highest scores, while SmVR is a few percentage points behind. When the features from different views follow a similar data distribution, the co-training method performs well and competes with the proposed DMvDR.
4.3 Image Data Ranking

Image data ranking is a problem of evaluating the relevance between two images represented by different types of features. We adopt the Animals with Attributes (AWA) dataset [46] for this problem due to its diversity of animal appearance and large number of classes. The dataset is composed of 50 animal classes with a total of 30475 images, and 85 animal attributes. We follow the feature generation in [5] to adopt 3 feature types forming the views:

• Image Feature by VGG-16 pre-trained model: a 1000-dimensional feature vector is produced from each image by resizing it to 224 × 224 and taking the outputs of the fc8 layer of a 16-layer VGGNet [47].
• Class Label Encoding: a 100-dimensional Word2Vector is extracted from each class label. Then, we can map the visual feature of each image to the text feature space by using a ridge regressor with a similar setting as in [5] to generate another set of textual features, with connection to the visual world. The text embedding space is constructed by training a skip-gram [48] model on the entire English Wikipedia, including 2.9 billion words.
• Attribute Encoding: an 85-dimensional feature vector can be produced with a similar idea as above. Since each class of animals contains some typical patterns of the attributes, a 50 × 85 lookup table can be constructed to connect the classes and attributes [49], [50]. Then, we map each image feature to the attribute space to produce the mid-level feature.

We generate the image ranking data as follows. From the 50 classes of animal images, we find 400 pairs of images per class, with 200 in-class pairs and 200 out-of-class image pairs from each class. We then end up with 20000 training data pairs. Similarly, we have 20000 test data pairs. We select 40 images from each class for the training data and a separate set of 40 samples as test data. Another 10 images are used as queries: 5 of them are associated with the in-class images as positive sample pairs and 5 as negative sample pairs. For the negative sample pairs, we randomly select 40 classes from the 49 remaining animal classes at a time, and one image per class is associated with each query image under study.

Fig. 6: PR and ROC curves on AWA.

TABLE 3: Quantitative Results (%) on the AWA Dataset.

Methods            MAP@100   Accuracy
Feature Concat     38.08     50.60
LMvCCA [5]         49.97     51.85
LMvMDA [5]         49.70     52.35
MvDA [6]           49.20     52.82
SmVR [8]           52.12     50.33
DMvCCA [5]         51.38     50.83
DMvMDA [5]         51.52     51.38
MvCCAE (ours)      49.01     53.28
MvMDAE (ours)      48.99     53.30
DMvDR (ours)       76.83     71.48

We can observe the performance of the methods on the animal dataset graphically in Fig. 6 and quantitatively in Table 3. DMvDR outperforms the other competing methods by a large margin, as shown in the plots of Fig. 6. Due to the variety of data distributions from the different feature types used as view inputs, the co-training type of SmVR can no longer compete with the end-to-end solution. From Table 3, one can observe that the performance of the feature concatenation suffers from the same problem. On the other hand, our proposed subspace ranking methods produce satisfactory classification rates while their precisions remain somewhat low. This implies again that the scoring function is critical to be trained together with the feature mappings. The other linear and nonlinear subspace ranking methods have comparatively similar performance at a lower position.

5 CONCLUSION

Learning to rank has been a popular research topic with numerous applications, while multi-view ranking remains a relatively new research topic. In this paper, we aimed to associate multi-view subspace learning methods with the ranking problem and proposed three methods in this direction. MvCCAE is an unsupervised multi-view embedding method, while MvMDAE is its supervised counterpart. Both of them incorporate multiple objectives, with a
correlation maximization on one hand and reconstruction error minimization on the other, and both have been extended in the multi-view subspace learning to rank scheme. Finally, DMvDR is proposed to exploit the global agreement while minimizing the individual ranking losses in a single optimization process. The experimental results validate the superior performance of DMvDR compared to the other subspace and co-training methods on multi-view datasets with both homogeneous and heterogeneous data representations.

In the future, we will explore the scenario where there exists missing data, which is beyond the scope of the currently proposed subspace ranking methods during training. Multiple networks can also be combined by concatenating their outputs and further optimized in a single sub-network. This solution may also be applicable to homogeneous representations.

REFERENCES

[1] T.-Y. Liu, "Learning to rank for information retrieval," Foundations and Trends in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009.
[2] Y. Zhu, G. Wang, J. Yang, D. Wang, J. Yan, J. Hu, and Z. Chen, "Optimizing search engine revenue in sponsored search," in Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2009, pp. 588–595.
[3] M. Amini, N. Usunier, and C. Goutte, "Learning from multiple partially observed views - an application to multilingual text categorization," in Advances in Neural Information Processing Systems 22. Curran Associates, Inc., 2009, pp. 28–36.
[4] J. Yu, D. Tao, M. Wang, and Y. Rui, "Learning to rank using user clicks and visual features for image retrieval," IEEE Transactions on Cybernetics, vol. 45, no. 4, pp. 767–779, 2015.
[5] G. Cao, A. Iosifidis, K. Chen, and M. Gabbouj, "Generalized multi-view embedding for visual recognition and cross-modal retrieval," IEEE Transactions on Cybernetics, 2017, doi: 10.1109/TCYB.2017.2742705.
[6] M. Kan, S. Shan, H. Zhang, S. Lao, and X. Chen, "Multi-view discriminant analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), vol. 38, no. 1, pp. 188–194, Jan 2016.
[7] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, 1998, pp. 92–100.
[8] N. Usunier, M.-R. Amini, and C. Goutte, "Multiview semi-supervised learning for ranking multilingual documents," Machine Learning and Knowledge Discovery in Databases, pp. 443–458, 2011.
[9] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, "Canonical correlation analysis: An overview with application to learning methods," Neural Computation, vol. 16, no. 12, pp. 2639–2664, 2004.
[10] J. Rupnik and J. Shawe-Taylor, "Multi-view canonical correlation analysis," in Slovenian KDD Conference on Data Mining and Data Warehouses (SiKDD 2010), 2010, pp. 1–4.
[11] A. Iosifidis, A. Tefas, and I. Pitas, "Kernel reference discriminant analysis," Pattern Recognition Letters, vol. 49, pp. 85–91, 2014.
[12] A. Iosifidis and M. Gabbouj, "Nyström-based approximate kernel subspace learning," Pattern Recognition, vol. 57, pp. 190–197, 2016.
[13] G. Andrew, R. Arora, J. Bilmes, and K. Livescu, "Deep canonical correlation analysis," in Proceedings of the 30th International Conference on Machine Learning, 2013, pp. 1247–1255.
[14] G. Cao, A. Iosifidis, and M. Gabbouj, "Multi-view nonparametric discriminant analysis for image retrieval and recognition," IEEE Signal Processing Letters, vol. 24, no. 10, pp. 1537–1541, Oct 2017.
[15] F. Feng, L. Nie, X. Wang, R. Hong, and T.-S. Chua, "Computational social indicators: A case study of chinese university ranking," in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM, 2017, pp. 455–464.
[16] "The times higher education world university ranking," https://www.timeshighereducation.com/world-university-rankings, 2016.
[17] N. C. Liu and Y. Cheng, "The academic ranking of world universities," Higher Education in Europe, vol. 30, no. 2, pp. 127–136, 2005.
[18] T. Liu, J. Wang, J. Sun, N. Zheng, X. Tang, and H.-Y. Shum, "Picture collage," IEEE Transactions on Multimedia (TMM), vol. 11, no. 7, pp. 1225–1239, 2009.
[19] X. Li, T. Pi, Z. Zhang, X. Zhao, M. Wang, X. Li, and P. S. Yu, "Learning bregman distance functions for structural learning to rank," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 9, pp. 1916–1927, Sept 2017.
[20] O. Wu, Q. You, X. Mao, F. Xia, F. Yuan, and W. Hu, "Listwise learning to rank by exploring structure of objects," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 7, pp. 1934–1939, 2016.
[21] L. Page, S. Brin, R. Motwani, and T. Winograd, "The pagerank citation ranking: Bringing order to the web," Stanford InfoLab, Tech. Rep., 1999.
[22] W. W. Cohen, R. E. Schapire, and Y. Singer, "Learning to order things," in Advances in Neural Information Processing Systems, 1998, pp. 451–457.
[23] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, "Learning to rank using gradient descent," in Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005, pp. 89–96.
[24] R. Herbrich, T. Graepel, and K. Obermayer, "Large margin rank boundaries for ordinal regression," 2000.
[25] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, "An efficient boosting algorithm for combining preferences," The Journal of Machine Learning Research, vol. 4, pp. 933–969, 2003.
[26] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in European Conference on Computational Learning Theory. Springer, 1995, pp. 23–37.
[27] T. Joachims, "Optimizing search engines using clickthrough data," in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2002, pp. 133–142.
[28] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, pp. 1189–1232, 2001.
[29] J. Xu and H. Li, "AdaRank: a boosting algorithm for information retrieval," in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2007, pp. 391–398.
[30] C. J. Burges, R. Ragno, and Q. V. Le, "Learning to rank with nonsmooth cost functions," in Advances in Neural Information Processing Systems, 2007, pp. 193–200.
[31] H. Hotelling, "Relations between two sets of variates," Biometrika, pp. 321–377, 1936.
[32] M. Borga, "Canonical correlation: a tutorial," http://people.imt.liu.se/~magnus/cca/tutorial/tutorial.pdf, 2001.
[33] A. A. Nielsen, "Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data," IEEE Transactions on Image Processing (TIP), vol. 11, no. 3, pp. 293–305, 2002.
[34] Y. Gong, Q. Ke, M. Isard, and S. Lazebnik, "A multi-view embedding space for modeling internet images, tags, and their semantics," International Journal of Computer Vision, vol. 106, no. 2, pp. 210–233, 2014.
[35] Y. Luo, D. Tao, K. Ramamohanarao, C. Xu, and Y. Wen, "Tensor canonical correlation analysis for multi-view dimension reduction," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 3111–3124, Nov 2015.
[36] A. Sharma, A. Kumar, H. Daume III, and D. W. Jacobs, "Generalized multiview analysis: A discriminative latent space," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012, pp. 2160–2167.
[37] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 689–696.
[38] W. Wang, R. Arora, K. Livescu, and J. Bilmes, "On deep multi-view representation learning," in Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015, pp. 1083–1092.
[39] S. Chandar, M. M. Khapra, H. Larochelle, and B. Ravindran, "Correlational neural networks," Neural Computation, 2016.
[40] M. Kan, S. Shan, and X. Chen, "Multi-view deep network for cross-view classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4847–4855.
[41] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: a general framework for dimensionality reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), vol. 29, no. 1, pp. 40–51, 2007.
[42] Y. Jia, F. Nie, and C. Zhang, "Trace ratio problem revisited," IEEE Transactions on Neural Networks, vol. 20, no. 4, pp. 729–735, 2009.