Article info
Article history:
Received 20 September 2014
Accepted 29 September 2014
Available online 14 October 2014
Abstract
Recommendation systems have become prevalent in recent years because they address the
information overload problem by suggesting to users the most relevant products from a
massive amount of data. For media products, online collaborative movie recommendation
attempts to help users find their preferred movies by accurately capturing
similar neighbors among users or movies from their historical common ratings. However,
because of data sparsity, neighbor selection becomes more difficult as the numbers of
movies and users grow rapidly. In this paper, a hybrid model-based movie recommendation system is proposed that uses improved K-means clustering coupled with genetic
algorithms (GAs) to partition the transformed user space. It employs the principal
component analysis (PCA) data reduction technique to condense the movie population space,
which also reduces the computational complexity of intelligent movie recommendation.
The experimental results on the MovieLens dataset indicate that the proposed approach
can provide high performance in terms of accuracy and generate more reliable and
personalized movie recommendations than existing methods.
© 2014 Elsevier Ltd. All rights reserved.
Keywords:
Movie recommendation
Collaborative filtering
Sparsity data
Genetic algorithms
K-means
1. Introduction
The rapid development of Internet technology has resulted
in explosive growth of available information over the last
decade. Recommendation systems (RS), as one of the most
successful information filtering applications, have become
an efficient way to solve the information overload problem. The aim of recommendation systems is to automatically generate suggested items (movies, books, news,
music, CDs, DVDs, webpages) for users according to their
http://dx.doi.org/10.1016/j.jvlc.2014.09.011
1045-926X/© 2014 Elsevier Ltd. All rights reserved.
2. Related work
2.1. Movie recommendation systems based on collaborative filtering
Recommendation systems (RS), introduced by the Tapestry
project in 1992, are among the most successful information
management systems [12]. Practical recommender
applications help users filter out large amounts of irrelevant
information, addressing information overload and providing
personalized suggestions. They have been very successful in
e-commerce, helping customers reach their preferred
products and improving business profit. In addition, to
enhance personalization, recommendation systems are
widely deployed on multimedia websites to target media products at particular customers.
Nowadays, collaborative filtering (CF) is the most effective
technique employed by movie recommendation systems;
it is based on the nearest-neighbor mechanism.
It assumes that people with similar rating histories
are most likely to share the
same preferences in the future. All like-minded users, called
neighbors, are derived from the rating database, which
records their evaluation values for movies. A missing rating
of a target user can then be predicted as a
similarity-weighted combination of the ratings given by his/her neighbors.
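The neighborhood-based prediction described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the source does not say which similarity measure is used at this point, so Pearson correlation over co-rated movies is assumed here, and the names `pearson_similarity` and `predict_rating` are hypothetical.

```python
from math import sqrt


def mean_rating(ratings):
    """Average of all ratings a user has given (ratings: {movie: score})."""
    return sum(ratings.values()) / len(ratings)


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over co-rated movies (an assumed similarity measure)."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[m] for m in common) / len(common)
    mean_b = sum(ratings_b[m] for m in common) / len(common)
    num = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common)
    den_a = sqrt(sum((ratings_a[m] - mean_a) ** 2 for m in common))
    den_b = sqrt(sum((ratings_b[m] - mean_b) ** 2 for m in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)


def predict_rating(target, neighbors, movie):
    """Target user's mean plus the similarity-weighted mean-centered neighbor ratings."""
    base = mean_rating(target)
    num = 0.0
    den = 0.0
    for neighbor in neighbors:
        if movie not in neighbor:
            continue  # this neighbor cannot contribute to the prediction
        sim = pearson_similarity(target, neighbor)
        num += sim * (neighbor[movie] - mean_rating(neighbor))
        den += abs(sim)
    if den == 0:
        return base  # no usable neighbor: fall back to the user's own mean
    return base + num / den


target_user = {"A": 5, "B": 3, "C": 4}
neighbor_user = {"A": 5, "B": 3, "C": 4, "D": 5}
predicted = predict_rating(target_user, [neighbor_user], "D")
```

Mean-centering each neighbor's rating is a common design choice because it compensates for users who rate systematically high or low.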
Ref. [6] divides CF techniques into two important classes
of recommender systems: memory-based CF and model-based CF. Memory-based CF operates on the entire user
space to search for the nearest neighbors of an active user and
automatically produces a list of suggested movies to recommend. This method suffers from computational complexity and data sparsity problems. To address
computational and memory bottlenecks, Sarwar
et al. proposed an item-based CF in which the correlations
between items are computed to form the neighborhood of
a target item [4]. Their empirical studies showed that the
item-based approach can noticeably shorten computation time while providing comparable prediction accuracy.
Model-based CF, on the other hand, builds a model in advance
that stores rating patterns learned from the user-rating database,
which can deal with the scalability and sparsity issues. In
terms of recommendation quality, model-based CF applications can perform as well as memory-based ones. However,
model-based approaches are time-consuming to build
and train offline, and the resulting model is hard to
update. Algorithms often used in model-based CF applications include Bayesian networks [6], clustering algorithms
[9–11], neural networks [13], and SVD (singular value
decomposition) [5,14]. Traditional collaborative
recommendation systems nevertheless have inherent limitations,
such as computational scalability, data sparsity and cold-start, and these issues remain challenges that affect
prediction quality. Over the last decade, there has been great
interest in the RS area because of the possible improvements in
performance and problem-solving capability.
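To make the item-based idea concrete, the sketch below forms a neighborhood for a target item from item-to-item similarities. It is an illustration under assumptions, not the method of Ref. [4]: the text only speaks of "correlations between items", so plain cosine similarity over co-rating users is assumed, and `item_cosine_similarity` and `nearest_items` are hypothetical names.

```python
from math import sqrt


def item_cosine_similarity(ratings, item_a, item_b):
    """Cosine similarity of two items over users who rated both.

    ratings: {user: {item: score}}. Cosine is an assumed choice of measure.
    """
    co_rated = [
        (user_ratings[item_a], user_ratings[item_b])
        for user_ratings in ratings.values()
        if item_a in user_ratings and item_b in user_ratings
    ]
    if not co_rated:
        return 0.0
    dot = sum(a * b for a, b in co_rated)
    norm_a = sqrt(sum(a * a for a, _ in co_rated))
    norm_b = sqrt(sum(b * b for _, b in co_rated))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_items(ratings, target_item, k):
    """Return the k items most similar to target_item (ties broken by name)."""
    items = {item for user_ratings in ratings.values() for item in user_ratings}
    items.discard(target_item)
    scored = [(item_cosine_similarity(ratings, target_item, i), i) for i in items]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]


example_ratings = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 4, "B": 4, "C": 2},
    "u3": {"A": 1, "B": 1, "C": 5},
}
neighborhood = nearest_items(example_ratings, "A", 1)
```

Because item similarities depend only on the rating matrix, they can be computed once offline and reused for many users, which is one reason the item-based variant tends to reduce online computation.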
2.2. Clustering-based collaborative recommendation
In movie recommendation, clustering is a widely used
approach to alleviate the scalability problem and provides
of the increased quality of recommendation and robustness. The objective of this section is to propose an effective
classification method that ensures users with the
same preferences fall into one cluster, so that
accurate like-minded neighbors can be generated. The GA-KM algorithm
employed in this work can be roughly divided into two
phases:
1. K-means clustering:
K-means is one of the most commonly used
clustering approaches because of its simplicity, flexibility
and computational efficiency, especially on large
amounts of data. K-means iteratively computes k cluster centers and assigns each object to the nearest
cluster according to a distance measure. When center points
$$\sum_{j=1}^{k} \sum_{i \in C_{temp}} \lVert x_i - M_j \rVert^{2}$$
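The K-means phase and the within-cluster sum of squared distances can be sketched as below. This is a minimal plain K-means under stated assumptions, not the GA-KM algorithm itself: the genetic-algorithm phase is not shown, the initial centers are simply passed in by the caller, and all function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_clusters(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def update_centers(points, labels, centers):
    """Move each center to the mean of its members; keep empty clusters in place."""
    updated = []
    for j, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            updated.append(old_center)
            continue
        dim = len(old_center)
        updated.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return updated


def within_cluster_sse(points, labels, centers):
    """Sum over all points of the squared distance to their assigned center."""
    return sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and center update until the labels stop changing."""
    centers = [tuple(c) for c in initial_centers]
    labels = assign_clusters(points, centers)
    for _ in range(max_iter):
        centers = update_centers(points, labels, centers)
        new_labels = assign_clusters(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(data, [(0, 0), (10, 10)])
sse = within_cluster_sse(data, labels, centers)
```

Plain K-means only reaches a local optimum that depends on the starting centers, which is the usual motivation for pairing it with a global search over initial centers.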
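MAE
$x_j \in X$
$$P_{U_a,\mathrm{item}} = \bar{R}_{u} + \frac{\sum_{y \in C_x} \mathrm{sim}(U_a, y)\,(R_{y,i} - \bar{R}_y)}{\sum_{y \in C_x} \lvert \mathrm{sim}(U_a, y) \rvert}$$
$$\mathrm{Precision} = \frac{\lvert \mathrm{interesting} \cap \mathrm{TopN} \rvert}{N} \qquad (5)$$
$$\mathrm{Recall} = \frac{\lvert \mathrm{interesting} \cap \mathrm{TopN} \rvert}{\lvert \mathrm{interesting} \rvert} \qquad (6)$$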
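The precision and recall ratios above can be computed directly from the set of interesting items and the top-N recommendation list, as in the sketch below. The MAE formula itself is not recoverable from this text, so `mean_absolute_error` assumes the standard definition (the average absolute difference between predicted and actual ratings). All function names are hypothetical.

```python
def precision_at_n(interesting, top_n):
    """Share of the N recommended items that the user finds interesting."""
    if not top_n:
        return 0.0
    return len(set(interesting) & set(top_n)) / len(top_n)


def recall_at_n(interesting, top_n):
    """Share of the interesting items that appear in the top-N list."""
    interesting = set(interesting)
    if not interesting:
        return 0.0
    return len(interesting & set(top_n)) / len(interesting)


def mean_absolute_error(predicted, actual):
    """Standard MAE: mean of |predicted - actual| over paired ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equally long")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


interesting_items = {"A", "B", "C", "D"}
recommended = ["A", "B", "X"]
precision = precision_at_n(interesting_items, recommended)
recall = recall_at_n(interesting_items, recommended)
mae = mean_absolute_error([4.0, 3.0], [5.0, 3.0])
```

Precision and recall usually pull in opposite directions as N grows, so they are best reported together for the same N.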
Table 1
The t-test results for various cluster-based methods in terms of MAEs.

                                           PCA-GAKM
Method            Mean     Std. Dev.    t-Value    Sign
PCA-SOM           0.8018   5.900e-03     9.0633    .000
SOM-CLUSTER       0.8078   2.589e-03    16.5194    .000
UPCC              0.8232   1.292e-03    28.9869    .000
KMEANS-CLUSTER    0.8175   2.244e-03    23.3691    .000
PCA-KMEANS        0.8412   2.845e-03    37.0515    .000
GAKM-CLUSTER      0.8040   2.786e-03    13.8541    .000
PCA-GAKM          0.7821   4.747e-03
Acknowledgements
The research was supported by the National Natural
Science Foundation of China (nos. 71271149, 70901054
and 61202030). It was also supported by the Program for
New Century Excellent Talents in University (NCET). The
authors are very grateful to all anonymous reviewers
whose invaluable comments and suggestions substantially
helped improve the quality of the paper.
References
[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible
extensions, IEEE Trans. Knowl. Data Eng. 17 (6) (2005) 734–749.
[2] G. Linden, B. Smith, J. York, Amazon.com recommendations: item-to-item
collaborative filtering, IEEE Internet Comput. 7 (1) (2003)
76–80.
[3] B.M. Sarwar, G. Karypis, J. Konstan, J. Riedl, Recommender systems
for large-scale e-commerce: scalable neighborhood formation using
clustering, in: Proceedings of International Conference on Computer
and Information Technology, Dhaka, Bangladesh, 2002.
[16] G. Pitsilis, X.L. Zhang, W. Wang, Clustering recommenders in collaborative filtering using explicit trust information, in: Proceedings of
the Fifth International Conference on Trust Management IFIPTM,
Copenhagen, Denmark, 2011, pp. 82–97.
[17] M. Zhang, N. Hurley, Novel item recommendation by user profile
partitioning, in: Proceedings of the International Conference on
Web Intelligence and Intelligent Agent Technology, Washington DC,
USA, 2009, pp. 508–515.
[18] C. Huang, J. Yin, Effective association clusters filtering to cold-start
recommendations, in: Proceedings of the Seventh International
Conference on Fuzzy Systems and Knowledge Discovery, Yantai,
Shandong, 2010, pp. 2461–2464.
[19] J. Wang, N.-Y. Zhang, J. Yin, Collaborative filtering recommendation
based on fuzzy clustering of user preferences, in: Proceedings of the
Seventh International Conference on Fuzzy Systems and Knowledge
Discovery, Yantai, Shandong, 2010, pp. 1946–1950.
[20] D.R. Liu, Y.Y. Shih, Hybrid approaches to product recommendation
based on customer lifetime value and purchase preferences, J. Syst.
Softw. 77 (22) (2005) 181–191.
[21] O. Georgiou, N. Tsapatsoulis, Improving the scalability of recommender systems by clustering using genetic algorithms, in: Proceedings of the 20th International Conference on Artificial Neural
Networks, Thessaloniki, Greece, 2010, pp. 442–449.
[22] K. Honda, N. Sugiura, H. Ichihashi, S. Araki, Collaborative filtering
using principal component analysis and fuzzy clustering, in: Proceedings of the First Asia-Pacific Conference on Web Intelligence:
Research and Development, Maebashi City, Japan, 2001, pp. 394–402.
[23] A. Selamat, S. Omatu, Web page feature selection and classification
using neural networks, Inf. Sci. 158 (2004) 69–88.
[24] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan
Kaufmann Publishers, San Francisco, 2001.
[25] J.T. Tou, R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley,
Massachusetts, MA, 1974.
[26] D. Goldberg, Genetic Algorithms in Search, Optimization, and
Machine Learning, Addison-Wesley, New York, NY, 1989.
[27] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl, GroupLens:
an open architecture for collaborative filtering of netnews, in:
Proceedings of ACM 1994 Conference on Computer Supported
Cooperative Work, Chapel Hill, 1994, pp. 175–186.