ACKNOWLEDGEMENTS
We take this unique opportunity to express our heartfelt thanks and gratitude to our respected guide, Professor Dr. Satish Chand, Department of Computer Engineering, NSIT, who kindly consented to be our guide for this project.
We thank him for the precious time he devoted to us, for his expert guidance from the
commencement, for the kind attitude and the resources he arranged for us. It is only because of
him that we have been able to successfully complete this project.
We also owe our thanks to all the faculty members for their constant support and
encouragement.
DECLARATION
This is to certify that the project entitled "Recommendation System for Web Scale Graphs", by Nikhil Menon, Sapan Garg and Sarthak Kukreti, is a record of bona fide work carried out by us in the Department of Computer Engineering, Netaji Subhas Institute of Technology, University of Delhi, New Delhi, in partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering in Computer Engineering, University of Delhi, in the academic year 2012-2013.
The results presented in this thesis have not been submitted to any other university in any
form for the award of any other degree.
Nikhil Menon
Roll No. 266/CO/09
Sapan Garg
Roll No. 295/CO/09
Sarthak Kukreti
Roll No. 296/CO/09
CERTIFICATE
This is to certify that the project entitled "Recommendation System for Web Scale Graphs", by Nikhil Menon, Sapan Garg and Sarthak Kukreti, is a record of bona fide work carried out by them in the Department of Computer Engineering, Netaji Subhas Institute of Technology, University of Delhi, New Delhi, under our supervision and guidance, in partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering in Computer Engineering, University of Delhi, in the academic year 2012-2013.
The results presented in this thesis have not been submitted to any other university in any
form for the award of any other degree.
ABSTRACT
E-commerce websites like Amazon, Netflix etc. have a huge amount of data pertaining to user behaviour and item similarity. These websites have consistently used this data to improve the user experience. Among the various improvements that these websites have undergone, the recommendation system has been the game changer. It uses past user experience to recommend items for the future, based on the user's prior purchases and the similarity between items (e.g. movies). Since their first introduction to these websites, considerable research has gone into improving these systems by making recommendations more precise and faster. Another area of improvement is the scalability of the algorithms.

In this work we use (1) Neighbourhood Models, (2) K-Means Clustering and (3) Matrix Factorization for rating prediction. In the first, the similarity amongst the various users or items is taken, measured in terms of correlation using measures like Pearson's R, the Jaccard Coefficient and Cosine Distance. In the second method, a predefined number of clusters is fixed at the outset and the users are then assigned to the different clusters; a rating is then predicted on the basis of the ratings of the users of that cluster. In the third method, we decompose the rating matrix into two matrices: one consists of each user's latent factors and the second consists of the items' latent factors. The number of latent factors is set manually and the values of the factors are learnt using gradient descent. We propose two new methods: a Time Based Neighbourhood Model and a Supervised Random Walk approach to Matrix Factorization. The error measure used is RMSE, and we have obtained RMSEs in the range of 0.9541 to 1.3335 for the above mentioned methods.
LIST OF TABLES
Table 1.1: Example ratings matrix of users against TV shows
Table 3.1: Example ratings matrix used for matrix factorization
Table 3.2: Latent vectors produced by baseline matrix factorization
Table 3.3: Predicted versus actual ratings
Table 4.1: RMSE of the neighbourhood models
Table 4.2: RMSE of the clustering model
Table 4.3: RMSE of the matrix factorization models
NOTATION
Symbol              Definition
a, p, q             Column vectors
A, P, Q             Matrices
P_{u,k}             The k-th latent factor of user u
N                   Number of users
M                   Number of items
u, v ∈ {1,2,…,N}    User indices
i, j ∈ {1,2,…,M}    Item indices
r_{u,i}             Rating given by user u to item i
r*_{u,i}            Predicted rating for user u and item i
e_{u,i}             Rating error
I_{u,i}             Indicator variable
λ                   Regularization parameter
η                   Learning rate for gradient methods
INDEX OF EQUATIONS
Equation 2.1: Neighbourhood model prediction
Equation 2.2: Matrix factorization
Equation 2.3: RMSE
Equation 3.1: Cosine Similarity
Equation 3.2: User-User Correlation
Equation 3.3: Pearson's R (population)
Equation 3.4: Pearson's R (sample)
Equation 3.5: Jaccard's Index
Equation 3.6: Time based rating prediction
Equation 3.7: Euclidean Distance
Equation 3.8: Stochastic gradient descent update
Equation 3.9: Predicted rating
Equation 3.10: Rating error
Equation 3.11: Learning error
Equation 3.12: Regularized error
Equation 3.13: Regularized learning error
Equation 3.14: Gradients of the regularized error
Equation 3.15: Latent factor update rules
Equation 3.16: Biased matrix factorization
Equation 3.17: Bias update rules
Equation 3.18: SVD++ Model
Equation 3.19: SVD++ update rules
Equation 3.20: Transition probabilities of the graph
Equation 3.21: Restart probability
Equation 3.22: Stationary distribution
Equation 3.23: Differential of the personalized PageRank
Equation 3.24: Partial differentials of the transition probabilities
Equation 3.25: Learning Error
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
DECLARATION
CERTIFICATE
ABSTRACT
LIST OF TABLES
NOTATION
INDEX OF EQUATIONS
TABLE OF CONTENTS
CHAPTER ONE: INTRODUCTION
1.1 Background
1.2 Problem Definition
CHAPTER TWO: LITERATURE REVIEW
2.1 Content Based Filtering
2.2 Collaborative Filtering
2.2.1 Neighbourhood based methods
2.2.2 Model based methods
2.3 Evaluation Measures
CHAPTER THREE: PROPOSED WORK
3.1 Dataset
3.2 Correlation Measures
3.2.1 Cosine Similarity
3.2.2 Pearson's R
3.2.3 Jaccard Index
3.3 Time Based Approach to Neighbourhood Models
3.4 A Clustering Approach to Collaborative Filtering
3.5 Latent Factor Models
3.5.1 Non Negative Matrix Factorization
3.5.2 Singular Value Decomposition
3.5.3 SVD++
3.6 A Supervised Random Walk Approach to Matrix Factorization
CHAPTER FOUR: RESULTS
CHAPTER FIVE: CONCLUSION AND FUTURE WORK
REFERENCES
CHAPTER ONE: INTRODUCTION

1.1 Background
Recommender systems attempt to discern user preferences over items and build a relational model between users and items. A recommender system recommends items that fit the user's tastes, in order to help the user purchase or view relevant items from an overwhelming set of choices. These systems are of great importance in applications such as e-commerce, subscription based services, information filtering, news rooms etc. Recommender systems providing personalized suggestions greatly increase the likelihood of a customer making a purchase compared to generalized recommendations. Netflix is an example: within a month of the Netflix recommendation competition, there was an increase of almost 17% in the accuracy of its recommendations, which in turn aided the increase of its revenues through personalization.
Personalized recommendations are especially important in markets where the number of choices is enormous, the taste of the customer is of prime importance and, last but not least, the price of the items is modest. Typical areas for such services are the sale of books, music, fashion and gaming, or the recommendation of news articles, humour etc.

With the exponential explosion of web based businesses, an increasing number of web based merchant and rental services use recommender systems. Some of the major participants of the e-commerce web, like Amazon and Netflix, have successfully applied recommender systems to deliver automatically generated personalized recommendations to their customers.
There are two basic strategies that can be applied when generating recommendations:

Content-based approaches: These characterize users and items by identifying their features, such as demographic data for user characterization and product descriptions for item characterization. The features are used by algorithms to connect user interests with item descriptions when generating recommendations. Since it is usually time-consuming to collect the necessary information about items, and also often difficult to motivate users to reveal the personal information required to create the database for the basis of characterization, these methods are seldom used.
Collaborative Filtering (CF): This makes use of only past user activities (for example, transaction history or user satisfaction expressed in ratings) and is usually more feasible. CF approaches can be applied to recommender systems regardless of the type of data available. A CF algorithm identifies relationships between users and items, and uses this information to predict user preferences.
1.2 Problem Definition
In a typical recommendation problem, let U be a set of n users and V be a set of m items. Also, let u be the identifier for users and i the identifier for items. Each user u has a list of items V_u about which the user has expressed an opinion, explicitly (in the form of ratings) or implicitly (through mining purchase records, web logs etc.). The opinions of the users are stored in an n × m matrix R, known as the ratings matrix. Each cell R_{u,i} of R represents the rating that the u-th user has given to the i-th item.

Each rating is on a numerical scale and can also be 0 to represent that a user has not rated an item yet. The task of the recommendation algorithm is therefore to model the preferences of a particular user, using data from the ratings matrix, including for items which the user has not rated yet. In practice, we cannot measure the true error because the distribution of (U, V, R) is unknown, but we can estimate the error on a validation set, for example by randomly partitioning the available sample into a smaller training set and a validation set. The performance of the model is calculated as the deviation from the actual ratings.
Formally, let I \in \{0,1\}^{n \times m} be an indicator matrix, i.e. I_{u,i} = 1 iff user u has rated item i. Let R_train and R_test be the training and test datasets, where R_{u,i} : \{(u, i, x) \mid u \in U, i \in V, I_{u,i} = 1, R_{u,i} = x\}. The goal of the recommendation problem is to create a model, utilizing explicit as well as implicit information from the training dataset, which minimizes h(R*, R), where h measures the deviation of the predicted ratings from the actual values on a testing subset of the actual dataset. We use different error functions [Chapter 2] to see their relevance in different situations.
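As a concrete illustration of this protocol, the sketch below (with hypothetical helper names; the global training mean stands in for a real model) partitions a sample of (user, item, rating) triples and estimates the error as RMSE on the held-out part:

import math, random

def holdout_split(triples, frac=0.2):
    # randomly partition the (user, item, rating) sample into train/validation
    shuffled = triples[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - frac))
    return shuffled[:cut], shuffled[cut:]

def rmse(pairs):
    # pairs of (predicted, actual) ratings
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

train, valid = holdout_split([(1, 1, 4), (1, 2, 3), (2, 1, 5), (2, 2, 2), (3, 1, 4)])
mu = sum(r for _, _, r in train) / float(len(train))   # global-mean baseline predictor
print rmse([(mu, r) for _, _, r in valid])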
[Table 1.1: an example ratings matrix of users against the TV shows Suits, DN, GOT and PB.] Similarity between TV series: people who like DN also tend to like PB.
It is therefore the job of the recommendation algorithm to utilize this information to make the choice for any user.

The layout of the remaining chapters is as follows. Chapter 2 gives a literature survey with an overview of the approaches used by us. Chapter 3 gives a complete overview of the theory: first we describe the dataset used, then we describe in detail the approaches used by us (Neighbourhood Models, K-Means Clustering and Matrix Factorization), and finally we describe our proposed approach, i.e. a Supervised Random Walk Approach to Matrix Factorization. In Chapter 4, we show the results obtained using these algorithms. In Chapter 5, we give the conclusion and future work.
***
CHAPTER TWO: LITERATURE REVIEW

Nowadays, many companies offer recommender systems on their platforms. Amazon is a typical example of successful recommendation-engine technology. Its main function is based on collaborative item-to-item contextual recommendations, which were introduced very early on the site (in the late 90's). It is based on the logs of purchases and corresponds to the calculation of a similarity matrix of items. Amazon popularized the famous feature "people who bought this item also bought these items".

The following figure shows an example of a content based recommender. The known relationships are marked as full arrows, the calculated or inserted object similarity by the dotted arrow, and the predicted relationship by the dashed arrow.
This is known as the user-user neighbourhood model. For example, Flipkart has a feature where, whenever you view a product, a list appears at the bottom of the page showing items that users who viewed this product also viewed. In this case, a user's similarity with other users is calculated on the basis of the number of items rated by both users and how close their ratings are.

In the case of items, the items which are similar to the items that the user has already rated are chosen as recommendations. This is known as the item-item neighbourhood model. Amazon has a feature where, as soon as you buy a product, a number of items are shown at the side of the page, saying that users who bought the particular item also bought the given items. In this case, once an item is purchased, its similarity with other items is calculated on the basis of the number of users common to both products and the proximity of their ratings.
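As an illustration of this idea, a minimal sketch (with a hypothetical data layout: each item is a dict mapping the users who rated it to their ratings) that scores two items by their common raters:

import math

def item_similarity(ratings_a, ratings_b):
    # ratings_a, ratings_b: {user_id: rating} for two items
    common = set(ratings_a) & set(ratings_b)    # users who rated both items
    if not common:
        return None
    num = sum(ratings_a[u] * ratings_b[u] for u in common)
    da = math.sqrt(sum(ratings_a[u] ** 2 for u in common))
    db = math.sqrt(sum(ratings_b[u] ** 2 for u in common))
    return num / float(da * db)                 # cosine over the common raters

print item_similarity({1: 5, 2: 4, 3: 4}, {2: 4, 3: 5, 4: 2})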
The equation [8][9] being used in this project for neighbourhood models is given below.
\hat{r}_{ui} = \mu + b_u + b_i + \frac{\sum_k w_{ku}\,(r_{ki} - \bar{r}_k)}{\sum_k w_{ku}}    (2.1)
In the equation given above, \hat{r}_{ui} denotes the rating to be calculated, \mu denotes the overall average of all the ratings given across all users and all movies, b_u is the deviation of the average of the ratings given by user u from the overall mean, b_i is the deviation of the average rating given to the movie i from the overall mean, r_{ki} is the rating given by user k to item i, \bar{r}_k is the average rating given by user k, and w_{ku} is the similarity between users k and u.

Here, w_{ku} is calculated by using correlation. Correlations indicate a predictive relationship; e.g. in the stock market, the correlation between two stocks is helpful in determining the trend that a particular stock will follow. If the correlation between two stocks is high, then both of them will follow the same pattern, i.e. if one stock's price increases a lot, then we can safely assume that the price of the second stock will also increase in almost the same manner.
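This intuition is easy to check numerically; a small example with made-up price series:

import numpy

stock_a = [100, 102, 105, 103, 108]   # made-up daily prices
stock_b = [50, 51, 53, 52, 54]        # follows the same pattern as stock_a
print numpy.corrcoef(stock_a, stock_b)[0][1]   # close to +1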
Correlations can be calculated using various measures. In this project:
1. Cosine Similarity
2. Pearson's R
3. Pearson's R with Jaccard Index
are being used to calculate the correlation between the users and the items.
Matrix factorization is one of the most used collaborative filtering techniques. Its basic approach is to approximately factorize the ratings matrix in terms of independent factors of users and items.

Formally, given R \in \mathbb{R}^{n \times m}, we find P \in \mathbb{R}^{k \times n} and Q \in \mathbb{R}^{k \times m}, sets of k latent factors for each user and item respectively, such that:

R \approx P^T Q    (2.2)
The features learnt in matrix factorization are implicit features, in the sense that they do not represent any observable attribute; they are representative of the underlying structure of the ratings matrix. These methods use optimization techniques like stochastic gradient descent to learn the values of the model parameters P and Q, which can then be used to predict the rating user u will give to item i.
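Once P and Q are learnt, predicting a rating reduces to a single dot product between the corresponding factor columns; a minimal sketch (random factors stand in for learnt ones):

import numpy

k, n, m = 2, 4, 4
P = numpy.random.rand(k, n)   # user latent factors, one column per user
Q = numpy.random.rand(k, m)   # item latent factors, one column per item

def predict(u, i):
    # rating of user u for item i under R ~ P^T Q
    return numpy.dot(P[:, u], Q[:, i])

print predict(0, 2)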
In this work we have used:
1. Non Negative Matrix Factorization
2. Singular Value Decomposition (SVD)
3. SVD++
We have also proposed a new supervised random walk based approach, which is described in Chapter 3.
The evaluation measure used throughout this work is the root mean squared error (RMSE) between the predicted and the actual ratings:

\mathrm{RMSE} = \sqrt{\frac{1}{|R_{test}|}\sum_{(u,i) \in R_{test}} (\hat{r}_{u,i} - r_{u,i})^2}    (2.3)

In the next chapter we discuss the above mentioned algorithms and also our proposed algorithms.
***
CHAPTER THREE: PROPOSED WORK

3.2 Correlation Measures

The following correlation measures are used in this project:
1. Cosine Similarity
2. Pearson's R
3. Jaccard Index
3.2.1 Cosine Similarity

Given two vectors of attributes, A and B, the cosine similarity, \cos(\theta), is represented using a dot product and magnitude as

\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|}    (3.1)

The formula used in the project is given below:

sim(i,j) = \frac{\sum_k X_{ik} Y_{jk}}{\sqrt{\sum_k X_{ik}^2}\,\sqrt{\sum_k Y_{jk}^2}}    (3.2)
Here, i and j denote either users or items i and j respectively. X_{ik} denotes the rating given by user i to item k (or received by item i from user k), and Y_{jk} denotes the rating given by user j to item k (or received by item j from user k); the sums run over the items (or users) k rated in common.
The code written for user-user correlation using cosine distance is given below.
import math

cor = []                       # matrix to store the correlation between users
cor.append([])
i = 0
while i <= 943:
    cor.append([])
    j = 0
    cor[i].append(None)
    while j < i:
        cor[i].append(cor[j][i])   # correlation is symmetric; reuse earlier rows
        j += 1
    cor[i].append(None)            # indicates a user's correlation with himself
    j = i + 1
    while j <= 943:
        m = 1
        sum = 0.0                  # sum of products of all common movie ratings
        x = 0.0
        y = 0.0
        while m < len(lis[i]):
            n = lis[j].index(lis[i][m]) if lis[i][m] in lis[j] else None
            if n != None:
                sum += ratings[i][m] * ratings[j][n]
                x += ratings[i][m] * ratings[i][m]
                y += ratings[j][n] * ratings[j][n]
            m += 1
        if x and y:
            cor[i].append(sum / math.sqrt(x * y))
        else:
            cor[i].append(None)
        j += 1
    i += 1
3.2.2 Pearson's R

The Pearson product-moment correlation coefficient (sometimes referred to as the PPMCC or PCC, or Pearson's r) is a measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and -1 inclusive.

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. The form of the definition involves a "product moment", that is, the mean (the first moment about the origin) of the product of the mean-adjusted random variables; hence the modifier product-moment in the name.
For a population

Pearson's correlation coefficient, when applied to a population, is commonly represented by the Greek letter \rho (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient. The formula for \rho is:

\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}    (3.3)
For a sample

Pearson's correlation coefficient, when applied to a sample, is commonly represented by the letter r and may be referred to as the sample correlation coefficient or the sample Pearson correlation coefficient. We can obtain a formula for r by substituting estimates of the covariances and variances based on a sample into the formula above:

r = \frac{\sum_k (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_k (x_k - \bar{x})^2}\,\sqrt{\sum_k (y_k - \bar{y})^2}}

In this project, the formula that has been used for implementing the Pearson's R correlation is given below:

r_{ij} = \frac{\sum_k (X_{ik} - \bar{X}_i)(Y_{jk} - \bar{Y}_j)}{\sqrt{\sum_k (X_{ik} - \bar{X}_i)^2}\,\sqrt{\sum_k (Y_{jk} - \bar{Y}_j)^2}}    (3.4)
Here, i and j denote either users or items i and j respectively. X_{ik} denotes the rating given by user i to item k (or received by item i from user k), and Y_{jk} denotes the rating given by user j to item k (or received by item j from user k). \bar{X}_i and \bar{Y}_j denote the means of all the ratings given (or received) by i and j.
The Python code written for the same, for item-item correlation, is given below.
cor = []                       # matrix to store the correlations
cor.append([])
i = 0
while i <= 943:
    cor.append([])
    j = 0
    cor[i].append(None)
    while j < i:
        cor[i].append(cor[j][i])
        j += 1
    cor[i].append(None)
    j = i + 1
    while j <= 943:
        m = 1
        sum = 0.0              # sum of products of mean-adjusted common ratings
        x = 0.0                # sum of squared mean-adjusted ratings of i
        y = 0.0                # sum of squared mean-adjusted ratings of j
        while m < len(lis[i]):
            n = lis[j].index(lis[i][m]) if lis[i][m] in lis[j] else None
            if n != None:
                # position 0 of each ratings list stores the mean rating
                sum += ((ratings[i][m] - ratings[i][0]) * (ratings[j][n] - ratings[j][0]))
                x += ((ratings[i][m] - ratings[i][0]) * (ratings[i][m] - ratings[i][0]))
                y += ((ratings[j][n] - ratings[j][0]) * (ratings[j][n] - ratings[j][0]))
            m += 1
        if x and y:
            cor[i].append(sum / math.sqrt(x * y))
        else:
            cor[i].append(None)
        j += 1
    i += 1
3.2.3 Jaccard Index

The Jaccard index weights the correlation between two users (or items) by the overlap of their rated sets:

J(i,j) = \frac{|X_i \cap X_j|}{|X_i \cup X_j|}, \qquad w_{ij} = J(i,j) \cdot r_{ij}    (3.5)

Here, i and j denote either users or items i and j respectively, and J is the Jaccard coefficient for the two users/items; r_{ij} is their Pearson correlation, computed from the ratings X_{ik} and Y_{jk} and the means \bar{X}_i and \bar{Y}_j as in (3.4).
The Python code written for user-user correlation is given below.
cor = []                       # matrix to store the correlations
cor.append([])
i = 0
while i <= 943:
    cor.append([])
    j = 0
    cor[i].append(None)
    while j < i:
        cor[i].append(cor[j][i])
        j += 1
    cor[i].append(None)        # used to indicate a user's correlation with himself
    j = i + 1
    while j <= 943:
        m = 1
        sum = 0.0              # sum of products of mean-adjusted common ratings
        x = 0.0                # sum of squared mean-adjusted ratings of user i
        y = 0.0                # sum of squared mean-adjusted ratings of user j
        inter = 0.0            # number of common movies amongst the two users
        while m < len(lis[i]):
            n = lis[j].index(lis[i][m]) if lis[i][m] in lis[j] else None
            if n != None:
                sum += ((ratings[i][m] - ratings[i][0]) * (ratings[j][n] - ratings[j][0]))
                x += ((ratings[i][m] - ratings[i][0]) * (ratings[i][m] - ratings[i][0]))
                y += ((ratings[j][n] - ratings[j][0]) * (ratings[j][n] - ratings[j][0]))
                inter += 1
            m += 1
        # Jaccard coefficient; the -2 discounts the mean stored at index 0 of each list
        jcoef = inter / (len(lis[i]) + len(lis[j]) - inter - 2)
        if x and y:
            cor[i].append(jcoef * sum / math.sqrt(x * y))
        else:
            cor[i].append(None)
        j += 1
    i += 1
While calculating the rating for a particular user, two approaches have been taken:
i) K-Nearest Neighbour Approach
ii) All Neighbour Approach

In the first case, the 10 nearest neighbours are taken and their correlation with the given user is used to calculate the final rating. Snippets of the code for this approach are given below.
rmse = 0.0
sqsum = 0.0
nooflines = 0
fil = open("F:\\btp\dataset\movielens\ml-100k\ua.test", "r")
for line in fil.readlines():
    st = line.split("\t")
    maxcor = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # stores the 10 highest correlations
    corra = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    i = int(st[0])
    j = int(st[1])
    l = 1
    while l <= 943:
        if l != i:
            if j in lis[l]:
                if cor[i][l] > maxcor[0]:
                    # insert the new correlation and bubble it into sorted position
                    maxcor[0] = cor[i][l]
                    n = lis[l].index(j)
                    corra[0] = ratings[l][n] - ratings[l][0]
                    k = 0
                    while k < 9 and maxcor[k] > maxcor[k + 1]:
                        t = maxcor[k]
                        maxcor[k] = maxcor[k + 1]
                        maxcor[k + 1] = t
                        t = corra[k]
                        corra[k] = corra[k + 1]
                        corra[k + 1] = t
                        k += 1
        l += 1
    sum = 0.0
    l = 0
    while l < 10:
        sum += maxcor[l]
        l += 1
    l = 0
    while l < 10 and sum:
        maxcor[l] /= sum           # calculating the weight of each neighbour
        l += 1
    l = 0
    sum = 0.0
    denom = 0.0
    while l < 10:
        sum += maxcor[l] * corra[l]
        denom += maxcor[l]
        l += 1
    if denom:
        # u holds the overall mean rating; movrats[j][0] the movie's mean
        prer = ratings[i][0] + movrats[j][0] - u + sum / denom
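The snippet above initializes rmse, sqsum and nooflines but omits the error bookkeeping; a self-contained sketch of that accumulation (predict is a hypothetical function wrapping the neighbourhood computation above):

import math

def rmse_over(testfile, predict):
    # each test line is: user \t item \t rating; predict(u, i) returns prer or None
    sqsum = 0.0
    nooflines = 0
    for line in open(testfile, "r"):
        st = line.split("\t")
        prer = predict(int(st[0]), int(st[1]))
        if prer is not None:
            sqsum += (prer - int(st[2])) * (prer - int(st[2]))
            nooflines += 1
    return math.sqrt(sqsum / nooflines)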
In the all neighbour approach, all users whose correlation with the given user is available are taken into account. The code snippets for the same are given below.
rmse = 0.0
sqsum = 0.0
nooflines = 0
fil = open("F:\\btp\dataset\movielens\ml-100k\ua.test", "r")
for line in fil.readlines():
    maxcor = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    corra = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    st = line.split("\t")
    i = int(st[0])
    j = int(st[1])
    l = 1
    while l <= 943:
        if l != j:
            if i in mov[l]:
                maxcor.append(cor[j][l])
                corra.append(movrats[l][mov[l].index(i)] - movrats[l][0])
        l += 1
    sum = 0.0
    l = 0
    while l < len(maxcor):
        if maxcor[l] != None:
            sum += maxcor[l]
        l += 1
    l = 0
    while l < len(maxcor):
        if maxcor[l] != None:
            maxcor[l] /= sum       # calculating the weight of each item
        l += 1
    l = 0
    sum = 0.0
    denom = 0.0
    while l < len(maxcor):
        if maxcor[l] != None:
            sum += maxcor[l] * corra[l]
            denom += maxcor[l]
        l += 1
    prer = ratings[i][0] + movrats[j][0] - u + sum / denom
The RMSEs obtained from this method ranged from 1.2268 to 1.3335.
3.3 Time Based Approach to Neighbourhood Models

The formula used for calculating the rating is given below:

\hat{r}_{ui}^{t_a} = \mu^{t_a} + b_u^{t_a} + b_i^{t_a} + \frac{\sum_k w_{ku}^{t_a}\,(r_{ki} - \bar{r}_k^{t_a})}{\sum_k w_{ku}^{t_a}}    (3.6)
In the equation given above, \hat{r}_{ui}^{t_a} denotes the rating to be calculated, \mu^{t_a} denotes the overall average of all the ratings given across all users and all movies, b_u^{t_a} is the deviation of the average of the ratings given by user u from the overall mean, b_i^{t_a} is the deviation of the rating given to the movie i from the overall mean, r_{ki} is the rating given by user k to item i, \bar{r}_k^{t_a} is the average rating given by user k, and w_{ku}^{t_a} is the similarity between users k and u. The superscript t_a is used to denote that the value is calculated at a particular time.
Here, the cosine similarity measure has been used to calculate the correlation between different users at each time interval. The code for the time based neighbourhood model using 10 nearest neighbours is given below.
1. Sorting of the training data

fil = open("F:\\btp\dataset\movielens\ml-100k\ua.base", "r")
for line in fil.readlines():
    st = line.split("\t")
    lis[int(st[0])].append(int(st[1]))
    ratings[int(st[0])].append(int(st[2]))
    tim[int(st[0])].append(int(st[3]))
    mov[int(st[1])].append(int(st[0]))
    movrats[int(st[1])].append(int(st[2]))
i = 1
while i <= 943:
    j = 1
    while j < len(lis[i]):
        k = 1
        while k < len(lis[i]) - j:
            if tim[i][k] > tim[i][k + 1]:
                # bubble sort by timestamp; swap movie id, rating and time together
                t = lis[i][k]
                lis[i][k] = lis[i][k + 1]
                lis[i][k + 1] = t
                t = ratings[i][k]
                ratings[i][k] = ratings[i][k + 1]
                ratings[i][k + 1] = t
                t = tim[i][k]
                tim[i][k] = tim[i][k + 1]
                tim[i][k + 1] = t
            k += 1
        j += 1
    i += 1
2. Code for finding the correlation and the corresponding rating at each time interval

t = 1
rmse = []
u = 0              # running sum of all ratings seen so far (for the global mean)
d = 0              # running count of those ratings
while t <= 100 and q <= ma:    # p and q bound the current time interval
    i = 1
    u = 0
    d = 0
    while i < 1682:
        movrats[i][0] = 0      # position 0 stores the average rating given to the movie
        movrats[i][len(movrats[i]) - 1] = 0
        i += 1
    i = 1
    while i <= 943:            # algo for finding the correlation
        j = 0
        cor[i][j] = None
        j += 1
        while j < i:
            cor[i][j] = cor[j][i]
            j += 1
        cor[i][j] = None
        j += 1
        while j <= 943:
            m = 1
            sum = 0.0
            x = 0.0
            y = 0.0
            s = 0
            while m < len(lis[i]) and tim[i][m] < q:
                n = lis[j].index(lis[i][m]) if lis[i][m] in lis[j] else None
                if n != None and tim[j][n] < q:
                    sum += ratings[i][m] * ratings[j][n]
                    x += ratings[i][m] * ratings[i][m]
                    y += ratings[j][n] * ratings[j][n]
                    s += ratings[i][m]
                    u += ratings[i][m]
                    movrats[lis[i][m]][0] += ratings[i][m]
                    movrats[lis[i][m]][len(movrats[lis[i][m]]) - 1] += 1
                m += 1
                d += 1
            ratings[i][0] = float(s) / m
            if x and y:
                cor[i][j] = sum / math.sqrt(x * y)
            else:
                cor[i][j] = None
            j += 1
        i += 1
    sqsum = 0.0
    i = 1
    c = 0
    dif = 0.0
    while i <= 943:            # algo for calculating the nearest neighbours
        k = 1
        while k < len(testlis[i]) and testtim[i][k] >= p and testtim[i][k] < q:
            maxcor = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
            corra = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
            j = 1
            while j <= 943:
                if testlis[i][k] in lis[j]:
                    n = lis[j].index(testlis[i][k])
                    if tim[j][n] < q:
                        if cor[i][j] > maxcor[0]:
                            maxcor[0] = cor[i][j]
                            corra[0] = ratings[j][n] - ratings[j][0]
                            w = 0
                            while w < 9 and maxcor[w] > maxcor[w + 1]:
                                t = maxcor[w]
                                maxcor[w] = maxcor[w + 1]
                                maxcor[w + 1] = t
                                t = corra[w]
                                corra[w] = corra[w + 1]
                                corra[w + 1] = t
                                w += 1
                j += 1
            sum = 0.0
            o = 0
            while o < 10:
                sum += maxcor[o]
                o += 1
            o = 0
            while o < 10 and sum:
                maxcor[o] /= sum   # assigning weights to the 10 nearest neighbours
                o += 1
            o = 0
            sum = 0.0
            denom = 0.0
            while o < 10:
                sum += maxcor[o] * corra[o]
                denom += maxcor[o]
                o += 1
            if denom and d and movrats[testlis[i][k]][len(movrats[testlis[i][k]]) - 1]:
                prer = ratings[i][0] + float(movrats[testlis[i][k]][0]) / movrats[testlis[i][k]][len(movrats[testlis[i][k]]) - 1] - float(u) / d + sum / denom
                dif += (prer - testrats[i][k]) * (prer - testrats[i][k])
            k += 1
        i += 1
The RMSE obtained from this method is 1.3103.
3.4 A Clustering Approach to Collaborative Filtering

The goal of clustering is to group together data items that are similar to each other. Several algorithms have been devised to solve this problem because of its widespread use. Notable among these is the k-means approach, which we have used here.

Formally, the problem of clustering can be defined as: given a dataset of N records, each member of the dataset having a dimensionality d, we have to partition the data into disjoint subsets such that some specific criterion is achieved. Each record is assigned to a single cluster and the optimization measure is the Euclidean distance between a record and the corresponding cluster center.

The k-means algorithm is based on the optimal placement of a cluster center at the centroid of the associated cluster. Thus, given any set of k cluster centers C, for each center c ∈ C, let V(c) denote the region of space that is closest to the center c. In every stage, the algorithm replaces the centers in C with the centroids of the points in V(c) and then updates V(c) by recomputing the distance from each point to its nearest center. These steps are repeated until some convergence condition is met. For points in general position (i.e. if no point is equidistant from two centers), the algorithm will converge to a point where further stages will not change the position of any center; this would be the ideal convergence condition. In practice, the algorithm is stopped when the change in distortion is less than a given threshold, which saves a lot of time since in the last stages the centers move very little. The results obtained depend greatly on the initial set of centers chosen; the algorithm is deterministic once the initial centers are determined. The main advantage of the k-means approach is its simplicity and flexibility. In spite of other algorithms being available, k-means continues to be an attractive method because of its convergence properties.
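The stopping criterion described above can be sketched as a driver loop that alternates the assignment and update steps (steps 2 and 3 below) until the distortion barely changes; assign_clusters and update_centres are hypothetical helpers corresponding to those steps:

def kmeans(R, centres, assign_clusters, update_centres, threshold=1e-3):
    # repeat assignment/update until the change in distortion is below threshold
    prev = float('inf')
    while True:
        groups, distortion = assign_clusters(R, centres)   # step 2
        centres = update_centres(R, groups)                # step 3
        if prev - distortion < threshold:
            return centres, groups
        prev = distortion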
So, we have used the above mentioned k-means model to predict the ratings for the users. The MovieLens dataset can be viewed as a record of 943 users, each having a dimensionality of 1682. We now explain the approach [6] used for the prediction of the ratings.

1. Randomly initialise k cluster centers from the record of users. For this, the function random.sample(population, k) is used, which returns a k-length list of unique elements chosen from the population sequence. The Python code for the same is shown below.
import random

# initialization of cluster centers
length = 10
a = random.sample(range(1, 943), length)
centre = [0]
i = 0
while i < length:
    j = 1
    temp = [0]
    while j <= 1682:
        temp.append(R[a[i]][j])
        j = j + 1
    centre.append(temp)
    i = i + 1
2. Associate each user with exactly one cluster, based on the Euclidean distance from the cluster center. The formula for the Euclidean distance between two points p and q of dimensionality d is shown below:

d(p,q) = \sqrt{\sum_{i=1}^{d} (p_i - q_i)^2}    (3.7)
The Python code snippet is shown below.

i = 1
while i <= 943:
    j = 1
    sum = 0.0
    count = 0
    while j <= 1682:
        sum = sum + (R[i][j] - centre[1][j]) * (R[i][j] - centre[1][j])
        j = j + 1
    sum = math.sqrt(sum)
    minimum = sum
    position = 1
    k = 2
    while k <= length:
        j = 1
        sum = 0.0
        count = 0
        while j <= 1682:
            sum = sum + (R[i][j] - centre[k][j]) * (R[i][j] - centre[k][j])
            j = j + 1
        sum = math.sqrt(sum)
        if sum < minimum:          # keep the closest centre seen so far
            minimum = sum
            position = k
        k = k + 1
    group[position].append(i)
    i = i + 1
3. Calculate the new cluster centers of the clusters found in step 2 above. The new cluster center is the centroid of all the points in that particular cluster. The Python code snippet is shown below.
i = 1
while i <= length:
    j = 1
    while j <= 1682:
        k = 1
        sum = 0.0
        count = 0
        while k < len(group[i]):
            if R[group[i][k]][j] > 0:   # only actual ratings contribute to the centroid
                sum = sum + R[group[i][k]][j]
                count = count + 1
            k += 1
        if count != 0:
            centre[i][j] = sum / count
        else:
            centre[i][j] = 0
        j = j + 1
    i = i + 1
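The final step, predicting a rating, then follows from the description in the Abstract: a user's unknown rating for a movie is read off the centre of his cluster, whose coordinates hold the mean ratings of the cluster's users. A minimal sketch of one way to do this, using centre and group from the snippets above:

def predict(user, movie, group, centre):
    # find the cluster the user was assigned to
    for c in range(1, len(centre)):
        if user in group[c]:
            # coordinate j of the centre is the mean rating for movie j
            return centre[c][movie]
    return None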
3.5 Latent Factor Models

Formally, given R \in \mathbb{R}^{n \times m}, we find P \in \mathbb{R}^{k \times n} and Q \in \mathbb{R}^{k \times m}, sets of k latent factors for each user and item respectively, such that R \approx P^T Q.

[Table: the example ratings matrix of users against the TV shows Suits, DN, GOT and PB.]

For example, consider the example ratings matrix above. On running basic matrix factorization on it with k = 2, the baseline matrix factorization produces the following latent vectors.
Factor 1    Factor 2
 1.319      0.8086
 1.253      1.527
 1.510      2.0165
 1.443      0.875
 2.293     -1.213
 1.998      0.502
 1.012      1.816
-0.213      2.885
The predicted values of the ratings, when compared with the actual values (shown as "actual, predicted"; cells with a single number are predictions for shows the user had not rated), are:

User/Show    Suits        DN           GOT          PB
User 1       3, 2.887     5, 4.9707    1, 1.021     4, 4.0412
User 2       2.6115       4, 3.9439    2.2484       3, 3.0499
User 3       3, 3.0421    4, 4.0309    4, 3.9717    3, 2.9353
User 4       2, 2.0278    5.4358       -3.9515      5, 4.9702
The factors learnt by matrix factorization are latent factors, i.e. they do not represent any real world explicit characteristic of an item/user like age, sex or location (for a user) or release date or number of seasons (for a TV show). Latent factors are representative of the implicit structure present in the ratings. The algorithm is an example of feature learning, where the algorithm itself transforms the raw ratings matrix into implicit factors that can be exploited in supervised learning.

One of the key decisions in this algorithm is the number of latent factors (k). The higher the number of latent factors, the higher the complexity of the model, which may lead to better results. However, if the number of latent factors is too high, there is a high chance of overfitting.
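In practice, k can be chosen by sweeping a few values and comparing the validation error; a sketch (factorize is a hypothetical routine fitting the latent factors, returning a model with an error method like the ones later in this chapter):

best_k, best_rmse = None, float('inf')
for k in [2, 5, 10, 20, 40]:
    model = factorize(R_train, k)   # fit latent factors on the training ratings
    e = model.error(R_test)         # RMSE on the held-out ratings
    if e < best_rmse:
        best_k, best_rmse = k, e
print best_k, best_rmse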
3.5.1 Non Negative Matrix Factorization

Stochastic gradient descent uses the loss at a single training example to approximate the total loss at that point of time, and minimizes it by updating the weights in the direction opposite to that of the gradient. Given Q_i(\theta), the loss at the i-th example, the update rule for \theta is given by:

\theta := \theta - \eta\,\frac{\partial Q_i(\theta)}{\partial \theta}    (3.8)
Algorithm

The base algorithm (2.2) tries to approximate the matrix of ratings R by the product of the user and item latent factor matrices P (k × n) and Q (k × m) respectively, where k is the number of latent factors:

\hat{r}_{u,i} = p_u^T q_i    (3.9)

The error in rating is given by:

e_{u,i} = r_{u,i} - \hat{r}_{u,i}    (3.10)

where r_{u,i} is the actual rating given by user u to item i.
The algorithm uses stochastic gradient descent to minimize the learning error E given by:

E = \sum_{(u,i):\,I_{u,i}=1} e_{u,i}^2    (3.11)

Regularization can be used to penalize high values for the weights:

e'^2_{u,i} = e_{u,i}^2 + \lambda\,(\|p_u\|^2 + \|q_i\|^2)    (3.12)

E' = \sum_{(u,i):\,I_{u,i}=1} e'^2_{u,i}    (3.13)

The gradients of e' with respect to the factor matrices are given by:

\frac{\partial e'^2_{u,i}}{\partial p_u} = -2\,e_{u,i}\,q_i + 2\lambda\,p_u, \qquad \frac{\partial e'^2_{u,i}}{\partial q_i} = -2\,e_{u,i}\,p_u + 2\lambda\,q_i    (3.14)

We can use these gradients to update the latent factors:

p_u := p_u + \eta\,(e_{u,i}\,q_i - \lambda\,p_u), \qquad q_i := q_i + \eta\,(e_{u,i}\,p_u - \lambda\,q_i)    (3.15)
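These update rules translate directly into a stochastic gradient descent pass over the observed ratings; a minimal sketch (assuming U holds the user factors p_u as rows and V the item factors q_i as rows, with alpha = η and beta = λ):

import numpy

def sgd_pass(U, V, triples, alpha=0.02, beta=0.02):
    # one pass of stochastic gradient descent over (user, item, rating) triples
    for u, i, r in triples:
        e = r - numpy.dot(U[u,:], V[i,:])                   # rating error (3.10)
        t = U[u,:] + alpha * (e * V[i,:] - beta * U[u,:])   # update rule (3.15)
        V[i,:] = V[i,:] + alpha * (e * U[u,:] - beta * V[i,:])
        U[u,:] = t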
The error computation and the driver code of the implementation are given below.

def final_res(self):
    return self.U, self.V

def error(self, R):
    # RMSE over the (user, item, rating) triples in R
    e = 0
    for x in R:
        i = x[0]
        j = x[1]
        R_ij = x[2]
        R_hat = numpy.dot(self.U[i,:], self.V.T[:,j])
        e = e + pow(R_ij - R_hat, 2)
    e = e / len(R)
    e = math.sqrt(e)
    return e

R, n, m = read_ratings("ua.base")
R2, n1, m1 = read_ratings("ua.test")
instance = recsys1(n, m, k=5)
instance.factor(R)
Udash, Vdash = instance.final_res()
The RMSE obtained for the above method was 0.9692.
3.5.2 Singular Value Decomposition

This model extends the baseline factorization with a global mean and per-user and per-item bias terms:

\hat{r}_{u,i} = \mu + b_u + b_i + p_u^T q_i    (3.16)

The same gradient descent algorithm can be applied as in 3.5.1; however, we also have to update the biases at each step:

b_u := b_u + \eta\,(e_{u,i} - \lambda\,b_u), \qquad b_i := b_i + \eta\,(e_{u,i} - \lambda\,b_i)    (3.17)

The implementation in Python is given below:
def factor(self, R):
    self.V = self.V.T
    numrows = len(R)
    temp = range(numrows)
    for step in range(self.steps):
        random.shuffle(temp)           # visit the training triples in random order
        e = 0
        for x in temp:
            i = R[x][0]
            j = R[x][1]
            R_ij = R[x][2]
            error_ij = R_ij - numpy.dot(self.U[i,:], self.V[:,j]) - self.mu - self.b_u[i] - self.b_i[j]
            # update rules for U, V
            t = self.U[i,:] + self.alpha * (error_ij * self.V[:,j] - self.beta * self.U[i,:])
            self.V[:,j] = self.V[:,j] + self.alpha * (error_ij * self.U[i,:] - self.beta * self.V[:,j])
            self.U[i,:] = t
            # update rules for b_u, b_i (only the biases of user i and item j change)
            self.b_u[i] = self.b_u[i] - self.alpha * (self.beta * self.b_u[i] - error_ij)
            self.b_i[j] = self.b_i[j] - self.alpha * (self.beta * self.b_i[j] - error_ij)
3.5.3 SVD++

SVD++ uses implicit feedback to increase prediction accuracy. It adds a second set of item factors y_i \in \mathbb{R}^k. These factors account for the implicit feedback present in the recommendation data, i.e. the items user u has rated, irrespective of the rating he/she gives. The predicted rating under this model is given by:

\hat{r}_{u,i} = \mu + b_u + b_i + q_i^T \left( p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j \right)    (3.18)

where R(u) denotes the set of items rated by user u.
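Written out in code, the prediction of (3.18) is a single dot product against the user factor augmented by the normalized sum of the implicit factors; a minimal sketch (mu, b_u, b_i, U, V, Y as in the implementation below; R_of_u is a hypothetical non-empty list of the items rated by user u):

import math
import numpy

def predict_svdpp(u, i, mu, b_u, b_i, U, V, Y, R_of_u):
    # implicit feedback term: |R(u)|^(-1/2) * sum of y_j over items u has rated
    implicit = sum(Y[j,:] for j in R_of_u) / math.sqrt(len(R_of_u))
    return mu + b_u[u] + b_i[i] + numpy.dot(V[i,:], U[u,:] + implicit)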
The update rules of (3.15) and (3.17) change to:

p_u := p_u + \eta\,(e_{u,i}\,q_i - \lambda\,p_u)
q_i := q_i + \eta\left(e_{u,i}\left(p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j\right) - \lambda\,q_i\right)
y_j := y_j + \eta\left(e_{u,i}\,|R(u)|^{-1/2}\,q_i - \lambda\,y_j\right) \quad \forall\, j \in R(u)    (3.19)
alpha = 0.02
beta = 0.02
steps = 100            # number of SGD passes (value assumed)

def __init__(self, n, m, k, R):
    self.n = n
    self.m = m
    self.k = k
    self.U = numpy.random.rand(n + 1, k)
    self.V = numpy.random.rand(m + 1, k)
    self.Y = numpy.random.rand(m + 1, k)          # implicit feedback factors y_i
    self.b_u = numpy.zeros(n + 1)
    self.b_i = numpy.zeros(m + 1)
    self.R = R
    self.mu = sum(x[2] for x in R) / float(len(R))   # global mean rating
    self.I = numpy.zeros(shape=(n + 1, m + 1))
    self.Sigma_y = numpy.zeros(shape=(n + 1, k))
    self.R_u = numpy.ones(n + 1)
    self.Z = []
    for x in R:
        self.I[x[0]][x[1]] = 1
    for x in range(n + 1):
        self.Z.append(numpy.nonzero(self.I[x,:])[0])   # items rated by user x
        for x1 in numpy.nonzero(self.I[x,:])[0]:
            self.Sigma_y[x,:] += self.Y[x1,:]          # cached sum of y_j
            self.R_u[x] += 1
        self.R_u[x] = 1 / math.sqrt(self.R_u[x])       # the |R(u)|^(-1/2) normalization

def factor(self, R):
    self.V = self.V.T
    numrows = len(R)
    temp = range(numrows)
    for step in range(self.steps):
        random.shuffle(temp)
        e = 0
        for x in temp:
            i = R[x][0]
            j = R[x][1]
            R_ij = R[x][2]
            error_ij = R_ij - numpy.dot(self.U[i,:] + self.R_u[i] * self.Sigma_y[i], self.V[:,j]) - self.mu - self.b_u[i] - self.b_i[j]
            # update steps per (3.19) for U, V, the biases and the implicit factors Y
            t = self.U[i,:] + self.alpha * (error_ij * self.V[:,j] - self.beta * self.U[i,:])
            self.V[:,j] = self.V[:,j] + self.alpha * (error_ij * (self.U[i,:] + self.R_u[i] * self.Sigma_y[i]) - self.beta * self.V[:,j])
            self.U[i,:] = t
            self.b_u[i] = self.b_u[i] - self.alpha * (self.beta * self.b_u[i] - error_ij)
            self.b_i[j] = self.b_i[j] - self.alpha * (self.beta * self.b_i[j] - error_ij)
            for j1 in self.Z[i]:
                self.Y[j1,:] = self.Y[j1,:] + self.alpha * (error_ij * self.R_u[i] * self.V[:,j] - self.beta * self.Y[j1,:])
            self.Sigma_y[i,:] = sum(self.Y[j1,:] for j1 in self.Z[i])   # refresh cached sum
            e = e + pow(R_ij - numpy.dot(self.U[i,:] + self.R_u[i] * self.Sigma_y[i], self.V[:,j]) - self.mu - self.b_u[i] - self.b_i[j], 2)
        e = e / len(R)
        e = math.sqrt(e)
        if e < 0.01:
            break
        print e, step
    self.V = self.V.T
def final_res(self):
    return self.U, self.V

def error(self, R):
    e = 0
    for x in R:
        i = x[0]
        j = x[1]
        R_ij = x[2]
        # prediction per (3.18), including the implicit feedback term
        R_hat = numpy.dot(self.U[i,:] + self.R_u[i] * self.Sigma_y[i,:], self.V.T[:,j]) + self.mu + self.b_u[i] + self.b_i[j]
        e = e + pow(R_ij - R_hat, 2)
    e = e / len(R)
    e = math.sqrt(e)
    return e

R, n, m = read_ratings("ua.base")
R2, n1, m1 = read_ratings("ua.test")
instance = recsys1(n, m, 2, R)
instance.factor(R)
Udash, Vdash = instance.final_res()
The RMSE obtained for the above method was 0.9541.
3.6 A Supervised Random Walk Approach to Matrix Factorization

We use a conditional transition probability which restarts the random walk at the source node s with probability \alpha:

Q'_{ij} = (1 - \alpha)\,Q_{ij} + \alpha\,\mathbf{1}(j = s)    (3.21)

The stationary distribution p of the walk satisfies the eigenvector equation:

p^T = p^T Q'    (3.22)

Thus, the differential of the personalized PageRank can be calculated by differentiating the eigenvector equation:

\frac{\partial p^T}{\partial \theta} = \frac{\partial p^T}{\partial \theta}\,Q' + p^T\,\frac{\partial Q'}{\partial \theta}    (3.23)

This, in turn, utilizes the partial differentials of the transition probabilities of users and items (3.24).
Therefore, given a recommendation matrix, for a user s we can select the top-x items which he/she has rated. These we put into a set d (the destination nodes), and the rest we put into a set l. The learning error can be formulated as:

E = \sum_{l \in L}\sum_{d \in D} h(p_l - p_d)    (3.25)

where h(x) is the squared error. If any node in l gets a higher stationary probability than any node in d, the value of the error function increases. The update rules for U and V are then obtained by gradient descent on this learning error, using the differentials above.
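For reference, the stationary distribution of such a walk can be computed by simple power iteration; a minimal sketch of personalized PageRank with restart (assuming Q is a row-stochastic transition matrix stored as a numpy array and s is the index of the source node):

import numpy

def personalized_pagerank(Q, s, alpha=0.15, iters=100):
    n = Q.shape[0]
    restart = numpy.zeros(n)
    restart[s] = 1.0                  # the walk restarts at the source node s
    p = numpy.ones(n) / n
    for _ in range(iters):
        p = (1 - alpha) * numpy.dot(p, Q) + alpha * restart   # per (3.21)/(3.22)
        p = p / p.sum()               # guard against numerical drift
    return p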
CHAPTER FOUR: RESULTS

Neighbourhood models

Table 4.1: RMSE of the neighbourhood models

Type of Model   Neighbours               Cosine Similarity   Pearson's R   Pearson's R with Jaccard Coefficient
User-User       10 nearest neighbours    1.3335              1.3238        1.3238
User-User       All users                1.2873              1.3061        1.3061
Item-Item       10 nearest neighbours    1.2471              1.2590        1.2590
Item-Item       All items                1.2268              1.2535        1.2535

The RMSE obtained for the time based neighbourhood model, in which cosine similarity was taken along with 10 nearest neighbours and user-user correlation, is 1.3103.
Clustering model

In this model, a predefined number of clusters was set; we varied the number of clusters from 5 to 10. The results are shown below.

Table 4.2: RMSE of the clustering model

Number of Clusters   RMSE
C=5                  1.0443
C=6                  1.0441
C=7                  1.0492
C=8                  1.0437
C=9                  1.0504
C=10                 1.0443
Table 4.3: RMSE of the matrix factorization models

Model                                       RMSE
SVD                                         0.9632
SVD++                                       0.9541
Non-negative Matrix Factorization           0.9692
Supervised Random Walk based approach       0.9796
***
CHAPTER FIVE: CONCLUSION AND FUTURE WORK

Future Scope:

One of the major drawbacks of the supervised random walk based approach is that it converges very slowly, partly due to the slow convergence of (3.23). However, by treating the training dataset as a whole instead of each sample by itself, the algorithm introduces various parallelizable equations like (3.23) and (3.24). Parallelizing these equations can lead to a 1/k reduction in time, where k is the number of latent factors.
One of the emerging uses of recommendation systems has been in the context of social graphs. Integrating recommendation systems with the social graph remains one of the major challenges right now, due to the unavailability of coherent social as well as recommendation data. Social graph data holds the key to solving the cold start problem, as well as to reinforcing a user's opinion.
***
REFERENCES
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.