
RECOMMENDER SYSTEMS

WORKING RESEARCH ABOUT RECOMMENDER SYSTEMS - CONCEPTS & THEORY

Recommender systems are part science, part art, and part technology. A recommender
system is intended to solve two particular tasks:

1. To predict the rating of an item or product that the user has not yet rated.
2. To create a list of the top N recommended items.

I. Types of recommender systems


1. Collaborative Filtering
- Works by building a database (user-item matrix) of preferences for items by
users.
- Matches users with similar interests and preferences by calculating
similarities between their profiles to make recommendations.
- A user gets recommendations for items that he has not rated before but that
were already positively rated by users in his neighborhood.
- The output of CF can be either a prediction or a recommendation.
o Prediction - the predicted score of item j for user i
o Recommendation - a list of the top N items that the user will like the most.
- The technique can be divided into two categories: memory-based and
model-based methods.
- Advantage: It can perform in the absence of content about the items. It can
recommend items to the user even when that content is not in the user's profile.
- Disadvantage: Can encounter a number of problems, such as the following:
o Cold Start Problem - cold start refers to the situation where a
recommender does not have information about a user or item.
o Data Sparsity Problem - lack of information. When only a few of the total
number of items are rated by the user, it is very challenging to derive an
efficient recommendation.
o Scalability - computation grows linearly with the number of items and
users. A recommendation technique that is efficient when the size of data
is limited may not be as efficient when the data size increases.
o Synonymy - the tendency of very similar items to have different names or
entries.
Example: baby wear and baby cloth
*Automatic term expansion, Singular Value Decomposition, and
Latent Semantic Indexing can address synonymy.

1.1. Memory-based methods

- Find the neighbors of the active user.
- Combine the preferences of the active user and his neighbors to generate
recommendations.
1.1.1. Two Ways to Do Memory-Based CF
1.1.1.1. User-based
Calculates similarities between users by comparing their ratings on
the same items.
The assumption is that users with similar preferences will rate items
similarly.
Predicts the rating of an item by the active user as the weighted
average of the ratings of that item by users similar to the active
user. The weights are the similarities of these users to the active
user.

[Figure: User-based collaborative filtering algorithm]

1.1.1.2. Item-based
Calculates similarities between items.
Builds a model of item similarities: it retrieves all items rated by
the active user from the user-item matrix and determines how
similar each retrieved item is to the target item.
Then selects the k most similar items.
Takes the weighted average of the active user's ratings on the k most
similar items. The weights are the similarities of the items.
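
The item-based steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `ratings` data, the function names, and the choice of cosine similarity over co-rating users are all hypothetical and not taken from the notes.

```python
import math

# Hypothetical toy data: ratings[user][item] = rating; a missing key means "not rated".
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "active": {"A": 5, "C": 1},
}

def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(x, y):
    """Cosine similarity restricted to the users who rated both items."""
    common = set(x) & set(y)
    if not common:
        return 0.0
    dot = sum(x[u] * y[u] for u in common)
    nx = math.sqrt(sum(x[u] ** 2 for u in common))
    ny = math.sqrt(sum(y[u] ** 2 for u in common))
    return dot / (nx * ny) if nx and ny else 0.0

def predict_item_based(user, target, k=2):
    """Weighted average of the user's ratings on the k items most similar to target."""
    target_vec = item_vector(target)
    sims = [(cosine(target_vec, item_vector(j)), r)
            for j, r in ratings[user].items() if j != target]
    top = sorted(sims, reverse=True)[:k]          # k most similar rated items
    denom = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / denom if denom else None

print(predict_item_based("active", "B"))
```

Here item B is rated much like item A by the other users, so the active user's high rating of A dominates the prediction for B.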

1.1.2. Similarity Measures Used for CF


1.1.2.1. Correlation-based (statistical approach)
Pearson correlation is used to measure the extent to which two
variables are linearly related. In CF, the two variables are the
ratings of two users.

*applicable for normalized data
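
As a rough illustration of the correlation-based measure, the sketch below computes the Pearson correlation between two users. It assumes, as one common variant, that means and sums are taken over co-rated items only; the function name and the example users are hypothetical.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0                                 # not enough overlap to correlate
    mean_a = sum(ratings_a[i] for i in common) / n
    mean_b = sum(ratings_b[i] for i in common) / n
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0                                 # a constant rater has no defined correlation
    return cov / math.sqrt(var_a * var_b)

# Hypothetical users: bob rates one point lower than alice, carol rates in reverse order.
alice = {"A": 5, "B": 3, "C": 1}
bob = {"A": 4, "B": 2, "C": 0, "D": 5}
carol = {"A": 1, "B": 3, "C": 5}
print(pearson(alice, bob), pearson(alice, carol))
```

Because the means are subtracted, a user who consistently rates lower than another can still be perfectly correlated with them, which is the main practical difference from the cosine measure.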

1.1.2.2. Cosine-based (linear algebra approach)
Measures the similarity between two n-dimensional vectors based on
the angle between them (projection-based). Unlike the
correlation-based measure, cosine similarity is based on linear
algebra rather than a statistical approach.

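The similarity between two users/items a and b can be computed as:

$$ s(a,b) = \frac{\sum_{i} r_{a,i}\, r_{b,i}}{\sqrt{\sum_{i} r_{a,i}^{2}}\;\sqrt{\sum_{i} r_{b,i}^{2}}} $$

The predicted rating of user a for item i is the similarity-weighted average
of the ratings of item i by the n neighbors b:

$$ p(a,i) = \frac{\sum_{b=1}^{n} s(a,b)\, r_{b,i}}{\sum_{b=1}^{n} s(a,b)} $$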
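
The cosine similarity s(a, b) and the weighted-average prediction p(a, i) can be tried out with a small Python sketch. The toy `ratings` table is hypothetical, and restricting the sums in s(a, b) to co-rated items is an assumption made here, since the notes do not say how missing ratings are handled.

```python
import math

# Hypothetical toy data: ratings[user][item] = rating.
ratings = {
    "a": {"i1": 5, "i2": 3, "i3": 4},
    "b": {"i1": 4, "i2": 3, "i3": 5, "i4": 4},
    "c": {"i1": 1, "i2": 5, "i4": 2},
}

def s(a, b):
    """Cosine similarity between users a and b over co-rated items (assumption)."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = math.sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def p(a, i):
    """Similarity-weighted average of the neighbors' ratings of item i."""
    neighbors = [b for b in ratings if b != a and i in ratings[b]]
    denom = sum(s(a, b) for b in neighbors)
    if denom == 0:
        return None                                # no usable neighbor rated the item
    return sum(s(a, b) * ratings[b][i] for b in neighbors) / denom

print(s("a", "b"), p("a", "i4"))
```

User b is more similar to a than user c is, so the prediction for i4 lands closer to b's rating than to c's.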

1.1.2.3. Item-Item Similarity


(from Amazon, A Review of Recommender System)

For each item I1 in the product catalog do
    For each customer C who purchased I1 do
        For each item I2 purchased by customer C do
            Record that a customer purchased I1 and I2
        End
    End
    For each item I2 do
        Compute the similarity between I1 and I2
    End
End
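
The pseudocode above can be turned into a small runnable sketch. The pseudocode does not say which similarity to compute, so this version assumes cosine similarity on binary purchase vectors (co-purchase count divided by the geometric mean of the two buyer counts); the `purchases` data and the function name are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical purchase history: customer -> set of purchased items.
purchases = {
    "c1": {"book", "lamp"},
    "c2": {"book", "lamp", "pen"},
    "c3": {"book", "pen"},
    "c4": {"lamp"},
}

def item_to_item(purchases):
    """Build an item -> {other item: similarity} table from co-purchases."""
    buyers = defaultdict(set)                      # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    co_counts = defaultdict(lambda: defaultdict(int))
    for i1 in buyers:                              # for each item I1 in the catalog
        for customer in buyers[i1]:                # for each customer C who purchased I1
            for i2 in purchases[customer]:         # for each item I2 purchased by C
                if i2 != i1:
                    co_counts[i1][i2] += 1         # record that a customer bought I1 and I2

    similar = {}
    for i1, row in co_counts.items():              # compute similarity between I1 and I2
        similar[i1] = {
            i2: n / math.sqrt(len(buyers[i1]) * len(buyers[i2]))
            for i2, n in row.items()
        }
    return similar

table = item_to_item(purchases)
print(table["book"])
```

Only item pairs that were actually bought together get an entry, which keeps the table small when most pairs never co-occur.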

1.2. Model-based methods

- Analyzes the user-item matrix to identify relations between items.
- Advantage: Helps address sparsity problems.
- Disadvantage: Does not alleviate the cold start problem.
1.2.1. Association Rules
Extract rules that predict the occurrence of an item based on the
presence of other items in the transaction.
How? Given a set of transactions, where each transaction is a set
of items, an association rule takes the form A → B: if A is in the
transaction, then B is likely to be in it as well.
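
How "likely" B is given A is usually quantified with support and confidence. These two measures are standard in association rule mining but are not named in the notes, so treat the sketch below, with its hypothetical `transactions` list and function names, as an illustrative assumption.

```python
# Hypothetical transactions: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b):
    """Estimated probability that B is in a transaction given that A is in it."""
    support_a = support(a)
    return support(set(a) | set(b)) / support_a if support_a else 0.0

print(support({"bread", "milk"}), confidence({"bread"}, {"milk"}))
```

A rule such as bread → milk would typically be kept only if both values exceed thresholds chosen by the analyst.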

1.2.2. Clustering
Tries to partition a set of data into a set of clusters to discover
meaningful groups within it.
Once clusters have been formed, the opinions of other users in a
cluster can be averaged and used to make recommendations for
individual users.
K-means and Self-Organizing Map (SOM) are the most popular
clustering methods.
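
A minimal K-means sketch shows how users could be grouped and how a cluster's averaged opinion falls out of the centroid. It assumes dense rating vectors (every user rated every item) and a deterministic initialization from the first k users, both simplifications chosen for illustration; the data and names are hypothetical.

```python
def kmeans(points, k, iters=20):
    """Plain K-means with squared Euclidean distance; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]      # assumption: first k points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical users, each a vector of ratings for three items.
users = [[5, 5, 1], [1, 1, 5], [4, 5, 2], [2, 1, 4], [5, 4, 1]]
labels, centroids = kmeans(users, k=2)
print(labels, centroids)
```

Each centroid is the average rating vector of a cluster, so it can serve as the averaged opinion used to recommend items to any member of that cluster.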

1.2.3. Decision Trees


This is based on the methodology of tree graphs.
Decision trees are constructed by analyzing a set of training
examples for which class labels are known.
They are then applied to classify previously unseen examples.
If trained on very high-quality data, they can make very
accurate predictions.
Decision trees are easy to interpret because they combine
simple questions about the data in an understandable pattern.

1.2.4. Artificial Neural Network (ANN)


An ANN is a structure of many connected neurons (nodes) arranged
in layers in a systematic way.
The connections between neurons have weights associated with
them, depending on the amount of influence one neuron has on
another.
ANNs can estimate nonlinear functions and capture
complex relationships in data sets.

1.2.5. Regression
Regression analysis is used when two or more variables are
thought to be linearly related.
It is usually used for curve fitting, prediction, and hypothesis
testing about relationships between variables.

1.2.6. Bayesian methods (Bayes Classifiers)


Probabilistic framework for classification problems.
Based on conditional probability and Bayes' theorem.

1.2.7. Matrix Completion Technique
The objective of MCT is to predict the unknown values within the
user-item matrix.
Correlation-based K-nearest neighbor is one of the major
techniques in collaborative filtering.
K-nearest neighbor depends on the historical rating data of users on
items.
The rating matrix is typically very large and sparse because users do
not rate most of the items in the matrix.
There is therefore a need to analyze a low-rank, partially observed
matrix, for example through Alternating Least Squares (ALS).
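
To make the ALS idea concrete, here is a deliberately small sketch restricted to rank 1, where each alternating step has a simple closed form. The rank-1 restriction, the regularization value `lam`, the iteration count, and the toy data are all assumptions for illustration; practical systems use higher ranks and a linear solver.

```python
def als_rank1(observed, n_users, n_items, lam=0.001, iters=100):
    """Rank-1 ALS on a partially observed matrix.

    observed maps (user index, item index) -> rating; returns factor lists (u, v)
    such that u[i] * v[j] approximates the rating of user i for item j.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # Fix v, solve the regularized least-squares problem for each user factor.
        for i in range(n_users):
            num = sum(r * v[j] for (ui, j), r in observed.items() if ui == i)
            den = sum(v[j] ** 2 for (ui, j) in observed if ui == i) + lam
            u[i] = num / den
        # Fix u, solve the regularized least-squares problem for each item factor.
        for j in range(n_items):
            num = sum(r * u[i] for (i, ij), r in observed.items() if ij == j)
            den = sum(u[i] ** 2 for (i, ij) in observed if ij == j) + lam
            v[j] = num / den
    return u, v

# Hypothetical 3x2 rating matrix with the entry for user 2, item 1 left unobserved.
observed = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 4, (2, 0): 3}
u, v = als_rank1(observed, n_users=3, n_items=2)
print(u[2] * v[1])  # estimate of the missing rating
```

The observed entries follow a rank-1 pattern (each row is a multiple of the first), which is why the missing entry can be recovered from the others.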
1.2.8. Latent factors Models
1.2.9. Singular Value Decomposition

Case 1: With ratings (apply algorithms for real ratings data - memory-based)

Case 2: No ratings (apply algorithms for binary data - association rules)

2. Content-based Filtering
- Recommendation is based on a user profile built from features of the content of
the items the user has evaluated in the past.
- Does not require the profiles of other users, since they do not influence the
recommendation.
- Items most closely related to the positively rated items are recommended to
the user.
- Most successful for web pages, publications, and news recommendations.
- Advantage: Can recommend new items even if no ratings have been provided by
users. If the user's preferences change, it can adjust its recommendations
quickly.
- Disadvantage: Needs in-depth knowledge and description of the features of the
items in the profile.

Content-based filtering uses different models, such as:

2.1. Vector Space Models
2.1.1. Term Frequency/Inverse Document Frequency (TF-IDF)
2.2. Probabilistic Models (to model the relationship between different documents
within a corpus)
2.2.1. Naive Bayes Classifier
2.2.2. Decision Trees
2.2.3. Neural Networks
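
As an illustration of the vector space model with TF-IDF weighting, the sketch below builds weighted term vectors for a few item descriptions and compares them with cosine similarity. It assumes one common weighting variant (term frequency normalized by document length, idf = log(N / df)); the `docs` corpus and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical item descriptions.
docs = {
    "d1": "space movie about space travel",
    "d2": "romantic movie about travel",
    "d3": "cooking show",
}

def tf_idf(docs):
    """Return doc -> {term: weight} using tf = count / length and idf = log(N / df)."""
    tokenized = {d: text.split() for d, text in docs.items()}
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    weights = {}
    for d, tokens in tokenized.items():
        counts = Counter(tokens)
        weights[d] = {t: (c / len(tokens)) * math.log(n_docs / df[t])
                      for t, c in counts.items()}
    return weights

def cosine(x, y):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(x[t] * y[t] for t in set(x) & set(y))
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0

w = tf_idf(docs)
print(cosine(w["d1"], w["d2"]), cosine(w["d1"], w["d3"]))
```

If the user rated d1 positively, a content-based recommender built this way would rank d2 above d3, because d2 shares weighted terms with d1 and d3 shares none.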

3. Knowledge-based Filtering - applicable for cold-start problems or products
that have very few reviews, such as cars or real estate.
3.1. Constraint-based recommender systems
3.2. Case-based recommender systems
4. Demographic Recommender Systems
5. Community-based
6. Hybrid and Ensemble-based Recommender Systems
6.1. Weighted hybridization - combines the results of different recommenders to
generate a recommendation list or prediction by integrating the scores from
each of the techniques in use through a linear formula.
6.2. Switching hybridization - the system switches between techniques. Example:
content-based recommendation is deployed first, and collaborative
recommendation takes over in situations where the content-based system
cannot make recommendations with enough evidence.
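
Both hybridization schemes can be sketched in a few lines. The weights, the evidence threshold, and the idea of measuring "evidence" as a simple count are hypothetical choices made for this illustration; the notes do not specify them.

```python
def weighted_hybrid(cf_score, cb_score, w_cf=0.6, w_cb=0.4):
    """Weighted hybridization: linear combination of the two recommenders' scores."""
    return w_cf * cf_score + w_cb * cb_score

def switching_hybrid(cb_score, cb_evidence, cf_score, min_evidence=3):
    """Switching hybridization: use the content-based score when it is backed by
    enough evidence (assumed here to be a count, e.g. rated items in the profile),
    otherwise fall back to the collaborative score."""
    return cb_score if cb_evidence >= min_evidence else cf_score

print(weighted_hybrid(4.0, 3.0))
print(switching_hybrid(cb_score=4.5, cb_evidence=1, cf_score=2.0))
```

In practice the weights of the linear formula would be tuned on held-out ratings rather than fixed by hand.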

There are different approaches to hybrid filtering:

Approach 1: Implementing the algorithms separately and combining the results
Approach 2: Using content-based filtering within a collaborative approach
Approach 3: Using collaborative filtering within a content-based approach
Approach 4: Creating a unified recommender that brings together both
approaches.

II. Evaluation Metrics for Recommendation Algorithms
1. Mean Absolute Error (MAE) - the most popular and commonly used metric. It
measures the deviation of the recommendation from the user's actual value. The
lower the MAE, the more accurately the recommendation engine predicts user
ratings.

$$ \mathrm{MAE} = \frac{1}{N} \sum_{u,i} \left| p_{u,i} - r_{u,i} \right| $$

where p_{u,i} is the predicted rating for user u on item i, r_{u,i} is the
actual rating, and N is the total number of ratings in the item set.

2. Root Mean Square Error (RMSE) - puts more emphasis on larger absolute errors.
The lower the RMSE, the better the recommendation accuracy.

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{u,i} \left( p_{u,i} - r_{u,i} \right)^{2}} $$

where p_{u,i} is the predicted rating for user u on item i, r_{u,i} is the
actual rating, and n is the total number of ratings in the item set.

3. Reversal rate
4. Weighted errors
5. Receiver Operating Characteristic (ROC) curve
6. Precision-Recall Curve (PRC)
7. Precision, Recall, and F-measure
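
The error metrics in items 1 and 2 and the precision, recall, and F-measure in item 7 can be computed as in the sketch below. It uses the standard definitions, with the balanced F1 form assumed for the F-measure; the function names and example values are hypothetical.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean square error; squaring penalizes large errors more heavily."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

def precision_recall_f(recommended, relevant):
    """Precision, recall, and F1 of a recommendation list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

predicted = [4, 3, 5, 2]
actual = [5, 3, 4, 4]
print(mae(predicted, actual), rmse(predicted, actual))
print(precision_recall_f(["a", "b", "c", "d"], ["a", "c", "e"]))
```

On the same predictions RMSE is never smaller than MAE, and the gap grows when a few errors are much larger than the rest.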

III. Application of Recommender Systems


1. Entertainment - recommendations for movies, music, IPTV
2. Content - personalized newspapers; recommendations for documents, webpages,
e-learning applications, and email filters
3. E-commerce - recommendations for consumers of products to buy, such as books,
cameras, PCs, etc.
4. Services - travel services, houses to rent, consultation

IV. Domain Specific Challenges in Recommender Systems


1. Context-based recommendations. Contextual information could include time,
location, or social data. For example, the types of clothes recommended by a
retailer might depend both on the season and the location of the customer.

2. Time-Sensitive Recommender Systems. The recommendations for a movie may be
very different at the time of release from the recommendations received several
years later. In such cases, it is extremely important to incorporate temporal
knowledge in the recommendation process.

3. Location-based Recommender Systems. For example, a traveling user may wish to
determine the closest restaurant based on her previous history of ratings for other
restaurants.
a. User specific locality
b. Item specific locality

4. Social Recommender Systems.


a. PageRank Algorithm

5. Other challenges
a. Scalability
b. Proactivity (when and how to push recommendations)
c. Privacy
d. Diversity - diversity of the items recommended
e. Integration - integration of long-term and short-term preferences of
customers
f. Device - should operate on any device

V. Information that can be collected


1. Explicit feedback - user ratings for items
2. Implicit feedback - user preferences and user profile
a. History of purchases
b. Navigation history
c. Time spent on web pages
d. Links followed by the user
e. Clicks
VI. R Packages for recommendation systems
1. recommenderlab
2. rrecsys
3. recosystem - recommender system using parallel matrix factorization

REFERENCES:

Recommendation Systems: Principles, methods and evaluation
http://www.sciencedirect.com/science/article/pii/S1110866515000341

A Review on Recommender System
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=ADBFC8F74E311870EF38AE235F83A657?doi=10.1.1.401.9380&rep=rep1&type=pdf

Recommender Systems 101 - a step by step practical example in R
http://bigdata-doctor.com/recommender-systems-101-practical-example-in-r/

A REVIEW ON RECOMMENDER TECHNIQUES, SYSTEMS AND EVALUATION METRICS
http://www.sci-int.com/pdf/187055541132-A%20Review%20%20Mubarak%20Malaysia%2019-9-12%20_corrected_503-511.pdf

READ:
User-based Collaborative Filtering
http://www.dataperspective.info/2014/05/basic-recommendation-engine-using-r.html

Item-based Collaborative Filtering
http://www.dataperspective.info/2015/11/item-based-collaborative-filtering-in-r.html

http://bigdata-doctor.com/recommender-systems-101-practical-example-in-r/

https://ashokharnal.wordpress.com/2014/12/18/using-recommenderlab-for-predicting-ratings-for-movielens-data/

HOW RECOMMENDERLAB OF R CULCULATE THE RATINGS OF EACH ITEM IN RATINGMATRIX?
http://stackoverflow.com/questions/17610104/how-recommenderlab-of-r-culculate-the-ratings-of-each-item-in-ratingmatrix

FURTHER RESEARCH:

Choi, K., & Suh, Y. (2013). A new similarity function for selecting neighbors for each target
item in collaborative filtering. Knowledge-Based Systems, 37, 146-153.

Liu, H., Hu, Z., Mian, A., Tian, H., & Zhu, X. (2014). A new user similarity model to improve
the accuracy of collaborative filtering. Knowledge-Based Systems, 56, 156-166.

Bobadilla, J., Ortega, F., Hernando, A., & Glez-de-Rivera, G. (2013). A similarity metric
designed to speed up, using hardware, the recommender systems k-nearest neighbors
algorithm. Knowledge-Based Systems, 51, 27-34.

Lathia, N., Hailes, S., & Capra, L. (2008, March). The effect of correlation coefficients on
communities of recommenders. In Proceedings of the 2008 ACM symposium on Applied
computing (pp. 2000-2005). ACM.
